MDSec’s LLM penetration testing employs a range of social engineering, architectural and technical attacks to find vulnerabilities
The OWASP Top 10 LLM attacks highlight some of the main concerns facing organisations – this is not a comprehensive list but as it is likely to grow in prominence, it is a useful basis for understanding:
LLM01: Prompt Injection
Prompts are the main input to an LLM and can be used to cause specific output or modify behaviour, particularly in systems that employ Reinforcement Learning through Human Feedback (RLHF). Wherever the LLM parses a document, API response, web page or other text, there is a potential that malicious prompts embedded into the resource cause ‘harms’ or adverse behaviour.
For RLHF systems, direct prompts may be sufficient to influence the behaviour of the system in an adversarial manner.
LLM02: Insecure Output Handling
This vulnerability occurs when an LLM output is trusted by a connected system. LLM output could be engineered to contain malicious payloads in order to exploit classic vulnerabilities such as SQL Injection, command injection etc. This is particularly significant given the wide range of possible LLM output and the common misconception that backend systems are not “externally facing” and hence do not need as much secure engineering.
LLM03: Training Data Poisoning
If companies are training models, or relying on pre-trained models which may have used untrusted data accessible to an attacker, the LLM may become maladapted to train in a way that benefits the attacker. This is of critical importance if organisations expect to trust the output and decision making of the model.
LLM04: Model Denial of Service
Where models are critical in providing business operations (for example customer-facing chatbots), denial of service may be a consideration. As LLMs are both extremely resource-heavy and subject to a potentially unbounded range of inputs, there is a risk of an attacker causing denial of service through a recursive prompt, or repeated submission of resource-intensive prompts.
LLM05: Supply Chain Vulnerabilities
An LLM is only as secure as its components, which may include its training data, plugins, and its footprint on the network and server. Compromise of one of these components may allow compromise of either a portion or the whole LLM.
LLM06: Sensitive Information Disclosure
In order to be of benefit to an organisation, it is inevitable that an LLM will have access to company information. This may include generally available internal information such as corporate policy documents and knowledge bases, but with many organisations it will quickly be expanded to include meeting notes, emails, source code and more. Numerous non-technical attacks employing basic social engineering have been used against public models to either ‘jailbreak’ a model or cause it to expose sensitive information.
LLM07: Insecure Plugin Design
LLM integration inevitably involves plugins and APIs to access internal company information or access internal servers and resources. This makes LLM APIs one of the most important attack surfaces, as the core LLM may be relatively robust overall, but the connectors may be newly developed and have received limited to no testing. In addition, any APIs may be written on the assumption that they are “internal” and hence may be more insecure, even though the APIs are indirectly usable through prompting.
Attacks against connectors include attempts to use the connector to access normally inaccessible or prohibited resources in an attack setup similar to Server Side Request Forgery, attacks against the connector code and logic, or attempts to cause the connector to divulge sensitive information to the prompter.
LLM08: Excessive Agency
In the context of LLM08, excessive agency refers to situations where the LLM has overly broad access either to information resources, or even to perform actions within the organisation. This may be abused by a prompter in a standard Privilege Escalation attack to perform sensitive actions normally prohibited to the prompter.
LLM09: Overreliance
One of the early observations of LLMs relates to their users’ misunderstanding of how they operate and hence how to trust their output. The common example of this is the emergence of “hallucinations” where the LLM may provide apparently legitimate output (including URLs, citations, code package locations or other facts). In other situations, the LLM may simply contain biases based on either incomplete or tainted training data or human feedback, resulting in misinformation. This is significant where LLM output is being incorporated in legal contexts or in factual statements from the organisation.
LLM10: Model Theft
Where organisations train their own models, there is likely to be not only proprietary information but also increasingly key rules on decision making within the organisation. Adversaries may be able to extract this information and business logic and use it to competitive advantage.