Large Language Models

Expert assessment from the company behind The Web Application Hacker’s Handbook

MDSec has worked at the forefront
of application security since its inception.

LLMs are expected to be an increasingly embedded portion of many organisations’ business over the next few years. Integrations will have access to internal code bases in order to automate software engineering tasks, access internal documentation in order to provide a natural language frontend to company knowledgebases, provide tailored coaching and training, access meeting notes to provide summaries and transcripts, and interface with customers as ChatBots. This places the LLM at the heart of the organisation, with access to a large library of sensitive information and ever-increasingly making explicit or implicit decisions within the organisation.

Given the high-profile attention on implementing Generative AI and the desire to de-risk their introduction, an LLM penetration test is a key step in highlighting these risks in a practical manner.

Ready to engage with MDSec?

Speak to one of our industry experts and find out how MDSec can help your business.

+44 (0) 1625 263 503
contact@mdsec.co.uk

To meet the challenges of Generative AI and LLM integration head-on, MDSec specialise in researching new technology (over 100 examples listed in https://www.mdsec.co.uk/knowledge-centre/insights/), formulating and authoring methodologies (as seen within the Web and Mobile Application Hacker’s Handbook) and creating new tooling and training (as seen within our Nighthawk product and Adversary Simulation services).

 

MDSec’s LLM penetration testing employs a range of social engineering, architectural and technical attacks to find vulnerabilities

The OWASP Top 10 LLM attacks highlight some of the main concerns facing organisations – this is not a comprehensive list but as it is likely to grow in prominence, it is a useful basis for understanding:

LLM01: Prompt Injection

Prompts are the main input to an LLM and can be used to cause specific output or modify behaviour, particularly in systems that employ Reinforcement Learning through Human Feedback (RLHF). Wherever the LLM parses a document, API response, web page or other text, there is a potential that malicious prompts embedded into the resource cause ‘harms’ or adverse behaviour.

For RLHF systems, direct prompts may be sufficient to influence the behaviour of the system in an adversarial manner.

LLM02: Insecure Output Handling

This vulnerability occurs when an LLM output is trusted by a connected system. LLM output could be engineered to contain malicious payloads in order to exploit classic vulnerabilities such as SQL Injection, command injection etc. This is particularly significant given the wide range of possible LLM output and the common misconception that backend systems are not “externally facing” and hence do not need as much secure engineering.

LLM03: Training Data Poisoning

If companies are training models, or relying on pre-trained models which may have used untrusted data accessible to an attacker, the LLM may become maladapted to train in a way that benefits the attacker. This is of critical importance if organisations expect to trust the output and decision making of the model.

LLM04: Model Denial of Service

Where models are critical in providing business operations (for example customer-facing chatbots), denial of service may be a consideration. As LLMs are both extremely resource-heavy and subject to a potentially unbounded range of inputs, there is a risk of an attacker causing denial of service through a recursive prompt, or repeated submission of resource-intensive prompts.

LLM05: Supply Chain Vulnerabilities

An LLM is only as secure as its components, which may include its training data, plugins, and its footprint on the network and server. Compromise of one of these components may allow compromise of either a portion or the whole LLM.

LLM06: Sensitive Information Disclosure

In order to be of benefit to an organisation, it is inevitable that an LLM will have access to company information. This may include generally available internal information such as corporate policy documents and knowledge bases, but with many organisations it will quickly be expanded to include meeting notes, emails, source code and more. Numerous non-technical attacks employing basic social engineering have been used against public models to either ‘jailbreak’ a model or cause it to expose sensitive information.

LLM07: Insecure Plugin Design

LLM integration inevitably involves plugins and APIs to access internal company information or access internal servers and resources. This makes LLM APIs one of the most important attack surfaces, as the core LLM may be relatively robust overall, but the connectors may be newly developed and have received limited to no testing. In addition, any APIs may be written on the assumption that they are “internal” and hence may be more insecure, even though the APIs are indirectly usable through prompting.

Attacks against connectors include attempts to use the connector to access normally inaccessible or prohibited resources in an attack setup similar to Server Side Request Forgery, attacks against the connector code and logic, or attempts to cause the connector to divulge sensitive information to the prompter.

LLM08: Excessive Agency

In the context of LLM08, excessive agency refers to situations where the LLM has overly broad access either to information resources, or even to perform actions within the organisation. This may be abused by a prompter in a standard Privilege Escalation attack to perform sensitive actions normally prohibited to the prompter.

LLM09: Overreliance

One of the early observations of LLMs relates to their users’ misunderstanding of how they operate and hence how to trust their output. The common example of this is the emergence of “hallucinations” where the LLM may provide apparently legitimate output (including URLs, citations, code package locations or other facts). In other situations, the LLM may simply contain biases based on either incomplete or tainted training data or human feedback, resulting in misinformation. This is significant where LLM output is being incorporated in legal contexts or in factual statements from the organisation.

LLM10: Model Theft

Where organisations train their own models, there is likely to be not only proprietary information but also increasingly key rules on decision making within the organisation. Adversaries may be able to extract this information and business logic and use it to competitive advantage.

Get in touch

Contact Us
Copyright 2025 MDSec