Foundations Of AI Security 4
OWASP Top 10 for LLM Applications
Large Language Models (LLMs) are AI algorithms that can perform a variety of natural language processing (NLP) tasks. They can process user inputs and create plausible responses by predicting sequences of words. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process.
1) Prompt Injection :
Attack :
- Direct - An attacker directly inputs harmful prompts into a large language model (LLM) to either gain access to the system or misuse the LLM, such as creating harmful content.
- Indirect - An attacker indirectly inputs harmful prompts into a large language model (LLM) by using separate data sources like text or multimedia from databases or websites. These prompts can be hidden or disguised, allowing the attacker to gain access to the system or target an unsuspecting user.
An example of direct prompt injection could be an attacker crafting a prompt that causes a chatbot to divulge confidential information, while an example of indirect prompt injection might involve embedding a malicious prompt in a website's content that the LLM processes, leading to unintended consequences.
Mitigation :
- Enforce Privilege Control
- Segregate External Content
- Establish Trust Boundaries
2) Insecure Output Handling:
Insecure Output Handling in the context of LLMs refers to the inadequate validation, sanitization, and management of the outputs generated by LLMs before they are sent to other components and systems.
Attack :
Insecure Output Handling specifically addresses the handling of LLM outputs before they are passed downstream.If exploited successfully, an attacker can get anything from XSS to RCE.
Mitigation :
- Apply proper input validation on the responses generated by the LLM before they are passed on to backend functions or other system components.
- Implement robust input validation and sanitization measures by following the guidelines set forth by the OWASP Application Security Verification Standard (ASVS).
- Apply proper output encoding techniques
3) Training Data Poisoning :
Attack :
This attack involves adversaries compromising the integrity of the training data to embed vulnerabilities or biases in the machine learning models trained on this data
Mitigation :
- Verify the Training Data Supply Chain : The “Machine Learning Bill of Materials” (ML-BOM) methodology can be employed.
- Legitimacy Checks: Ensure that the data sources used in pre-training, fine-tuning, and embedding stages are legitimate and contain accurate data.
- Use-Case Specific Models : Tailor different models for distinct use-cases by using separate training data or fine-tuning processes.
- Vetting and Filtering Training Data: Apply strict vetting or input filters for training data to control the volume of potentially falsified data.
- Human Review and Auditing
- Benchmarking and Reinforcement Learning
4) Model Denial of Service (DOS):
This attack maps to the ATLAS framework under the “Denial of ML Service” technique within the “Impact” tactic.
Attack :
- Posing Queries for Recurring Resource Usage: An attacker might use tools like LangChain or AutoGPT to generate a high volume of tasks that are queued for processing by the LLM. This can lead to a situation where the LLM is constantly busy handling these tasks, consuming resources and potentially delaying the processing of legitimate queries.
- Sending Resource-Intensive Queries: Queries that use unusual orthography or sequences might be more computationally demanding for the LLM to process
- Continuous Input Overflow: In this scenario, an attacker sends a continuous stream of input that exceeds the LLM’s context window
- Repetitive Long Inputs
- Recursive Context Expansion
- Variable-Length Input Flood
Mitigation :
- Input Validation and Sanitization
- Resource Usage Caps: Implementing caps on resource use per request or step can help manage the load on the system.
- API Rate Limits
- Limiting Queued and Total Actions: Limiting the number of actions that can be queued and the total number of actions in a system that reacts to LLM responses.
- Setting Input Limits
- Developer Awareness: Raising awareness among developers about potential DoS vulnerabilities in LLMs.
5) Supply Chain Vulnerabilities :
Attack :
It maps to the “Poison Training Data” and “Backdoor ML Model” techniques. Attackers can exploit supply chain vulnerabilities to introduce poisoned data into the training process or embed backdoors in pre-trained models, compromising the security and integrity of the LLM.
Mitigation :
- Vet Data Sources and Suppliers: Only use trusted suppliers for data and models. Review their terms and conditions, privacy policies, and security measures.
- Use Reputable Plugins
- Apply OWASP Top Ten Mitigations
- Maintain an Up-to-Date Inventory
- Use Model and Code Signing
- Implement Anomaly Detection and Adversarial Robustness Tests
- Monitor for Vulnerabilities
6) Sensitive Information Disclosure :
Attack :
This vulnerability maps to the “Data Exfiltration via ML Inference API” technique, where sensitive information may be unintentionally revealed through the LLM’s outputs. This can lead to unauthorized access to sensitive data, breaches of intellectual property, privacy violations, and other security concerns.
Mitigation :
- Data Sanitization and Scrubbing
- Input Validation and Sanitization
- Fine-tuning Data Handling
- Limiting Access to External Data Sources
7) Insecure Plugin Design :
LLM plugins are extensions that are automatically activated by the model during user interactions.
Attack :
Attackers can exploit the lack of input validation and inadequate access controls in LLM plugins to gain unauthorized access or execute malicious code on the target system.
It pertains to the creation and management of LLM plugins, as opposed to the use of third-party plugins, which falls under the broader category of LLM Supply Chain Vulnerabilities.
Mitigation :
- Parameterized Input :This means that inputs should be clearly defined in terms of type and range.
- Input Validation
- Thorough Inspection and Testing
- Minimizing Impact: Design plugins to minimize the impact of any exploitation of insecure input parameters
- Authentication and Authorization
- User Authorization: For sensitive plugins, require manual user authorization and confirmation for any action taken
- API Security
8) Excessive Agency :
Excessive Agency in LLMs refers to the vulnerability that arises when an LLM system is given too much autonomy or authority to interact with other systems and make decisions based on its inputs and outputs.
Attack :
An LLM with excessive permissions might execute unauthorized commands on a server, leading to data breaches or system downtime. Similarly, an LLM with too much autonomy might make incorrect decisions that affect the integrity of processed data or the availability of services.
Mitigation :
- Limit Plugin Functions
- Minimize Plugin Functionality
- Avoid Open-Ended Functions
- Restrict Permissions
- Maintain User Authorization
- Implement Human-in-the-Loop Control
- Enforcing Authorization in Downstream Systems
- Log and Monitor Activity
- Implement Rate-Limiting
9 ) Overreliance :
Overreliance on LLMs can lead to significant risks when these models produce misleading or incorrect information, which is then accepted as accurate without proper scrutiny.
Attack :
- Security Breach
- Misinformation
- Miscommunication
- Legal Issues
- Reputational Damage
Mitigation :
- Regular Monitoring and Review
- Cross-Check with Trusted Sources
- Model Enhancement
- Prompt Engineering
- Parameter-Efficient Tuning (PET)
- Chain-of-Thought Prompting
- Automatic Validation Mechanisms
- Task Decomposition
- Risk Communication
- Responsible API and Interface Design
- Secure Coding Practices
10) Model Theft :
It refers to the unauthorized access, copying, or extraction of proprietary LLMs by malicious actors or advanced persistent threats (APTs)
Attack :
It involves adversaries using methods to obtain a functional copy of a private machine learning model, often by repeatedly querying the model’s inference API to collect its outputs and using them to train a separate model that mimics the behavior of the target model.
Mitigation :
- Strong Access Controls
- Robust Authentication Mechanisms
- Supplier Management: Carefully track and verify suppliers of LLM components to prevent supply-chain attacks.
- Centralized ML Model Inventory or Registry
- Network Resource Restrictions
- Automated MLOps Deployment
- Regular Monitoring and Auditing
- Mitigation of Prompt Injection Techniques
- Adversarial Robustness Training
- Watermarking Framework