Foundations Of AI Security 4

Posted May 30, 2024

By Amir 6 min read

OWASP Top 10 for LLM Applications

Large Language Models (LLMs) are AI algorithms that can perform a variety of natural language processing (NLP) tasks. They can process user inputs and create plausible responses by predicting sequences of words. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process.

1) Prompt Injection :

Attack :

Direct - An attacker directly inputs harmful prompts into a large language model (LLM) to either gain access to the system or misuse the LLM, such as creating harmful content.
Indirect - An attacker indirectly inputs harmful prompts into a large language model (LLM) by using separate data sources like text or multimedia from databases or websites. These prompts can be hidden or disguised, allowing the attacker to gain access to the system or target an unsuspecting user.

An example of direct prompt injection could be an attacker crafting a prompt that causes a chatbot to divulge confidential information, while an example of indirect prompt injection might involve embedding a malicious prompt in a website's content that the LLM processes, leading to unintended consequences.

Mitigation :

Enforce Privilege Control
Segregate External Content
Establish Trust Boundaries

2) Insecure Output Handling:

Insecure Output Handling in the context of LLMs refers to the inadequate validation, sanitization, and management of the outputs generated by LLMs before they are sent to other components and systems.

Attack :

Insecure Output Handling specifically addresses the handling of LLM outputs before they are passed downstream.If exploited successfully, an attacker can get anything from XSS to RCE.

Mitigation :

Apply proper input validation on the responses generated by the LLM before they are passed on to backend functions or other system components.
Implement robust input validation and sanitization measures by following the guidelines set forth by the OWASP Application Security Verification Standard (ASVS).
Apply proper output encoding techniques

3) Training Data Poisoning :

Attack :

This attack involves adversaries compromising the integrity of the training data to embed vulnerabilities or biases in the machine learning models trained on this data

Mitigation :

Verify the Training Data Supply Chain : The “Machine Learning Bill of Materials” (ML-BOM) methodology can be employed.
Legitimacy Checks: Ensure that the data sources used in pre-training, fine-tuning, and embedding stages are legitimate and contain accurate data.
Use-Case Specific Models : Tailor different models for distinct use-cases by using separate training data or fine-tuning processes.
Vetting and Filtering Training Data: Apply strict vetting or input filters for training data to control the volume of potentially falsified data.
Human Review and Auditing
Benchmarking and Reinforcement Learning

4) Model Denial of Service (DOS):

This attack maps to the ATLAS framework under the “Denial of ML Service” technique within the “Impact” tactic.

Attack :

Posing Queries for Recurring Resource Usage: An attacker might use tools like LangChain or AutoGPT to generate a high volume of tasks that are queued for processing by the LLM. This can lead to a situation where the LLM is constantly busy handling these tasks, consuming resources and potentially delaying the processing of legitimate queries.
Sending Resource-Intensive Queries: Queries that use unusual orthography or sequences might be more computationally demanding for the LLM to process
Continuous Input Overflow: In this scenario, an attacker sends a continuous stream of input that exceeds the LLM’s context window
Repetitive Long Inputs
Recursive Context Expansion
Variable-Length Input Flood

Mitigation :

Input Validation and Sanitization
Resource Usage Caps: Implementing caps on resource use per request or step can help manage the load on the system.
API Rate Limits
Limiting Queued and Total Actions: Limiting the number of actions that can be queued and the total number of actions in a system that reacts to LLM responses.
Setting Input Limits
Developer Awareness: Raising awareness among developers about potential DoS vulnerabilities in LLMs.

5) Supply Chain Vulnerabilities :

Attack :

It maps to the “Poison Training Data” and “Backdoor ML Model” techniques. Attackers can exploit supply chain vulnerabilities to introduce poisoned data into the training process or embed backdoors in pre-trained models, compromising the security and integrity of the LLM.

Mitigation :

Vet Data Sources and Suppliers: Only use trusted suppliers for data and models. Review their terms and conditions, privacy policies, and security measures.
Use Reputable Plugins
Apply OWASP Top Ten Mitigations
Maintain an Up-to-Date Inventory
Use Model and Code Signing
Implement Anomaly Detection and Adversarial Robustness Tests
Monitor for Vulnerabilities

6) Sensitive Information Disclosure :

Attack :

This vulnerability maps to the “Data Exfiltration via ML Inference API” technique, where sensitive information may be unintentionally revealed through the LLM’s outputs. This can lead to unauthorized access to sensitive data, breaches of intellectual property, privacy violations, and other security concerns.

Mitigation :

Data Sanitization and Scrubbing
Input Validation and Sanitization
Fine-tuning Data Handling
Limiting Access to External Data Sources

7) Insecure Plugin Design :

LLM plugins are extensions that are automatically activated by the model during user interactions.

Attack :

Attackers can exploit the lack of input validation and inadequate access controls in LLM plugins to gain unauthorized access or execute malicious code on the target system.

It pertains to the creation and management of LLM plugins, as opposed to the use of third-party plugins, which falls under the broader category of LLM Supply Chain Vulnerabilities.

Mitigation :

Parameterized Input :This means that inputs should be clearly defined in terms of type and range.
Input Validation
Thorough Inspection and Testing
Minimizing Impact: Design plugins to minimize the impact of any exploitation of insecure input parameters
Authentication and Authorization
User Authorization: For sensitive plugins, require manual user authorization and confirmation for any action taken
API Security

8) Excessive Agency :

Excessive Agency in LLMs refers to the vulnerability that arises when an LLM system is given too much autonomy or authority to interact with other systems and make decisions based on its inputs and outputs.

Attack :

An LLM with excessive permissions might execute unauthorized commands on a server, leading to data breaches or system downtime. Similarly, an LLM with too much autonomy might make incorrect decisions that affect the integrity of processed data or the availability of services.

Mitigation :

Limit Plugin Functions
Minimize Plugin Functionality
Avoid Open-Ended Functions
Restrict Permissions
Maintain User Authorization
Implement Human-in-the-Loop Control
Enforcing Authorization in Downstream Systems
Log and Monitor Activity
Implement Rate-Limiting

9 ) Overreliance :

Overreliance on LLMs can lead to significant risks when these models produce misleading or incorrect information, which is then accepted as accurate without proper scrutiny.

Attack :

Security Breach
Misinformation
Miscommunication
Legal Issues
Reputational Damage

Mitigation :

Regular Monitoring and Review
Cross-Check with Trusted Sources
Model Enhancement
- Prompt Engineering
- Parameter-Efficient Tuning (PET)
- Chain-of-Thought Prompting
Automatic Validation Mechanisms
Task Decomposition
Risk Communication
Responsible API and Interface Design
Secure Coding Practices

10) Model Theft :

It refers to the unauthorized access, copying, or extraction of proprietary LLMs by malicious actors or advanced persistent threats (APTs)

Attack :

It involves adversaries using methods to obtain a functional copy of a private machine learning model, often by repeatedly querying the model’s inference API to collect its outputs and using them to train a separate model that mimics the behavior of the target model.

Mitigation :

Strong Access Controls
Robust Authentication Mechanisms
Supplier Management: Carefully track and verify suppliers of LLM components to prevent supply-chain attacks.
Centralized ML Model Inventory or Registry
Network Resource Restrictions
Automated MLOps Deployment
Regular Monitoring and Auditing
Mitigation of Prompt Injection Techniques
Adversarial Robustness Training
Watermarking Framework

Notes, AI

ai security owas llm

This post is licensed under CC BY 4.0 by the author.

OWASP Top 10 for LLM Applications

1) Prompt Injection :

Attack :

Attack :

Mitigation :

2) Insecure Output Handling:

Attack :

Mitigation :

3) Training Data Poisoning :

Attack :

Mitigation :

4) Model Denial of Service (DOS):

Attack :

Mitigation :

5) Supply Chain Vulnerabilities :

Attack :

Attack :

Mitigation :

6) Sensitive Information Disclosure :

Attack :

Attack :

Mitigation :

7) Insecure Plugin Design :

Attack :

Attack :

Mitigation :

8) Excessive Agency :

Attack :

Attack :

Mitigation :

9 ) Overreliance :

Attack :

Attack :

Mitigation :

10) Model Theft :

Attack :

Attack :

Mitigation :

Trending Tags