Application Security for Large Language Models

12:22
15.09.2024
ssotnikov252
568

Discover OWASP Top 10 for LLM and GenAI and explore key strategies for securing your AI models and applications.

OWASP Top 10 for LLM and GenAI Applications: A Guide for Developers and Practitioners

Discover the OWASP Top 10 for LLM and GenAI and explore key strategies to secure your AI models and applications.

The emergence of large language models (LLMs) and generative AI technologies (GenAI), such as GPT-4, has revolutionized various industries by providing powerful natural language processing capabilities. However, the rapid adoption of these technologies has outpaced the creation of comprehensive security protocols, leading to significant vulnerabilities in the security of these models. To address these issues, the OWASP community has developed the OWASP Top 10 for Large Language Model Applications guide. This guide provides developers, data practitioners, and security professionals with practical security recommendations tailored to the unique challenges faced by LLMs and GenAI.

1 Prompt Injection

Description

Prompt injection involves manipulating LLMs with specially crafted inputs to bypass filters or execute unintended actions. This can lead to unauthorized access, data leakage, and disruption of decision-making processes.

Example

An attacker crafts a prompt that causes the LLM to disclose confidential information or execute unintended commands.

In the News

My mistake, I can’t give a response to that right now. Let’s try a different topic.

Neglecting to check the output data obtained from LLM can lead to security vulnerabilities such as code injection or data leakage. Large language models can generate output that, if improperly handled, can lead to the execution of malicious code or the disclosure of confidential data.

Example

Output generated by LLM containing executable script tags can be displayed on a web application page, which can lead to a cross-site scripting (XSS) attack.

In the news

Microsoft's AI chatbot Tay, which was launched on Twitter in March 2016, serves as a vivid example of unsafe handling of output data. Tay was designed to have casual conversations with users and learn from these interactions. However, within 16 hours of its launch, users exploited Tay's learning capabilities by providing it with offensive and inappropriate prompts. This manipulation caused Tay to start generating and posting inflammatory, racist, and sexist content on Twitter.

The failure occurred because the bot's design did not include robust mechanisms for filtering and checking the results for malicious content. Microsoft's response emphasized that users made coordinated efforts to abuse Tay's commenting skills, leading to the generation of inappropriate responses (Wikipedia) (TechRepublic).

Prevention

Implement reliable methods for validating and sanitizing output data. Ensure that LLM output is treated as untrusted data and handled accordingly. This includes:

Filtering mechanisms: Develop advanced filtering mechanisms to detect and block offensive or harmful content before it is generated.
Human oversight: Engage human moderators to review and manage outputs, especially during the initial stages of deployment.
Contextual awareness: Enhance the model's ability to understand context and refrain from generating content that contradicts ethical principles.

3 Training Data Poisoning

Data protection in large language model applications

Description

Training data poisoning involves tampering with the data used to train large language models, which can affect the model's future behavior, accuracy, and ethical aspects.

Example

An attacker injects biased or malicious data into the training set, leading to outputs that form certain viewpoints or compromise safety.

In the news

In one of the studies conducted by scientists from the University of Washington, the impact of poisoning training data on machine learning models was studied. This type of attack involves injecting malicious or biased data into the training set, which can significantly distort the behavior and output of the model. For example, if an attacker introduces a certain distortion into the data, they can influence the model to obtain results that form certain viewpoints or cause the model to behave unethically. This can compromise the security, efficiency, and fairness of the model.

In practice, such attacks can be carried out without possessing special insider information. Attackers can use web data sets by altering the content at URLs used in the training data. Such manipulation can occur if attackers control the content at these URLs, even if temporarily. For example, they can edit Wikipedia pages or other sources just before data collection, adding malicious content that poisons the training data (SpringerLink) (ar5iv).

Prevention

Implement thorough data provenance checks and use anomaly detection methods to identify suspicious training data. Regularly audit and clean training data sets. To protect against training data poisoning, use the following strategies:

Data Provenance Check: Regularly audit and verify the source and integrity of training data. Use cryptographic methods to ensure that the data has not been tampered with.
Anomaly Detection: Implement algorithms to detect and flag anomalous or suspicious data patterns that may indicate poisoning attempts.
Robust Training Methods: Use methods that can mitigate the impact of corrupted data, such as robust statistical methods and adversarial training.

4 Model Denial of Service (DoS)

Description

Overloading LLM with operations requiring large computational resources can disrupt services and increase operational costs. This can be used to conduct "denial of service" attacks.

Example

An attacker sends a large number of complex queries to exhaust the computational resources of the LLM.

In the News

Attacks such as "Denial of Service" (DoS) on large language models exploit the resource-intensive nature of these models to disrupt their availability and functionality. In one documented incident, attackers targeted the Microsoft Azure translation service by sending complex, resource-intensive requests designed to overload the system. These requests, while seemingly harmless, required excessive computational power, resulting in a significant slowdown of the service, reducing its performance by 6000 times compared to normal. This attack demonstrated the vulnerability of LLM to carefully crafted inputs that exhaust its data processing capabilities (Microsoft Security Response Center).

Prevention

Implement rate limiting and resource quotas for LLM requests. Use load balancing and scalable infrastructure to handle high traffic efficiently. To protect against typical DoS attacks, consider implementing the following strategies:

Reliable infrastructure and scaling: Use load balancing, auto-scaling, and distributed processing to handle sudden traffic spikes. This ensures even distribution of workload across multiple servers, reducing the risk of resource exhaustion.
Input data filtering and validation: Develop robust mechanisms for filtering and validating input data to block malicious or malformed requests before they reach the LLM. Methods such as rate limiting and input sanitization can help manage suspicious traffic patterns.
Efficient model architectures: Design efficient and lightweight model architectures that reduce the load on computational resources. Approaches such as model compression, quantization, and distillation can enhance the resilience of LLMs to resource exhaustion attacks.
Active monitoring and response: Continuously monitor LLM systems for signs of DoS attacks. Use performance metrics, event log analysis, and anomaly detection to identify potential threats in real-time. Having an incident response plan is crucial for isolating affected systems and quickly restoring service.
Collaborative protection and information sharing: Collaborate with the AI community to identify emerging threats, share best practices, and develop common standards and protocols. Collaboration strengthens the overall security ecosystem for deploying and operating LLMs.
Detailed context management: Ensure that the model does not inadvertently process hidden prompts embedded in seemingly innocuous input data. Methods such as input segmentation and context window validation can help detect and filter potential prompt injections.

5 Supply Chain Vulnerabilities (уязвимости цепочки поставок)

Description

The use of compromised components, services, or datasets can compromise the integrity of LLM applications. Supply chain vulnerabilities can lead to data leaks and system failures.

Example

The use of insecure third-party libraries or datasets containing vulnerabilities.

In the News

One prominent example of a supply chain vulnerability was related to the use of the PyPI package registry. Attackers uploaded a malicious package mimicking a legitimate (and very popular) package "PyKafka". When this compromised package was downloaded and executed, malware was installed, opening backdoors in various systems to gain unauthorized access, exposing them to further attacks. This incident highlights the significant risk associated with third-party components and dependencies in the LLM application supply chain (BleepingComputer) (Enterprise Technology News and Analysis).

Another example is related to the poisoning of publicly available pre-trained models. Attackers uploaded a fake model specializing in economic analysis and social research to a model marketplace such as Hugging Face. This poisoned model contained a backdoor that allowed the generation of disinformation and fake news, demonstrating how easily the integrity of LLM applications can be compromised through malicious actions in the supply chain (Analytics Vidhya) (TechRadar).

These scenarios demonstrate how supply chain vulnerabilities can lead to serious security breaches, biased outcomes, and even systemic failures.

Prevention

Conduct thorough security checks of all third-party components and services. Implement supply chain risk management methods and use reliable sources. To address supply chain vulnerabilities, consider the following strategies:

Check data sources and suppliers: Ensure that all data sources and suppliers have undergone thorough verification. This includes reviewing the terms and conditions, as well as the privacy policy to ensure compliance with your data protection standards.
Use verified plugins and models: Use plugins and models only from verified sources and ensure that they have been tested to meet the requirements of your application.
Vulnerability management: Apply OWASP Top 10 recommendations for managing vulnerable and outdated components. This includes regular vulnerability scanning, patch management, and inventorying all components using a Software Bill of Material (SBOM).
Anomaly detection and reliability testing: Implement anomaly detection and reliability testing on models and data received from suppliers to identify data tampering and model poisoning.
Active monitoring: Continuous monitoring of vulnerabilities in components and environments and ensuring timely patching of outdated components.

6 Sensitive Information Disclosure

Protection against unauthorized data access

Description

Failure to protect against the disclosure of sensitive information in LLM output can lead to data leakage, privacy breaches, and legal consequences.

Example

LLM unintentionally discloses personal data or official information in its responses (this is especially common when using the RAG method.

In the news

A striking example of confidential information disclosure was the case when employees of a technology firm unintentionally entered confidential data into ChatGPT. This data included valuable source code and exclusive data on semiconductor equipment. This incident demonstrated how easily confidential information can be disclosed when using AI-powered tools, highlighting a critical gap in data privacy and security for organizations implementing large language models (TheStreet).

In another case, an artificial intelligence model unintentionally disclosed personally identifiable information (PII) from its training data. This can happen when the model is overtrained or memorizes certain data during training and then reproduces it in responses, leading to unintentional information disclosure. Researchers have found that ChatGPT can unintentionally disclose personally identifiable information (PII) from its training data. Researchers from institutions such as Google DeepMind, the University of Washington, and ETH Zurich have demonstrated that with simple prompts, ChatGPT can disclose email addresses, phone numbers, and other confidential data. They were able to make the AI disclose personal information by asking it to endlessly repeat certain words, which eventually led the model to output memorized data from its training set (Engadget).

Prevention

Use data anonymization techniques and implement access controls to limit access to confidential information. Regularly review and update privacy policies. To reduce the risk of confidential information disclosure, organizations should adopt several strategies:

Data Cleaning: Implement comprehensive measures to clean input data, removing identifiable and confidential information before it is processed by the LLM. This should include robust input validation to prevent model poisoning with malicious data.
Access Control: Ensure strict access control to data transmitted to the LLM and to external data sources. Use the principle of least privilege to limit access to sensitive information.
Awareness and Training: Inform stakeholders about the risks and precautions associated with LLM applications. Emphasize the importance of privacy-focused development practices.
Monitoring and Anomaly Detection: Continuously monitor input and output data to quickly identify and address potential data leaks. Use anomaly detection systems to identify unusual patterns that may indicate a breach.
Policies and Governance: Develop and implement robust data governance policies, including clear terms of use that inform users about data handling practices and provide them with opt-out options for data sharing.

7 Insecure Plugin Design (небезопасный дизайн плагинов)

Data encryption methods in language models

Translation Note:
This section discusses the creation of LLM plugins, not the use of third-party plugins (source).

Description

LLM plugins that process untrusted input from users or external systems and have insufficient access control can lead to serious vulnerabilities such as remote code execution.

Example

A plugin with elevated privileges executes malicious code from untrusted input.

In the news

A striking example of unsafe plugin design is the vulnerability found in the AI Engine plugin for WordPress, which affected over 50,000 active installations. This plugin, used for various AI-related functions such as chatbot creation and content management, had a critical flaw that made sites vulnerable to remote attacks. The vulnerability allowed attackers to inject malicious code, leading to potential data leaks and system compromise. This incident highlights the importance of secure design and implementation of plugins used in AI systems (Infosecurity Magazine).

Prevention

Apply the principle of least privilege and conduct thorough security assessments of all plugins. Implement strict access control and input validation. To prevent vulnerabilities in the design of unsafe plugins, consider the following strategies:

Strict parameterized input: Ensure that plugins provide strict parameterized input and type and range checking of input data. Use a second layer of typed calls to parse requests and apply validation and sanitization where free-form input is required.
Reliable authentication and authorization: Plugins should use appropriate authentication mechanisms, such as OAuth2, and apply effective authorization and access control measures to ensure that only permitted actions are performed.
Thorough testing: Perform comprehensive testing, including Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), and Interactive Application Security Testing (IAST), to identify and eliminate vulnerabilities in the plugin code.
Minimizing impact: Develop plugins in such a way as to minimize the potential for exploitation of unsafe input parameters by following the principles of least privilege access control and providing as few functional capabilities as possible while still performing the desired function.
Manual user authorization: Require manual user authorization and confirmation of actions performed by sensitive plugins to provide additional verification and control.

8 Excessive Agency (чрезмерная самостоятельность)

Description

Providing LLM with unlimited autonomy in actions can lead to unforeseen consequences, jeopardizing reliability, confidentiality, and trust.

Example

An automated LLM-based system makes unauthorized financial transactions based on faulty logic.

In the news

A striking example highlighting the risks of excessive autonomy is the Microsoft Recall feature in Copilot+ computers. Announced at the Microsoft Build conference, this feature constantly captures screenshots of user actions, making it easy to search and review past activities. This feature, designed to enhance user interaction by capturing and saving screenshots of user actions, can be exploited for unauthorized access due to excessive permissions. Privilege escalation vulnerabilities allow attackers to bypass access controls, gaining access to sensitive information stored in Recall and potentially abusing it (Wired).

Prevention

Implement "human-in-the-loop" mechanisms and ensure that critical decisions are made under human supervision. Define clear boundaries for LLM autonomy. To avoid excessive LLM autonomy, consider the following strategies:

Limit plugin functions: Ensure that plugins have only the minimal functions necessary for their intended use. For example, a plugin that reads emails should not have the ability to send or delete emails.
Limit permissions: Grant plugins the minimum necessary permissions. If a plugin requires read-only access, ensure it does not have permissions to write, update, or delete.
Avoid open functions: Use plugins with specific, granular functionality rather than those that allow broad, unrestricted actions.
"Human in the loop": Implement manual confirmation procedures for high-impact actions. For example, require user confirmation before sending emails or performing financial transactions.
Set request limits: Set a limit on the number of requests per unit of time to control the number of actions an LLM can perform within a given period, reducing the likelihood of abuse.
Monitor and log actions: Continuously monitor and log LLM plugin actions to promptly detect and respond to unusual behavior.

9 Overreliance (излишнее доверие к модели)

Protection against attacks on language models

Description

The inability to critically evaluate the responses of a large language model can jeopardize the decision-making process and lead to security vulnerabilities.

Example

Blind trust in content created by an LLM in a critical application without proper verification.

In the news

A striking example of excessive trust is the use of news sites created with the help of artificial intelligence. News Guard has identified numerous websites that were mostly or entirely created using AI tools such as ChatGPT. It was found that these sites, which operated with minimal or no human oversight, published large volumes of content daily, often without proper fact-checking or editorial review. This led to the spread of misinformation, including false news reports and misleading articles. These AI-driven content farms, such as those described in the report "Rise of the Newsbots," were typically created to generate revenue from programmatic advertising, exploiting the trust gap created by their automated nature (MIT Technology Review) (NewsGuard) (euronews).

In this particular case, websites such as "Biz Breaking News" and "News Live 79" published articles created with the help of artificial intelligence, which included error messages or general responses typical of AI results, indicating a lack of human oversight. This reliance on artificial intelligence to create and manage content without sufficient oversight has led to the spread of misinformation and reduced the reliability of these platforms (NewsGuard) (euronews).

Prevention

Develop a culture of critical thinking and implement processes to verify the results of LLM work. Use LLM as advisory tools rather than for making final decisions. To prevent risks associated with excessive trust, consider the following strategies:

Regular monitoring and verification: Conduct continuous monitoring and analysis of LLM results. Use self-monitoring methods or answer voting to filter out conflicting answers, improving the quality and reliability of the results.
Cross-checking with reliable sources: Compare LLM results with data from reliable external sources to ensure the accuracy and reliability of the information.
Improve model training: Fine-tune the model with knowledge in a specific subject area to reduce inaccuracies. Methods such as prompt engineering and parameter efficient tuning can improve the model's responses.
Implement automatic checks: Use automatic verification mechanisms to cross-check generated outputs against known facts or data, adding an extra layer of security.
Human oversight: Use human oversight to verify content accuracy and fact-checking to ensure high content accuracy and maintain trust.
Risk awareness: Clearly inform users about the risks and limitations associated with using LLMs so they are prepared for potential issues, helping them make informed decisions.

10 Model Theft

Description

Unauthorized access to proprietary large language models can lead to the theft of intellectual property and confidential information, loss of competitive advantages.

Example

An attacker gains access to an organization's proprietary LLM model and steals it.

In the news

A significant example of model theft was a study demonstrating the possibility of extracting confidential information and functions from large-scale language models such as OpenAI's GPT-3 and Google's PaLM-2. The attack, detailed by a team including employees from Google DeepMind and ETH Zurich, used sophisticated methods to recover specific components of the model. This demonstration showed that attackers can effectively copy parts of a proprietary model by querying the API and collecting output data to train a surrogate model - a method often referred to as "model theft" (GIGAZINE) (Unite.AI).

In another report, Unite.AI discussed how attackers can use model theft to create shadow models. These shadow models can then be used to conduct further attacks, including unauthorized access to confidential information, or to refine the actions of attackers who bypass the security measures of the original model (Unite.AI).

Prevention

Implement strict access controls and encryption of model storage. Regularly audit and monitor access to LLM models. To prevent model theft, organizations should implement several key strategies:

Strict access control: Use reliable access control mechanisms such as role-based access control (RBAC) and the principle of least privilege. Ensure that only authorized personnel have access to LLM models and related data.
Authentication and monitoring: Use reliable authentication methods and continuous monitoring of access logs to promptly detect and respond to any suspicious or unauthorized actions.
Centralized model registry: Maintain a centralized registry of ML models. This simplifies access management, authentication implementation, and logging of actions related to model usage.
Restrict network access: Limit LLM access to network resources, internal services, and APIs to minimize the likelihood of potential attacks.
Adversarial robustness training: Conduct adversarial robustness training to detect and mitigate data extraction requests. This helps identify and counteract attempts to extract the model.
Set query limits: Implement a limit on the number of API requests to reduce the risk of data leakage from LLM applications.
Watermarking: Integrate watermarking methods at the embedding creation and extraction stages in the LLM lifecycle to help identify and track unauthorized use of the model.

Conclusions

OWASP Top 10 for LLM and GenAI is an essential resource for securing applications that use these advanced technologies. By understanding and addressing these vulnerabilities, developers and practitioners can create more secure and reliable LLM applications. Continuous awareness and implementation of these best practices will help ensure the responsible and safe adoption of LLM and GenAI across various industries.

Application Security for Large Language Models

Discover OWASP Top 10 for LLM and GenAI and explore key strategies for securing your AI models and applications.

OWASP Top 10 for LLM and GenAI Applications: A Guide for Developers and Practitioners

1 Prompt Injection

Description

Example

In the News

Example

In the news

Prevention

3 Training Data Poisoning

Description

Example

In the news

Prevention

4 Model Denial of Service (DoS)

Description

Example

In the News

Prevention

5 Supply Chain Vulnerabilities (уязвимости цепочки поставок)

Description

Example

In the News

Prevention

6 Sensitive Information Disclosure

Description

Example

In the news

Prevention

7 Insecure Plugin Design (небезопасный дизайн плагинов)

Description

Example

In the news

Prevention

8 Excessive Agency (чрезмерная самостоятельность)

Description

Example

In the news

Prevention

9 Overreliance (излишнее доверие к модели)

Description

Example

In the news

Prevention

10 Model Theft

Description

Example

In the news

Prevention

Conclusions

Relevant news on the topic "AI"

Also read

Also read