In my conversations with customers, I frequently encounter the same critical questions: "How do you secure your LLMs?" and "How do you ensure the quality of the answers?" These concerns highlight a common hesitation among businesses to provide solutions with LLM intelligence to their own customers due to the significant risks. These risks include not only the possibility of incorrect outputs, which can damage a business's reputation and adversely affect their customers, but also security threats such as prompt injection attacks that could lead to the loss of proprietary data or other sensitive information.
Given these concerns, this blog post aims to address the twin challenges of securing LLMs and ensuring their output quality. Leveraging my AWS background, we will explore the capabilities of Amazon Bedrock native capabilities as well as various open-source tools. We will explore strategies to mitigate these risks and enhance the reliability of your LLM-powered applications.
Understanding the risks
As technology evolves, so do the security threats associated with it. The rapid advancement of Large Language Models (LLMs) has brought significant benefits, but it has also introduced new challenges. Ensuring the security and quality of LLM outputs is crucial to prevent potential harm and misuse. Before we dive into the solutions, let’s first understand the key security concerns and quality issues that businesses need to address when offering LLM solutions to their customers.
For example, in 2023, OpenAI faced scrutiny when a bug allowed users to see snippets of other users' chat histories, raising serious data privacy concerns ChatGPT confirms data breach, raising security concerns. Similarly, Google's AI chat tool Bard sparked controversy due to biased and factually incorrect responses during its debut, demonstrating the importance of maintaining output quality Google’s Bard: AI chatbot makes $100bn mistake. In this section, we will explore the various security concerns and quality issues that businesses must address when making LLM solutions available to their customers.
Security Concerns
- Data Privacy: Sensitive data used for training can be inadvertently exposed.LLMs, if not properly managed, can leak information contained in the training data.
- Prompt Injection Attacks: Malicious inputs can be crafted to manipulate the model's behavior, potentially causing it to reveal sensitive information or perform unintended actions.
- Model Theft: Unauthorized access to the model can lead to intellectual property theft, compromising the proprietary techniques and data used in its development.
- Adversarial Attacks: Adversarial inputs designed to deceive the model can cause it to generate harmful or incorrect outputs, impacting the reliability of the LLM.
- Unauthorized Access:Insufficient access controls can result in unauthorized personnel gaining access to the model, data, or infrastructure, leading to potential misuse.
- Inference Attacks: Attackers can infer sensitive attributes about the training data by analyzing the model's outputs, posing a privacy risk.
- Data Poisoning: Attackers can corrupt the training data with malicious content to compromise the integrity of the model's outputs.
- Model Evasion: Attackers may develop techniques to bypass security measures and exploit vulnerabilities in the model.
Quality Issues
- Bias and Fairness: LLMs can perpetuate and amplify biases present in training data, leading to outputs that are unfair or discriminatory.
- Accuracy: Generated content can be factually incorrect or misleading, which is critical to address, especially in domains where incorrect information can have serious consequences.
- Coherence and Relevance: Outputs need to be contextually appropriate and coherent. Inconsistent or irrelevant responses can undermine the usefulness and trustworthiness of the LLM.
- Ethical Considerations: Ensuring that the model's outputs adhere to ethical standards and do not produce harmful, offensive, or inappropriate content.
- Robustness: Ensuring that the model can handle a wide range of inputs without producing errors or undesirable outputs.
- Transparency and Explainability: Making sure that the model’s decision-making process is transparent and its outputs are explainable, which helps in building trust with users.
How to secure and ensure high-quality inputs and outputs for LLMs with Amazon Bedrock Guardrails
Amazon Bedrock provides built-in capabilities to secure and validate Large Language Model (LLM) inputs and outputs through its native guardrails feature. These guardrails are essential for ensuring that LLMs behave according to predefined security, ethical, and operational standards. With Guardrails, you can set up filtering mechanisms to protect against harmful outputs, inappropriate inputs, sensitive information leakage, and more. In this section, we will dive into the details of how you can implement these features using Amazon Bedrock, walking through the setup process with visual aids.
Step 1: Provide Guardrail Details
In the first step, you can define the basic information for your guardrail setup:
Name and Description: Here, you define a name (e.g., "DemoGuardrail") and a description of what the guardrail is designed to do (e.g., "Guardrail for demos").
Messaging for Blocked Prompts: If a prompt is blocked by the guardrail, you can customize the message shown to users, such as “Sorry, the model cannot answer this question.”
KMS Key Selection (Optional): Optionally, you can select a KMS (Key Management Service) key to encrypt sensitive information within this guardrail.
This provides a foundation for guardrail implementation, allowing you to define how the model responds to blocked content.
Step 2: Configure Content Filters
Content filters allow you to detect and block harmful user inputs and model responses across predefined categories like Hate, Insults, Sexual Content, Violence, and Misconduct.
Harmful Categories: For each category, you can adjust the sensitivity to "None", "Low", "Medium", or "High". This flexibility allows you to fine-tune how strictly the model filters content.
Prompt Attacks: Enabling prompt attack filters helps detect and block user inputs attempting to override the system's instructions. You can adjust the sensitivity for prompt attacks to ensure robust protection against injection attacks.
These filters are crucial for preventing harmful or unwanted content from being generated by the model or entered by users.
Step 3: Add Denied Topics
You can define specific topics that should not be discussed by the model, ensuring the LLM doesn’t respond to sensitive or restricted queries.
Denied Topics: You can add up to 30 denied topics (e.g., "Investment"), and provide sample phrases that the model should block related to that topic.
Customization: For each topic, you define a clear explanation and add up to five phrases (e.g., “Where should I invest my money?”) to ensure the model avoids restricted discussions.
This helps prevent the LLM from engaging in specific conversations, such as offering financial or medical advice.
Step 4: Add Word Filters
With word filters, you can further refine the model's behavior by blocking certain words or phrases from being used in inputs and outputs.
Profanity Filter: This built-in filter allows you to block profane words globally across inputs and outputs.
Custom Word List: You can manually add specific words or phrases, upload them from a local file, or use an S3 object. This lets you block specific terminology that may be inappropriate for your use case.
These word filters ensure that sensitive or inappropriate terms do not appear in the LLM's responses or user inputs.
Step 5: Add Sensitive Information Filters
This step focuses on safeguarding personally identifiable information (PII). You can specify which types of PII should be masked or blocked in LLM responses.
PII Types: The system lets you add specific PII types such as Phone Numbers, Email Addresses, and Credit/Debit Card Numbers.
Guardrail Behavior: For each PII type, you can choose to either mask or block it completely, ensuring that sensitive information is not inadvertently exposed.
This ensures robust data protection and compliance with privacy regulations like GDPR.
Step 6: Add Contextual Grounding Check
One of the key features for ensuring output quality is the contextual grounding check. This validates whether model responses are grounded in factual information and are relevant to the user’s query.
Grounding Check: The grounding check ensures that model responses are factually correct and based on provided reference sources.
Relevance Check: This feature validates whether model responses are relevant to the query and blocks outputs that fall below a defined threshold for relevance and accuracy.
These checks are particularly useful in preventing hallucinations, where the model generates incorrect or irrelevant responses.
Step 7: Review and Create Guardrails
After configuring all necessary filters and checks, you can review your setup and create the guardrail. Once activated, the system allows you to immediately test the guardrail directly within the console by entering prompts to see how it blocks or modifies responses according to your settings. Additionally, you can attach the guardrail to your specific use case to ensure it functions correctly in real-world scenarios, providing a seamless way to protect and enhance your LLM application.
Amazon Bedrock provides powerful native capabilties to secure and ensure high-quality outputs for your LLMs, allowing you to protect against harmful content, ensure data privacy, and prevent model hallucinations. By configuring content filters, denied topics, sensitive information filters, and grounding checks, you can fine-tune your model to meet security, ethical, and operational standards. Now, let's explore some open-source solutions that can be implemented to secure your LLM independently of using Amazon Bedrock.
How to secure and ensure high-quality inputs and outputs for LLMs with open-source solutions
When it comes to securing and ensuring high-quality outputs from LLMs, there are numerous open-source solutions available. The choice of solution depends on your specific demands and use case, such as the level of security required, the sensitivity of the data being processed, and the quality standards your business must meet.
In this section, we will focus on two of the most popular and widely adopted open-source tools for securing LLMs and maintaining output quality:
LLM Guard by Protect AI - A robust solution designed to provide security for LLMs, protecting against various input and output vulnerabilities.
DeepEval by Confident AI - A leading tool for evaluating and maintaining the quality of LLM outputs, ensuring accuracy, coherence, and relevance in responses.
Both solutions offer extensive features to enhance the security and quality of your LLM applications. Let's take a closer look at how they work and the benefits they offer.
How to secure your LLM input and output with LLM Guard
LLM Guard, developed by Protect AI, is an open-source solution that acts as a proxy between your application and the LLM, filtering inputs and outputs in real-time. By sitting between your application and the LLM, LLM Guard ensures that sensitive data, inappropriate content, or malicious prompt injections are intercepted and handled before they reach or leave the model. This makes LLM Guard highly adaptable to any environment where LLMs are integrated, allowing seamless deployment without directly modifying your LLM's architecture. The tool can be easily included into existing LLM workflows, serving as a middle layer to secure the entire interaction cycle between users and the model.
> Image source:https://llm-guard.com/
Key features of LLM Guard include:
Input and Output Filtering: LLM Guard applies a wide range of filters, including anonymization, topic banning, regex matching, and more, to ensure that inputs and outputs comply with security protocols.
Prompt Injection Protection: The tool is designed to detect and block prompt injection attempts that could lead to unwanted or harmful behaviors from the LLM.
PII Detection and Redaction: LLM Guard automatically identifies and redacts sensitive information, such as names, phone numbers, and email addresses, ensuring that private data is not exposed in outputs.
Customizable Scanners: LLM Guard allows users to define specific "scanners" that monitor for different types of sensitive or inappropriate content, giving flexibility in controlling the behavior of the LLM.
LLM Guard can be easily integrated into your infrastructure as it functions as a proxy, ensuring that all inputs and outputs go through a comprehensive security check before and after interacting with the LLM. You can find more details on the project’s code and features on the LLM Guard GitHub repository. To test the tool interactively, Protect AI has provided a playground hosted on Hugging Face, where you can try different filters and configurations.
Let’s now walk through how LLM Guard functions using the Hugging Face playground and real-world examples of processing inputs and outputs.
Step 1: Setting Up Input Filters
When configuring LLM Guard, you have the flexibility to apply filters to either prompts (inputs) or outputs, ensuring that both the data being sent to the model and the data generated by the model are secure and compliant. The range of scanners allows for thorough customization, and each scanner can be individually adjusted to meet specific security and compliance needs.
You can activate multiple scanners based on your requirements. Additionally, each scanner offers fine-tuned control, allowing you to modify thresholds, sensitivity, or specific filter behaviors. For example, you can set the strictness of the BanCode scanner or configure the Anonymize scanner to target specific entities such as credit card numbers or email addresses.
Prompt (Input) scanners:
Anonymize, BanCode, BanCompetitors, BanSubstrings, BanTopics, Code, Gibberish, Language, PromptInjection, Regex, Secrets, Sentiment, TokenLimit, Toxicity.
Output scanners:
BanCode, BanCompetitors, BanSubstrings, BanTopics, Bias, Code, Deanonymize, JSON, Language, LanguageSame, MaliciousURLs, NoRefusal, NoRefusalLightResponse, FactualConsistency, Gibberish, Regex, Relevance, Sensitive, Sentiment, Toxicity, URLReachability.
Step 2: Processing and Sanitizing the Prompt
Once the input filters are configured, LLM Guard processes the prompt. In this example, a detailed resume containing PII is passed through the system. The tool identifies and sanitizes the sensitive information, including names, addresses, phone numbers, and employment details, ensuring that the LLM only receives sanitized input.
Step 3: Viewing the Results
After LLM Guard processes the prompt, you can view the sanitized results. In this case, all personal information such as full names, phone numbers, and email addresses have been redacted. The output is clean and complies with privacy standards. Additionally, a detailed breakdown of each filter's performance is provided, indicating whether the input passed or was flagged by each active scanner.
Wrap-UP
LLM Guard offers robust security and content quality control for large language models by acting as a proxy between your application and the LLM. With its extensive range of customizable scanners for both input and output, it provides granular control over what passes through the model, ensuring compliance with privacy, security, and ethical standards.
In addition to its powerful filtering capabilities, LLM Guard integrates seamlessly into existing workflows. It can be deployed as a middleware layer in your AI infrastructure without needing to modify the LLM itself. This proxy-style deployment allows you to enforce security rules and quality checks transparently across various applications using LLMs. Whether you are working with APIs, cloud-native architectures, or on-premise models, LLM Guard can be integrated with minimal friction. It also supports real-time scanning and protection, ensuring your LLMs are secured and monitored continuously.
How to ensure high-quality input and output of your LLM with DeepEval
DeepEval, developed by Confident AI, is an open-source framework that automates the evaluation of LLM responses based on customizable metrics, ensuring high-quality inputs and outputs. It offers various features to measure LLM performance, helping users improve and maintain model reliability across different applications.
Key Features of DeepEval include:
Customizable Metrics: Define specific evaluation metrics, such as relevance, consistency, and correctness, based on your use case.
Automated Test Runs: Automate the evaluation of test cases, providing detailed insights into LLM performance.
Experiments and Hyperparameters: Compare test runs across various hyperparameter settings, allowing for optimal fine-tuning.
Monitoring & Observability: Track LLM performance in real-time, identifying areas for improvement.
Human Feedback Integration: Incorporate human feedback into the evaluation cycle for deeper insights into model behavior.
You can get started with DeepEval by simply downloading it from the GitHub repository. Once installed, you can create custom evaluation metrics, run test cases, and analyze results directly on your system. DeepEval provides flexibility, making it suitable for anyone looking to test LLMs without additional overhead or setup requirements.
While the tool can be used independently, creating an account on Confident AI’s platform offers additional benefits. By registering, you gain access to centralized storage for your test results and the ability to manage multiple experiments in one place. This feature can be particularly useful for teams working on larger projects, where tracking and overseeing various iterations and performance evaluations is critical. Additionally, the platform offers enhanced features like integrated monitoring and real-time evaluations, which can streamline the testing process.
Now, let's dive into how to set up DeepEval, configure a test case, run the test and analyze the output.
Step 1: Install DeepEval
First, make sure DeepEval is installed:
pip install -U deepeval
Step 2: Write a New Health Recommendation Test Case
Let’s create a new test file that evaluates whether the LLM can provide relevant and accurate recommendations for maintaining heart health.
Create a new test file:
touch test_health_recommendations.py
Now, open test_health_recommendations.py and write the following test case:
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
def test_case():
# Define the relevancy metric
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
# Define the test case with input, actual output, and retrieval context
test_case = LLMTestCase(
input="What are the best practices for maintaining heart health?",
# Replace this with the actual output generated by your LLM
actual_output="To maintain heart health, it's important to eat a balanced diet, exercise regularly, and avoid smoking.",
# Relevant information from a knowledge source
retrieval_context=[
"Maintaining heart health includes regular physical activity, a healthy diet, quitting smoking, and managing stress.",
"A diet rich in fruits, vegetables, whole grains, and lean proteins is recommended for heart health.",
"Limiting alcohol consumption and regular medical checkups can help monitor heart health."
]
)
# Run the test and evaluate against the relevancy metric
assert_test(test_case, [answer_relevancy_metric])
Step 3: Run the Test
Run the test case using the following command:
deepeval test run test_health_recommendations.py
Breakdown of This Test Case:
- Input:
"What are the best practices for maintaining heart health?"
This is a common health-related question.
- Actual Output:
"To maintain heart health, it's important to eat a balanced diet, exercise regularly, and avoid smoking."
This is the LLM’s response to the query.
- Retrieval Context: This is the relevant information that the LLM can refer to when answering the question:
- "Maintaining heart health includes regular physical activity, a healthy diet, quitting smoking, and managing stress."
- "A diet rich in fruits, vegetables, whole grains, and lean proteins is recommended for heart health."
- "Limiting alcohol consumption and regular medical checkups can help monitor heart health."
- Answer Relevancy Metric: The AnswerRelevancyMetric is used to evaluate how closely the LLM’s output matches the relevant context from a knowledge source. The threshold of 0.5 means the test passes if the relevancy score is 0.5 or higher.
Wrap-Up
DeepEval is an essential tool for maintaining the quality and reliability of LLMs, particularly in high-stakes industries such as healthcare, where the recommendations and outputs provided by AI systems directly impact people's well-being. By leveraging DeepEval, you can rigorously test and evaluate the performance of your LLMs, ensuring that the model's outputs are accurate, relevant, and free from harmful errors or hallucinations.
One of the key advantages of using DeepEval is its comprehensive set of metrics that allow you to assess various aspects of your model’s performance. Whether you’re monitoring the relevancy of answers to the provided context, the fluency of language used, or detecting potential hallucinations (incorrect or unsupported statements), DeepEval provides out-of-the-box solutions to streamline the testing process. In industries like healthcare, financial services, or legal advice, where strict compliance with factual accuracy and safe recommendations is vital, DeepEvals suite of tools helps minimize risks. For instance, the HallucinationMetric identifies cases where the model introduces information not supported by the retrieval context, which is critical when a model is deployed in sensitive environments like hospitals or clinics. The AnswerRelevancyMetric ensures the response aligns well with the relevant information, eliminating misleading or irrelevant answers that could confuse or harm users.
Using DeepEval is straightforward. After installing the package and configuring your environment, you can write test cases using familiar Python frameworks like pytest. With DeepEval, you define inputs, expected outputs, and relevant knowledge sources (retrieval contexts). These components allow you to track the model's ability to retrieve accurate information, detect hallucinations, or evaluate the overall fluency of its responses. The set of available metrics makes it uniquely suited to such critical use cases. By utilizing these metrics, organizations can confidently deploy LLM-based solutions in environments that require the highest level of reliability, ensuring that the outputs provided do not harm users or provide incorrect recommendations.
Final thoughts
When it comes to ensuring high-quality LLM prompts and outputs, accuracy and safety are crucial. Amazon Bedrock Guardrails provides a robust set of native features for securing, managing, and monitoring LLM outputs within AWS, offering governance and real-time protection to prevent harmful or incorrect outputs. However, when further customization is needed, or if there is no dependency on AWS services, open-source solutions like LLM Guard and DeepEval offer a powerful alternative. These tools enable comprehensive testing, evaluation, and real-time monitoring, ensuring accuracy, relevance, and reliability.
To put this into practice, focus on developing clear strategies for continuously monitoring performance, refining models to meet the demands of specific use cases, and implementing thorough testing processes to catch inaccuracies before deployment. Real-time oversight is key, especially when models are in production, to ensure only reliable outputs are delivered. And don't forget the importance of collaboration—bringing in domain experts alongside AI teams to validate outputs can help keep things on track, especially in areas where precision is critical.