Secure LLM Agent Design Patterns - Code Examples

ReversecLabs / design-patterns-for-securing-llm-agents-code-samples

15 1 Language: Python Updated: 2mo ago

🔒AI security 📚Scientific writing 🛫Loyalty program travel 🎁Crypto airdrops guidance

README

Secure LLM Agent Design Patterns - Code Examples

This repository contains minimal demo implementations of the six security design patterns discussed in the paper "Design Patterns for Securing LLM Agents against Prompt Injections". Each example is a self-contained Chainlit application demonstrating a specific pattern.

The goal is to provide clear, runnable code that showcases how to build more secure and resilient LLM agents by imposing structural constraints on their operation.

Disclaimer
Initial Project Setup
Running the Examples
References and Further Reading

Disclaimer

This code is for educational and demonstration purposes only.

The implementations in this repository are simplified to clearly illustrate the core principles of each security pattern. They are not production-ready and should not be used in a live environment.

Initial Project Setup

Follow these steps to set up your environment to run the examples. All commands should be run from the root of this project directory.

1. Create a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

# Create the virtual environment
python3 -m venv venv

# Activate it (on macOS/Linux)
source venv/bin/activate

# Or on Windows
# venv\Scripts\activate

2. Install Dependencies

Install all required Python packages from the requirements.txt file.

pip install -r requirements.txt

3. Configure API Key

The examples use the OpenAI API. You need to provide your API key in a .env file.

Create a file named .env in the root of this project and add your key like this:

OPENAI_API_KEY="sk-..."

Running the Examples

Each numbered subfolder contains the implementation for one security pattern. All chainlit run commands should be executed from the root directory of this project.

Pattern 1: Action-Selector

This pattern restricts the LLM to only selecting a pre-defined tool and its arguments. It cannot generate conversational text for the user, making it resilient to prompt injection.

To run this example, execute the following command:

chainlit run 01_action-selector/app1.py

01_action-selector/app2.py is an extended version of this pattern where the LLM can pass some parameters to the tools, but in a very controlled fashion (in this case only valid order ids for the current user) so to prevent any chance of arbitrary inputs.

Pattern 2: Plan-Then-Execute

This pattern separates an agent's operation into two phases. First, an LLM creates a fixed, immutable plan of action based only on the user's initial prompt. Then, a separate execution process carries out that plan, preventing prompt injections encountered during execution from altering the fundamental control flow.

To run this example, execute the following command:

chainlit run 02_plan-then-execute/app.py

Pattern 3: LLM Map-Reduce

This pattern is used to securely process a batch of untrusted documents. Each document is "mapped" in isolation to a structured, sanitized format by one LLM. Then, a second "reducer" LLM aggregates only the clean, structured data to produce a final result, ensuring an injection in one document cannot affect the others.

To run this example, execute the following command:

chainlit run 03_llm-map-reduce/app.py

Pattern 4: Dual LLM

This pattern uses two distinct LLM roles to create a security firewall. A stateful "Privileged" LLM orchestrates tasks and calls tools but never sees untrusted data. It uses a separate, tool-less "Quarantined" LLM to process any untrusted content. Communication is handled via symbolic variables, ensuring the Privileged LLM's context is never tainted.

To run this example, execute the following command:

chainlit run 04_dual-llm/app.py

Pattern 5: Code-Then-Execute with Provenance

This pattern has an LLM generate a complete Python script that is then executed in a sandboxed interpreter. This example is presented in two versions:

app_v1.py demonstrates the base pattern.
app_v2.py enhances the pattern by introducing a rudimentary provenance tracking system. This system is inspired by the security concepts in the paper by Debenedetti et al. (see references), where data is tagged with its source. A security policy is then enforced at the most critical point—the quarantined_llm tool—to block it from processing data that has been concatenated from multiple different untrusted sources.

To run the version with provenance tracking, execute the following command:

chainlit run 05_code-then-execute/app_v2.py

Pattern 6: Context-Minimization

This pattern defends against injections in the user's prompt by separating request parsing from response generation. A 'retriever' LLM first extracts only the necessary, sanitized information (e.g., a service plan name) from the user's full request. A second 'summarizer' LLM then generates the final answer using a new, clean context that contains only the retrieved data, making it impossible for it to act on the original injection.

To run this example, execute the following command:

chainlit run 06_context-minimization/app.py

References and Further Reading

"Design Patterns for Securing LLM Agents against Prompt Injections"
- Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav Volhejn.
- https://arxiv.org/abs/2506.08837
- This paper introduces the six core design patterns that this repository implements as practical code examples.
"The Dual LLM Pattern for Building AI Assistants That Can Resist Prompt Injection"
- Simon Willison.
- https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
- This blog post served as the inspiration for Pattern 4 (Dual LLM).
"Defeating Prompt Injections by Design"
- Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr.
- https://arxiv.org/abs/2503.18813
- The concept of provenance tracking and data flow policies, demonstrated in the enhanced version of Pattern 5 (Code-Then-Execute), is inspired by the security principles discussed in this paper.

Search

Secure LLM Agent Design Patterns - Code Examples

Report repository

README

Secure LLM Agent Design Patterns - Code Examples

Table of Contents

Disclaimer

Initial Project Setup

1. Create a Python Virtual Environment

2. Install Dependencies

3. Configure API Key

Running the Examples

Pattern 1: Action-Selector

Pattern 2: Plan-Then-Execute

Pattern 3: LLM Map-Reduce

Pattern 4: Dual LLM

Pattern 5: Code-Then-Execute with Provenance

Pattern 6: Context-Minimization

References and Further Reading

Go to section

Search

Secure LLM Agent Design Patterns - Code Examples

README

Secure LLM Agent Design Patterns - Code Examples

Table of Contents

Disclaimer

Initial Project Setup

1. Create a Python Virtual Environment

2. Install Dependencies

3. Configure API Key

Running the Examples

Pattern 1: Action-Selector

Pattern 2: Plan-Then-Execute

Pattern 3: LLM Map-Reduce

Pattern 4: Dual LLM

Pattern 5: Code-Then-Execute with Provenance

Pattern 6: Context-Minimization

References and Further Reading

Help

People also viewed

Create AI Tools

Mini Tool

Vibe code an AI Tool

Choose listing type: