Forked from ProjectRecon/awesome-ai-agents-security with additional research papers on LLM security and privacy.
A curated list of open-source tools, frameworks, and resources for securing autonomous AI agents.
This list is organized by the security lifecycle of an autonomous agent, covering red teaming, runtime protection, sandboxing, and governance.
Scope: This repo focuses on the security of AI agent systems (tool use, orchestration, sandboxing, identity, etc.), not on attacks/defenses targeting the underlying LLMs themselves (e.g., adversarial examples, model robustness). The Surveys section is an exception, included as background reading.
Similar Projects:
- https://github.com/ydyjya/Awesome-LLM-Safety
- https://github.com/corca-ai/awesome-llm-security
- https://github.com/AmanPriyanshu/Awesome-AI-For-Security
- Agent Firewalls & Gateways (Runtime Protection)
- Red Teaming & Vulnerability Scanners
- Static Analysis & Linters
- Sandboxing & Isolation Environments
- Guardrails & Compliance
- Benchmarks & Datasets
- Identity & Authentication
- Surveys & Systematizations
- Contributing
Tools that sit between the agent and the world to filter traffic, prevent unauthorized tool access, and block prompt injections.
- AgentGateway - A Linux Foundation project providing an AI-native proxy for secure connectivity (A2A & MCP protocols). It adds RBAC, observability, and policy enforcement to agent-tool interactions.
- Envoy AI Gateway - An Envoy-based gateway that manages request traffic to GenAI services, providing a control point for rate limiting and policy enforcement.
Offensive tools to test agents for security flaws, loop conditions, and unauthorized actions.
- Strix - An autonomous AI agent designed for penetration testing. It runs inside a docker sandbox to actively probe applications and generate verified exploit capabilities.
- PyRIT - Microsoft’s open-source red teaming framework for generative AI. It automates multi-turn adversarial attacks to test if an agent can be coerced into harmful behavior.
- Agentic Security - A dedicated vulnerability scanner for agent workflows and LLMs capable of running multi-step jailbreaks and fuzzing attacks against agent logic.
- Garak - The "Nmap for LLMs." A vulnerability scanner that probes models for hallucination, data leakage, and prompt injection susceptibilities.
- A2A Scanner - A scanner by Cisco designed to inspect "Agent-to-Agent" communication protocols for threats, validating agent identities and ensuring compliance with communication specs.
- Cybersecurity AI (CAI) - A framework for building specialized security agents for offensive and defensive operations, often used in CTF (Capture The Flag) scenarios.
Tools to analyze agent configuration and logic code before deployment.
- Agentic Radar - A static analysis tool that visualizes agent workflows (LangGraph, CrewAI, AutoGen). It detects risky tool usage, permission loops, and maps them to known vulnerabilities.
- Agent Bound - A design-time analysis tool that calculates "Agentic Entropy"—a metric to quantify the unpredictability and risk of infinite loops or unconstrained actions in agent architectures.
- Checkov - While primarily for IaC, Checkov includes policies for scanning AI infrastructure and configurations to prevent misconfigurations in deployment.
Secure runtimes to prevent agents from damaging the host system during code execution.
- SandboxAI - An open-source runtime for executing AI-generated code (Python/Shell) in isolated containers with granular permission controls.
- Kubernetes Agent Sandbox - A Kubernetes Native project providing a Sandbox Custom Resource Definition (CRD) to manage isolated, stateful workloads for AI agents.
- Agent-Infra Sandbox - An "All-In-One" sandbox combining Browser, Shell, VSCode, and File System access in a single Docker container, optimized for agentic tasks.
- OpenHands - Formerly OpenDevin, this platform includes a secure runtime environment for autonomous coding agents to operate without accessing the host machine's sensitive files.
Middleware to enforce business logic and safety policies on inputs and outputs.
- NeMo Guardrails - NVIDIA’s toolkit for adding programmable rails to LLM-based apps. It ensures agents stay on topic, avoid jailbreaks, and adhere to defined safety policies.
- Guardrails - A Python framework for validating LLM outputs against structural and semantic rules (e.g., "must return valid JSON," "must not contain PII").
- LiteLLM Guardrails - While known for model proxying, LiteLLM includes built-in guardrail features to filter requests and responses across multiple LLM providers.
Resources to evaluate agent security performance.
- CVE Bench - A benchmark for evaluating an AI agent's ability to exploit real-world web application vulnerabilities (useful for testing defensive agents).
Tools to manage agent identity (non-human identities).
- WSO2 - An identity management solution that treats AI agents as first-class identities, enabling secure authentication and authorization for agent actions.
Academic surveys covering LLM security, privacy threats, and defenses.
- A Survey on LLM Security and Privacy: The Good, the Bad, and the Ugly (Yao et al., 2023) - "Good/Bad/Ugly" trichotomy: LLMs benefiting security vs. offensive misuse vs. inherent vulnerabilities (jailbreaking, hallucination).
- Unique Security and Privacy Threats of Large Language Models (Wang et al., ACM CSUR 2024) - Isolates threats unique to LLMs (prompt injection, training data extraction, model stealing) from general adversarial ML.
- Security and Privacy Challenges of Large Language Models (Das et al., ACM CSUR 2025) - Covers jailbreaking, data poisoning, PII leakage across training and inference; includes domain-specific risks (healthcare, transportation).
- Security Concerns for Large Language Models (Li & Fung, JISA 2025) - Focuses on prompt injection, adversarial attacks, data poisoning, and output manipulation.
- On the Security and Privacy Implications of LLMs: In-Depth Threat Analysis (Ruhländer et al., IEEE iThings 2024) - Structured threat modeling framework for LLM security and privacy implications.
- Risks, Causes, and Mitigations of Widespread Deployments of LLMs (Sakib et al., IEEE AIBThings 2024) - Risk-cause-mitigation triad for deployment risks (hallucination, bias, data leakage).
- On Protecting the Data Privacy of Large Language Models (Yan et al., 2024) - Lifecycle perspective: privacy risks at pre-training (memorization), fine-tuning (overfitting), and inference (membership inference, model inversion).
- Preserving Privacy in Large Language Models (Miranda et al., TMLR 2025) - Systematizes training data extraction attacks (black-box/white-box) and proposes pipeline-wide privacy mechanisms (DP, FL, machine unlearning).
- SoK: The Privacy Paradox of Large Language Models (Shanmugarasa et al., ACM ASIA CCS 2025) - Frames capability-privacy tension; structured mitigation taxonomy for memorization, extraction, and inference attacks.
- Privacy and Security Challenges in Large Language Models (Rathod et al., IEEE CCWC 2025) - Maps technical privacy challenges to regulatory compliance (GDPR, HIPAA).
Contributions are welcome! Please read the contribution guidelines first.
- Fork the project.
- Create your feature branch (
git checkout -b feature/AmazingFeature). - Commit your changes (
git commit -m 'Add some AmazingFeature'). - Push to the branch (
git push origin feature/AmazingFeature). - Open a Pull Request.