Governed autonomous penetration testing platform powered by Symbiont. An AI engagement controller orchestrates a multi-phase pen test across a curated offensive toolchain where every tool has a different risk profile, every action is Cedar policy-gated, and every finding is evidence-chained.
Penetration testing firms face four persistent problems:
- Scope creep — testers accidentally hit out-of-scope assets
- Evidence chain integrity — tampering risk in findings
- Junior tester supervision — unsupervised high-risk tool usage
- Reporting overhead — 40% of engagement time writing reports
Eight specialized agents execute a PTES-methodology pen test. Every tool invocation passes through Symbiont's ORGA (Observe-Reason-Gate-Act) loop with Cedar policy enforcement:
engagement-controller
├── recon agent → nmap, whois, dig, whatweb, amass
├── enum agent → nikto, gobuster, enum4linux, smbclient, snmpwalk
├── vuln-assess agent → nmap NSE, nuclei, sqlmap (detect), searchsploit
├── exploit agent → hydra, metasploit, sqlmap (exploit) [human-gated]
├── post-exploit agent → impacket, pypykatz, chisel, ligolo [human-gated]
├── reflector agent → distils phase findings into knowledge triples
└── reporter agent → executive, technical, remediation reports
Between phases the controller invokes the bounded reflector agent, which reads the phase's findings and writes subject-predicate-object lessons to a knowledge store. The next phase's agent pulls those lessons via recall_knowledge before planning, so learning flows forward across the engagement without widening any phase agent's tool surface. Cedar's reflector.cedar uses a defensive forbid ... unless whitelist so the reflector can only touch store_knowledge, recall_knowledge, and query_findings — every scan/exploit action is rejected at the gate.
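The forbid ... unless pattern described above amounts to a default-deny allowlist. The sketch below shows that idea in Python rather than Cedar; `gate_reflector` and `Decision` are hypothetical names for illustration, and only the three action names come from the text.

```python
from dataclasses import dataclass

# The three actions the text says the reflector may touch.
REFLECTOR_ALLOWED = frozenset(
    {"store_knowledge", "recall_knowledge", "query_findings"}
)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def gate_reflector(action: str) -> Decision:
    """Forbid every action unless it is explicitly allowlisted (hypothetical helper)."""
    if action in REFLECTOR_ALLOWED:
        return Decision(True, f"{action} is on the reflector allowlist")
    # Anything not listed is rejected, including tools added in the future.
    return Decision(False, f"{action} is not on the reflector allowlist")
```

Because the default is deny, a newly registered scan or exploit tool is rejected for the reflector without any policy change.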
The critical insight: The Gate operates outside LLM influence. An AI plans Metasploit usage; a human approves each exploitation attempt. Cedar policies cannot be bypassed through prompt injection, social engineering, or creative reasoning.
┌─────────────────────────────────────────────────────────┐
│                  Engagement Controller                  │
│  Maintains state · Enforces methodology · Orchestrates  │
└──────┬────────┬───────┬───────┬───────┬─────────┬───────┘
       │        │       │       │       │         │
   ┌───▼───┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌────▼────┐
   │ Recon │ │Enum │ │Vuln │ │Expl.│ │Post │ │Reporter │
   │       │ │     │ │     │ │     │ │Expl.│ │         │
   └───┬───┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └────┬────┘
       │        │       │       │       │         │
  ┌────▼────────▼───────▼───────▼───────▼─────────▼─────┐
  │         ToolClad Manifests (19 .clad.toml)          │
  │ Typed args · MCP schema · Evidence · Cedar metadata │
  ├─────────────────────────────────────────────────────┤
  │              MCP Tool Layer (35 tools)              │
  │  Rust implementations · Cedar-gated · Audit-logged  │
  ├─────────────────────────────────────────────────────┤
  │             Shell Wrappers (19 scripts)             │
  │  Arg validation · Timeout · JSON output · Defense   │
  ├─────────────────────────────────────────────────────┤
  │             Offensive Toolchain (Kali)              │
  │ nmap · nikto · nuclei · sqlmap · hydra · metasploit │
  │  impacket · pypykatz · chisel · ligolo · gobuster   │
  └─────────────────────────────────────────────────────┘
| Risk Level | Tools | Authorization |
|---|---|---|
| Low | nmap, whois, dig, whatweb, amass | Auto-allowed within scope |
| Medium | nikto, gobuster, enum4linux, smbclient, snmpwalk | Rate-limited |
| Medium-High | nmap NSE, nuclei, sqlmap (detect), searchsploit | Non-production only |
| High | hydra, metasploit, sqlmap (exploit) | Human approval required |
| Highest | impacket, pypykatz, chisel, ligolo | Human approval + scope revalidation |
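The risk table above can be read as a lookup from tool to required authorization. The following is a minimal sketch of that reading; the `Authorization` enum, the `authorization_for` helper, and the key spellings for the two sqlmap modes are assumptions, not the project's actual policy encoding.

```python
from enum import Enum


class Authorization(Enum):
    AUTO_IN_SCOPE = "auto-allowed within scope"
    RATE_LIMITED = "rate-limited"
    NON_PRODUCTION = "non-production only"
    HUMAN_APPROVAL = "human approval required"
    HUMAN_APPROVAL_REVALIDATE = "human approval + scope revalidation"


# Tiers copied from the table; key names for sqlmap modes are hypothetical.
TIERS = {
    Authorization.AUTO_IN_SCOPE: ["nmap", "whois", "dig", "whatweb", "amass"],
    Authorization.RATE_LIMITED: ["nikto", "gobuster", "enum4linux", "smbclient", "snmpwalk"],
    Authorization.NON_PRODUCTION: ["nmap-nse", "nuclei", "sqlmap-detect", "searchsploit"],
    Authorization.HUMAN_APPROVAL: ["hydra", "metasploit", "sqlmap-exploit"],
    Authorization.HUMAN_APPROVAL_REVALIDATE: ["impacket", "pypykatz", "chisel", "ligolo"],
}

# Invert the table once so lookups are a single dictionary access.
TOOL_AUTH = {tool: auth for auth, tools in TIERS.items() for tool in tools}


def authorization_for(tool: str) -> Authorization:
    """Return the required authorization; unknown tools are refused outright."""
    if tool not in TOOL_AUTH:
        raise LookupError(f"no risk tier defined for {tool!r}")
    return TOOL_AUTH[tool]
```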
Eight policy files enforce governance at every level:
| Policy | Purpose |
|---|---|
| scope.cedar | Target CIDR enforcement, excluded assets |
| tool-authorization.cedar | Per-tool risk-tiered authorization |
| phase-gates.cedar | PTES methodology enforcement |
| rate-limits.cedar | Per-target and global frequency limits |
| escalation.cedar | Human approval with time-limited expiry |
| evidence.cedar | Evidence chain integrity requirements |
| time-bounds.cedar | Engagement window enforcement |
| reflector.cedar | Bounds the reflector to store_knowledge / recall_knowledge / query_findings via defensive forbid ... unless |
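As an illustration of the kind of check scope.cedar is described as enforcing (target CIDRs plus excluded assets), here is a small Python sketch. It is not the Cedar policy itself; `in_scope` and its parameters are hypothetical, and the addresses in the usage note are examples only.

```python
import ipaddress
from typing import Iterable


def in_scope(target: str, allowed_cidrs: Iterable[str], excluded: Iterable[str]) -> bool:
    """Return True only if target is inside an allowed CIDR and not excluded."""
    addr = ipaddress.ip_address(target)
    # Exclusions win over allowances, so check them first.
    for entry in excluded:
        if addr in ipaddress.ip_network(entry, strict=False):
            return False
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )
```

For example, `in_scope("10.0.1.5", ["10.0.1.0/24"], ["10.0.1.99"])` is true, while the excluded host `10.0.1.99` and the out-of-range host `10.0.2.5` are both rejected.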
SQLite stores structured engagement data: findings, tool runs, retests, and reflector-authored knowledge triples.
LanceDB provides semantic search across findings for cross-tool correlation and retest comparison. A service that moved from port 8080 to 8443 still gets matched. A finding described differently by a different scanner still gets correlated.
Knowledge store — a knowledge table of subject-predicate-object triples written exclusively by the reflector (e.g. (smb_null_session, enabled_on, 10.0.2.15:445, confidence=0.9)). Phase agents read it via recall_knowledge at phase entry to bias their plan. The triple shape keeps lessons concrete and small enough to inject into the next phase's prompt without token bloat. Pattern borrowed from symbiont-karpathy-loop.
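To make the triple shape concrete, the sketch below stores and recalls triples with Python's built-in sqlite3 module. The column names, the `open_store` helper, the confidence threshold, and the function signatures are assumptions; only the tool names store_knowledge and recall_knowledge and the example triple come from the text.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[str, str, str, float]


def open_store() -> sqlite3.Connection:
    """Create an in-memory knowledge table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE knowledge ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, "
        "object TEXT NOT NULL, confidence REAL NOT NULL)"
    )
    return conn


def store_knowledge(conn: sqlite3.Connection, subject: str, predicate: str,
                    obj: str, confidence: float) -> None:
    """Write one subject-predicate-object lesson."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    conn.execute(
        "INSERT INTO knowledge VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, confidence),
    )
    conn.commit()


def recall_knowledge(conn: sqlite3.Connection, min_confidence: float = 0.5,
                     limit: int = 20) -> List[Triple]:
    """Return the most confident lessons, capped to keep prompts small."""
    rows = conn.execute(
        "SELECT subject, predicate, object, confidence FROM knowledge "
        "WHERE confidence >= ? ORDER BY confidence DESC LIMIT ?",
        (min_confidence, limit),
    )
    return rows.fetchall()
```

The `limit` parameter reflects the stated goal of keeping injected lessons small; the actual cap used by the project is not given in the text.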
Evidence store archives all tool outputs with SHA-256 integrity hashing, creating a tamper-evident chain from discovery through reporting.
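A tamper-evident chain of this kind can be sketched by hashing each record together with the hash of the record before it. The entry layout, the all-zero genesis value, and the helper names below are assumptions for illustration; the project's actual evidence format is not shown in this document.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed starting value for the first entry


def _digest(prev_hash: str, payload: Dict) -> str:
    # Canonical JSON so the same payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict], payload: Dict) -> Dict:
    """Append a payload, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Changing any earlier payload invalidates that entry's hash, so a verifier walking the chain from the start detects the edit.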
- Docker
- An Anthropic API key
# Pull from GitHub Container Registry
docker pull ghcr.io/thirdkeyai/symbi-redteam:latest
# Set required environment variables
export ANTHROPIC_API_KEY=your-key
export SYMBIONT_MASTER_KEY=$(openssl rand -hex 32)
# Start the runtime
docker run --rm --network host --privileged \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
-e SYMBIONT_API_TOKEN="your-api-token" \
-e SYMBIONT_MASTER_KEY="$SYMBIONT_MASTER_KEY" \
ghcr.io/thirdkeyai/symbi-redteam:latest \
up -p 9080 --http-port 9081 --http.token "your-webhook-token"

To build locally (e.g., to customize agents, policies, or tools):
# Clone the repo
git clone https://github.com/ThirdKeyAI/symbi-redteam.git
cd symbi-redteam
# Build the container (first build ~15 min for Rust compilation)
docker compose build
# Start with local mounts for live editing
docker run --rm --network host --privileged \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
-e SYMBIONT_API_TOKEN="your-api-token" \
-e SYMBIONT_MASTER_KEY="$SYMBIONT_MASTER_KEY" \
-v ./policies:/app/policies:ro \
-v ./scope:/app/scope:ro \
-v ./agents:/app/agents:ro \
-v ./scripts:/app/scripts \
-v ./templates:/app/templates:ro \
symbi-redteam:latest \
up -p 9080 --http-port 9081 --http.token "your-webhook-token"

# Health check
curl -s http://localhost:9080/api/v1/health
# List loaded agents (8 agents from agents/ directory)
curl -s -H "Authorization: Bearer your-api-token" \
http://localhost:9080/api/v1/agents
# Execute an agent
curl -s -X POST -H "Authorization: Bearer your-api-token" \
-H "Content-Type: application/json" \
http://localhost:9080/api/v1/agents/{agent-id}/execute \
-d '{"input": "Scan 10.0.1.0/24 for open services"}'
# Swagger API docs
open http://localhost:9080/swagger-ui/

Tool wrappers can be tested directly inside the container without the full runtime:
docker run --rm --network host --privileged --user root \
--entrypoint bash symbi-redteam:latest -c \
'/app/scripts/tool-wrappers/nmap-wrapper.sh 10.0.1.5 service "" test-001'

Edit scope/scope.toml to define your engagement targets and update policies/scope.cedar to match. The scope is baked into Cedar policies for this demo.
| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Yes | API key for LLM reasoning |
| SYMBIONT_API_TOKEN | Yes | Bearer token for the runtime REST API (port 9080) |
| SYMBIONT_MASTER_KEY | Yes | 256-bit hex key for encryption (openssl rand -hex 32) |
| SYMBI_LOG_LEVEL | No | Log level: debug, info, warn, error (default: info) |
| SLACK_BOT_TOKEN | If approvals enabled | Slack bot token (xoxb-…) for chat.postMessage / chat.update |
| SLACK_SIGNING_SECRET | If approvals enabled | Slack app signing secret for webhook signature verification |
| Port | Purpose | Authentication |
|---|---|---|
| 9080 | Runtime REST API (agents, status, execute) | SYMBIONT_API_TOKEN via Bearer header |
| 9081 | HTTP Input webhook (agent invocation) | --http.token via Bearer header |
| 9082 | Slack approvals webhook (block_actions callbacks) | Slack signing secret |
| 4317 | OTLP gRPC (Jaeger trace collector) | None (local only) |
| 16686 | Jaeger UI | None (local only) |
Every tool invocation is logged to .symbiont/audit/ as JSONL with SHA-256 hash chaining (configured in symbi.toml). In Docker, these are persisted to the host via the audit-logs/ volume mount:
# View recent audit entries
cat audit-logs/*.jsonl | jq .
# Filter by tool name
cat audit-logs/*.jsonl | jq 'select(.tool == "nmap_scan")'
# Filter by Cedar decision
cat audit-logs/*.jsonl | jq 'select(.cedar_decision == "deny")'

Symbiont 1.10.0+ supports W3C traceparent propagation via OpenTelemetry. Traces show the full ORGA loop per agent (Observe, Reason, Gate, Act) with cross-agent propagation through ask() calls.
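For readers unfamiliar with the traceparent header mentioned above, the sketch below parses the version-00 layout defined by the W3C Trace Context specification (version, trace ID, parent ID, flags). The `TraceParent` type and `parse_traceparent` function are hypothetical helpers and say nothing about how Symbiont handles the header internally.

```python
import re
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 style traceparent value; return None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # The lowest flag bit marks the trace as sampled.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

Spans that share a `trace_id` belong to the same trace, which is what lets a trace viewer stitch controller and phase-agent spans together.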
1. Start Jaeger:
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
jaegertracing/all-in-one:latest

2. Add telemetry config to symbi.toml:
[telemetry]
enabled = true
otlp_endpoint = "http://localhost:4317"

3. View traces:
Open http://localhost:16686 and select the symbi-redteam service. Each engagement run produces traces spanning all phase agents, with spans for:
- Agent ORGA loop iterations
- Cedar policy evaluations (permit/deny)
- Tool executions (wrapper invocation + duration)
- Inter-agent ask() calls (controller → phase agent)
- Human approval gates (time-to-approve)

# Increase log detail for debugging
SYMBI_LOG_LEVEL=debug RUST_LOG=symbi=debug,cedar=info

- Gobuster requires --exclude-length for SPA targets (like Juice Shop) that return 200 for all paths. The agent's reasoning phase handles this automatically.
- Nuclei downloads templates on first run inside the container. Templates are pre-downloaded during Docker build, but template updates require a rebuild.
- Metasploit first-run initialization takes 30-60 seconds while the framework loads.
- Non-root execution: The container runs as the symbi user by default. Tools requiring raw sockets (nmap SYN scans, chisel tunneling) need --cap-add NET_RAW --cap-add NET_ADMIN or --privileged for testing.
- MCP tool registration: ToolClad manifests in tools/ auto-generate MCP schemas via toolclad schema. The Rust MCP tool definitions in src/ provide the runtime registration layer. The Symbiont runtime's ToolCladExecutor discovers manifests from tools/ and registers them as MCP tools automatically.
When enabled, human-gated tools (exploit, post-exploit) post an Approve/Deny prompt to Slack in addition to the CLI prompt. The first responder wins.
Slack app setup:
- Create a Slack app at https://api.slack.com/apps
- Bot Token Scopes: chat:write, chat:write.public, im:write
- Interactivity & Shortcuts: enable; Request URL = https://<your-host>:9082/slack/events
- Install to workspace; copy Bot Token (xoxb-…) and Signing Secret
- Invite the bot to the approval channel: /invite @your-bot #symbi-approvals
Configure symbi.toml:
[approvals.slack]
enabled = true
bot_token_env = "SLACK_BOT_TOKEN"
signing_secret_env = "SLACK_SIGNING_SECRET"
channel = "#symbi-approvals"
approvers = ["U01ABC123", "U02DEF456"] # Slack member IDs
dm_approvers = true
events_bind_addr = "0.0.0.0:9082"

Run with Slack enabled:
docker run --rm --network host --privileged \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
-e SYMBIONT_API_TOKEN="..." \
-e SYMBIONT_MASTER_KEY="..." \
-e SLACK_BOT_TOKEN="xoxb-..." \
-e SLACK_SIGNING_SECRET="..." \
ghcr.io/thirdkeyai/symbi-redteam:latest \
up -p 9080 --http-port 9081 --http.token "..."

v1 limitations:
- Pending approvals are in-memory; on container restart they're lost and the agent re-prompts on retry.
- Approver allowlist is static (Slack user IDs in symbi.toml). Per-engagement Cedar-mapped approvers are planned for v2.
- Slack only. Teams/Mattermost are deferred.
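The approvals webhook on port 9082 is authenticated with the Slack signing secret. Slack's documented scheme signs the string `v0:<timestamp>:<raw body>` with HMAC-SHA256 and sends the result in the `X-Slack-Signature` header alongside `X-Slack-Request-Timestamp`. The sketch below shows that check in Python; the function name and the five-minute skew default are assumptions, and it is not taken from the Symbiont runtime.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, now: Optional[float] = None,
                           max_skew_seconds: int = 300) -> bool:
    """Check an X-Slack-Signature value against the raw request body."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    # Reject stale requests to limit replay of captured callbacks.
    if abs(current - sent_at) > max_skew_seconds:
        return False
    base = b"v0:" + str(sent_at).encode("ascii") + b":" + body
    expected = "v0=" + hmac.new(
        signing_secret.encode("utf-8"), base, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The body must be the raw bytes as received; re-serializing parsed form data would change the signed string and make valid requests fail.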
symbi-redteam/
├── agents/ # 8 Symbiont DSL agent definitions
│ ├── engagement-controller.dsl # Orchestrator
│ ├── recon.dsl # Reconnaissance
│ ├── enum.dsl # Enumeration
│ ├── vuln-assess.dsl # Vulnerability assessment
│ ├── exploit.dsl # Exploitation (human-gated)
│ ├── post-exploit.dsl # Post-exploitation (human-gated)
│ ├── reflector.dsl # Post-phase lesson extractor (bounded)
│ └── reporter.dsl # Report generation
├── tools/ # 19 ToolClad manifests (.clad.toml)
├── toolclad.toml # Project-level custom type definitions
├── policies/ # 8 Cedar policy files
├── src/ # Rust MCP tool definitions
│ ├── recon_tools.rs # 5 recon tools + parse + CVE lookup
│ ├── enum_tools.rs # 5 enumeration tools
│ ├── vuln_tools.rs # 4 vulnerability tools
│ ├── exploit_tools.rs # 4 exploitation tools
│ ├── postexploit_tools.rs # 4 post-exploitation tools
│ ├── evidence_tools.rs # 5 evidence management tools
│ ├── knowledge_tools.rs # store_knowledge + recall_knowledge
│ ├── reporting.rs # 4 reporting tools
│ └── db.rs # SQLite + LanceDB layer
├── scripts/
│ ├── tool-wrappers/ # 19 sandboxed tool wrappers
│ └── parse-outputs/ # 9 output parsers
├── scope/ # Engagement scope definition
├── db/ # Database schema
├── templates/ # Report templates
├── Dockerfile # Multi-stage: Rust builder + Kali runtime
├── docker-compose.yml # Security-hardened container config
└── symbi.toml # Symbiont runtime configuration
All 19 offensive tools have declarative ToolClad manifests in tools/. Each .clad.toml defines:
- Typed parameters with validation (scope_target, port, enum, credential_file, msf_options, etc.)
- Cedar metadata for policy evaluation (resource, action, risk_tier, human_approval)
- MCP schema generation: auto-generate inputSchema/outputSchema from manifests
- Evidence envelopes with SHA-256 hashing and structured output
Manifests use the executor escape hatch to delegate to existing shell wrappers, preserving defense-in-depth while adding ToolClad's typed validation layer:
Agent fills typed parameters → ToolClad validates → Shell wrapper executes → Evidence envelope
Custom types in toolclad.toml define project-specific enums and constraints:
hydra_service, nmap_scan_type, severity_level, dns_record_type, scan_rate, msf_module_path, impacket_tool
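The typed-validation step in the flow above can be pictured as rejecting bad arguments before any wrapper runs. This Python sketch is illustrative only: ToolClad's real validators are not shown in this document, the helper names are hypothetical, and apart from `service` (which appears in the examples here) the allowed scan types are made up.

```python
from typing import Dict, FrozenSet

# "service" appears in this README's examples; the other values are assumed.
NMAP_SCAN_TYPES: FrozenSet[str] = frozenset({"service", "ping", "syn"})


def validate_enum(name: str, value: str, allowed: FrozenSet[str]) -> str:
    """Accept only values from a closed set, like a custom enum type."""
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return value


def validate_port(value: str) -> int:
    """Parse a TCP/UDP port and enforce the 1-65535 range."""
    port = int(value)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1-65535")
    return port


def validate_nmap_args(args: Dict[str, str]) -> Dict[str, object]:
    """Validate a hypothetical nmap_scan argument set before execution."""
    unknown = set(args) - {"target", "scan_type", "port"}
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    for required in ("target", "scan_type"):
        if required not in args:
            raise ValueError(f"missing required argument: {required}")
    validated: Dict[str, object] = {
        "target": args["target"],
        "scan_type": validate_enum("scan_type", args["scan_type"], NMAP_SCAN_TYPES),
    }
    if "port" in args:
        validated["port"] = validate_port(args["port"])
    return validated
```

Rejecting unknown argument names, rather than ignoring them, keeps an agent from smuggling extra flags through to the shell wrapper.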
# Validate all tool manifests (symbi tools CLI, v1.10.0+)
symbi tools validate
# Generate MCP schema for a tool
symbi tools schema nmap_scan
# Dry-run a tool
symbi tools test nmap_scan --arg target=10.0.1.5 --arg scan_type=service
# List all discovered tools
symbi tools list

Kali base image — Provides the offensive toolchain via apt. Larger image but vastly simpler tool installation and dependency management than building from source.
Hierarchical multi-agent — The engagement controller delegates to phase agents via ask(). Only 2 agents are active concurrently (controller + current phase). This maps naturally to PTES methodology and keeps Cedar policies scoped per phase.
Bounded reflector — Cross-phase learning is handled by a single-purpose reflector agent that can only write to the knowledge store. Separating "who learns" from "who acts" means accumulating procedural knowledge never widens any phase agent's tool surface. The forbid ... unless Cedar pattern catches future accidental widening.
Cedar over inline checks — Cedar policies are formally verifiable, updatable without code changes, and evaluated outside LLM influence. The Gate cannot be prompt-injected.
SQLite + LanceDB — Structured data in SQLite for queries, embeddings in LanceDB for semantic search. Single LanceDB collection with type discriminator avoids runtime changes.
Human approval via CLI — Symbiont's HumanCritic suspends the ORGA loop and prompts the operator. Approval tokens have configurable expiry (30-60 minutes) enforced by Cedar.
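The time-limited approval described above can be sketched as a token that is valid only for one tool and only inside its expiry window. In the project this is enforced by Cedar; the `ApprovalToken` class below is a hypothetical Python stand-in, and the 30-minute value in the usage note is just the lower end of the range quoted in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ApprovalToken:
    tool: str            # the tool the human approved
    approver: str        # who approved it
    issued_at: datetime  # timezone-aware issue time
    ttl: timedelta       # how long the approval stays valid

    def is_valid(self, tool: str, now: datetime) -> bool:
        """Valid only for the approved tool and only before expiry."""
        if tool != self.tool:
            return False
        return self.issued_at <= now < self.issued_at + self.ttl
```

A token issued with `ttl=timedelta(minutes=30)` stops authorizing its tool 30 minutes after `issued_at`, so a later exploitation attempt needs a fresh approval.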
| Capability | Raw Tools | symbi-redteam |
|---|---|---|
| Scope enforcement | Manual discipline | Cedar policy — automatic |
| Phase methodology | Tester judgment | Policy-gated transitions |
| Tool authorization | Honor system | Risk-tiered Cedar policies |
| Rate limiting | Manual | Automatic per-target + global |
| Human approval | Verbal/email | CLI prompt with timed expiry |
| Evidence integrity | Trust-based | SHA-256 hash chains |
| Audit trail | Manual notes | Cryptographic, tamper-evident |
| Report generation | 40% of engagement time | Automated from evidence DB |
| Retest comparison | Manual analyst work | Semantic matching + delta reports |
| Cross-phase learning | Tester memory | Reflector-written knowledge triples, recalled by next phase |
Apache 2.0 — see LICENSE for details.
