AI DLP Proxy

A high-performance, enterprise-grade HTTP/HTTPS Data Loss Prevention (DLP) proxy designed to sanitize sensitive information before it reaches external LLM endpoints.

Demo

(asciinema demo recording)

📘 Documentation

Full documentation is available at https://fabriziosalmi.github.io/aidlp/ (or locally via npm run docs:dev).

Table of Contents

  • Overview
  • Features
  • Architecture
  • Prerequisites
  • Installation
  • Configuration
  • Usage
  • Observability
  • Troubleshooting
  • Contributing
  • License

Overview

The AI DLP Proxy acts as a secure gateway: it intercepts traffic to LLM providers (such as OpenAI and Anthropic) and redacts sensitive data in real time using a hybrid of static rules and machine-learning models.

Features

  • Hybrid Redaction Engine: Combines the speed of static keyword matching (FlashText) with the intelligence of NLP models (Presidio/spaCy) to detect PII, secrets, and custom terms (a minimal analyzer sketch follows this list).
  • SSL/TLS Interception: Full support for HTTPS traffic inspection via the mitmproxy core.
  • High Performance: Asynchronous ML processing with configurable models (en_core_web_sm for speed) keeps the latency impact minimal.
  • Enterprise Observability: Native Prometheus metrics (/metrics) and structured JSON logging for integration with Grafana/Loki.
  • Fail-Closed Security: Requests are strictly blocked (HTTP 500) if the DLP engine encounters any failure, so no data can leak through on error.
  • Scalable: Dockerized (multi-stage build, Python 3.12) and load-tested to handle 1000+ concurrent connections.
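
The ML side can be exercised directly with Presidio. This is a minimal sketch, assuming the presidio-analyzer package and the entities/ml_threshold values shown later in config.yaml; the repo's actual engine wiring may differ.

from presidio_analyzer import AnalyzerEngine

# Analyze a sample string for the entity types used in this README.
analyzer = AnalyzerEngine()  # loads a spaCy model under the hood
results = analyzer.analyze(
    text="Call Jane Doe at 212-555-0100",
    entities=["PERSON", "PHONE_NUMBER"],
    language="en",
)
for r in results:
    if r.score >= 0.5:  # corresponds to ml_threshold in config.yaml
        print(r.entity_type, r.start, r.end, round(r.score, 2))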

Architecture

The system is built on top of mitmproxy's robust core, extended with a custom Python addon (DLPAddon). A minimal addon sketch follows the numbered steps below.

  1. Interception: The proxy intercepts HTTP/HTTPS POST requests.
  2. Analysis: The request body is passed to the DLPEngine.
    • Static Analysis: Checks against terms.txt for known secrets.
    • ML Analysis: Runs Named Entity Recognition (NER) to find PII (names, phone numbers, etc.).
  3. Redaction: Sensitive tokens are replaced with [REDACTED].
  4. Forwarding: The sanitized request is sent to the upstream server.
  5. Telemetry: Metrics and logs are emitted asynchronously.
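
The shape of that addon looks roughly like the following. This is a hedged sketch built on mitmproxy's standard addon API; the real DLPAddon and DLPEngine live in this repository's source and are more complete.

from mitmproxy import http

# Hypothetical stand-in for the static term list loaded from terms.txt.
STATIC_TERMS = {"super_secret_token", "internal_db_password"}

class DLPAddonSketch:
    def request(self, flow: http.HTTPFlow) -> None:
        if flow.request.method != "POST":
            return
        try:
            body = flow.request.get_text()
            if body is None:
                return
            # Static analysis: replace known terms with the redaction marker.
            for term in STATIC_TERMS:
                body = body.replace(term, "[REDACTED]")
            # ML analysis (Presidio/spaCy NER) would run here as well.
            flow.request.set_text(body)
        except Exception:
            # Fail closed: block the request if the engine errors out.
            flow.response = http.Response.make(
                500, b"DLP engine failure", {"Content-Type": "text/plain"}
            )

addons = [DLPAddonSketch()]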

Prerequisites

  • Python: 3.12 or higher.
  • Docker: 20.10+ (for containerized deployment).
  • Memory: Minimum 2GB RAM recommended for ML models.

Installation

Local Setup

  1. Clone the repository:

    git clone https://github.com/fabriziosalmi/aidlp.git
    cd aidlp
  2. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
  3. Install dependencies:

    pip install poetry
    poetry install
    poetry run python -m spacy download en_core_web_lg
  4. Start the proxy:

    poetry run python src/cli.py start --port 8080

Docker Deployment

  1. Build and Run:

    docker-compose up --build -d
  2. Verify:

    curl -x http://localhost:8080 http://httpbin.org/ip

Configuration

The proxy is configured via config.yaml and terms.txt.

config.yaml

proxy:
  port: 8080
  metrics_port: 9090
  ssl_bump: true

dlp:
  static_terms_file: "terms.txt"
  ml_enabled: true
  ml_threshold: 0.5
  nlp_model: "en_core_web_lg" # or "en_core_web_sm" for speed
  entities: ["PERSON", "PHONE_NUMBER"] # Optional: filter specific entities
  secrets_provider:
    type: "file" # or "vault"
    vault:
      url: "http://localhost:8200"
      path: "aidlp/terms"
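
How this file might be consumed in code, as a minimal sketch assuming PyYAML; the field names mirror the example above, and the repo's actual loader may validate more strictly:

import yaml

def load_config(path: str = "config.yaml") -> dict:
    # Parse the YAML file and sanity-check the fields used in this README.
    with open(path, encoding="utf-8") as fh:
        cfg = yaml.safe_load(fh)
    if not isinstance(cfg["proxy"]["port"], int):
        raise ValueError("proxy.port must be an integer")
    if cfg["dlp"]["secrets_provider"]["type"] not in ("file", "vault"):
        raise ValueError("secrets_provider.type must be 'file' or 'vault'")
    return cfg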

terms.txt

Add one sensitive term per line. The file is read when the proxy starts, so changes require a restart (dynamic hot reload is planned).

super_secret_token
internal_db_password
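
Static matching of these terms is described in Features as FlashText-based. A minimal sketch of that approach, assuming the flashtext package (the exact engine code may differ):

from flashtext import KeywordProcessor

# Load terms.txt and map every term to the redaction marker.
kp = KeywordProcessor()
with open("terms.txt", encoding="utf-8") as fh:
    for line in fh:
        term = line.strip()
        if term:
            kp.add_keyword(term, "[REDACTED]")

print(kp.replace_keywords("My password is super_secret_token"))
# -> My password is [REDACTED]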

Usage

Configure your HTTP client or environment to use the proxy.

Example (cURL):

curl -x http://localhost:8080 \
     -X POST http://httpbin.org/post \
     -d "My password is super_secret_token"

Output:

{
  "data": "My password is [REDACTED]"
}
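
The same flow from Python, as a hedged sketch with the requests library (it mirrors the cURL example above):

import requests

# Route traffic through the proxy; for HTTPS you would also trust the
# mitmproxy CA certificate (see Troubleshooting) instead of disabling TLS.
proxies = {"http": "http://localhost:8080", "https": "http://localhost:8080"}
resp = requests.post(
    "http://httpbin.org/post",
    data="My password is super_secret_token",
    proxies=proxies,
)
print(resp.json()["data"])  # -> My password is [REDACTED]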

Observability

Metrics

Prometheus metrics are available at http://localhost:9090/metrics. A declaration sketch follows the metric list below.

  • dlp_requests_total: Total requests processed.
  • dlp_redacted_total: Requests containing sensitive data.
  • dlp_pii_detected_total: Count of PII entities by type (e.g., EMAIL, PHONE_NUMBER).
  • dlp_token_usage_total: Estimated token usage (input/output).
  • dlp_latency_seconds: Histogram of processing time.
  • dlp_active_connections: Current active connections.
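
These metrics could be declared with the standard prometheus_client library; the following is a hypothetical sketch whose names match the list above, while the real label sets may differ:

from prometheus_client import Counter, Gauge, Histogram, start_http_server

dlp_requests_total = Counter("dlp_requests_total", "Total requests processed")
dlp_pii_detected_total = Counter(
    "dlp_pii_detected_total", "PII entities detected", ["entity_type"]
)
dlp_latency_seconds = Histogram("dlp_latency_seconds", "Processing time")
dlp_active_connections = Gauge("dlp_active_connections", "Active connections")

start_http_server(9090)  # serves /metrics on the configured metrics_port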

Logs

Logs are output in structured JSON format to stdout, suitable for ingestion by Fluentd/Logstash.
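
A minimal way to reproduce that format with only the standard library (the project may use a dedicated structured-logging package instead):

import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Emit one JSON object per log record.
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.getLogger("aidlp").info("request redacted")
# -> {"level": "INFO", "logger": "aidlp", "msg": "request redacted"}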

Troubleshooting

"Address already in use"

  • Cause: Port 8080 or 9090 is occupied.
  • Fix: Change port or metrics_port in config.yaml.

"Certificate Verify Failed"

  • Cause: The client does not trust the mitmproxy CA.
  • Fix: Install ~/.mitmproxy/mitmproxy-ca-cert.pem into your system or browser trust store. For curl, use -k (insecure) for testing.

High Latency

  • Cause: ML model processing on CPU.
  • Fix: Switch nlp_model to en_core_web_sm in config.yaml for lower latency, and ensure the host CPU supports AVX. GPU acceleration is planned for production workloads.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.

License

This project is licensed under the MIT License - see the LICENSE file for details.
