Distill

Reliable LLM outputs start with clean context.

A reliability layer for LLM context. Deterministic deduplication that removes redundancy before it reaches your model.

Less redundant data. Lower costs. Faster responses. More efficient & deterministic results.

Context sources → Distill → LLM
(RAG, tools, memory, docs)    (reliable outputs)

The Problem

LLM outputs are unreliable because context is polluted. "Garbage in, garbage out."

30-40% of context assembled from multiple sources is semantically redundant: the same information from docs, code, memory, and tools competes for attention. This leads to:

  • Non-deterministic outputs — Same workflow, different results
  • Confused reasoning — Signal diluted by repetition
  • Production failures — Works in demos, breaks at scale

You can't fix unreliable outputs with better prompts. You need to fix the context that goes in.

How It Works

Math, not magic. No LLM calls. Fully deterministic.

Step          What it does                                   Benefit
Deduplicate   Remove redundant information across sources    More reliable outputs
Compress      Keep what matters, remove the noise            Lower token costs
Summarize     Condense older context intelligently           Longer sessions
Cache         Instant retrieval for repeated patterns        Faster responses

Pipeline

Query → Over-fetch (50) → Cluster → Select → MMR Re-rank (8) → LLM
  1. Over-fetch - Retrieve 3-5x more chunks than needed
  2. Cluster - Group semantically similar chunks (agglomerative clustering)
  3. Select - Pick best representative from each cluster
  4. MMR Re-rank - Balance relevance and diversity
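
Concretely, standard MMR (which the --lambda description under Parameters below matches) picks, at each step, the remaining chunk d that maximizes

    score(d) = λ · sim(q, d) − (1 − λ) · max sim(d, s) over already-selected s

where q is the query, sim compares chunk embeddings (cosine similarity is assumed here), λ = 1.0 favors pure relevance, and λ = 0.0 favors pure diversity.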

Result: Deterministic, diverse context in ~12ms. No LLM calls. Fully auditable.

Installation

Binary (Recommended)

Download from GitHub Releases:

# macOS (Apple Silicon)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_arm64.tar.gz" | cut -d '"' -f 4) | tar xz

# macOS (Intel)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_amd64.tar.gz" | cut -d '"' -f 4) | tar xz

# Linux (amd64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_amd64.tar.gz" | cut -d '"' -f 4) | tar xz

# Linux (arm64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_arm64.tar.gz" | cut -d '"' -f 4) | tar xz

# Move to PATH
sudo mv distill /usr/local/bin/

Or download directly from the releases page.

Go Install

go install github.com/Siddhant-K-code/distill@latest

Docker

docker pull ghcr.io/siddhant-k-code/distill:latest
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill

Build from Source

git clone https://github.com/Siddhant-K-code/distill.git
cd distill
go build -o distill .

Quick Start

1. Standalone API (No Vector DB Required)

Start the API server and send chunks directly:

export OPENAI_API_KEY="your-key"  # For embeddings
distill api --port 8080

Deduplicate chunks:

curl -X POST http://localhost:8080/v1/dedupe \
  -H "Content-Type: application/json" \
  -d '{
    "chunks": [
      {"id": "1", "text": "React is a JavaScript library for building UIs."},
      {"id": "2", "text": "React.js is a JS library for building user interfaces."},
      {"id": "3", "text": "Vue is a progressive framework for building UIs."}
    ]
  }'

Response:

{
  "chunks": [
    {"id": "1", "text": "React is a JavaScript library for building UIs.", "cluster_id": 0},
    {"id": "3", "text": "Vue is a progressive framework for building UIs.", "cluster_id": 1}
  ],
  "stats": {
    "input_count": 3,
    "output_count": 2,
    "reduction_pct": 33,
    "latency_ms": 12
  }
}

With pre-computed embeddings (no OpenAI key needed):

curl -X POST http://localhost:8080/v1/dedupe \
  -H "Content-Type: application/json" \
  -d '{
    "chunks": [
      {"id": "1", "text": "React is...", "embedding": [0.1, 0.2, ...]},
      {"id": "2", "text": "React.js is...", "embedding": [0.11, 0.21, ...]},
      {"id": "3", "text": "Vue is...", "embedding": [0.9, 0.8, ...]}
    ]
  }'

2. With Vector Database

Connect to Pinecone or Qdrant for retrieval + deduplication:

export PINECONE_API_KEY="your-key"
export OPENAI_API_KEY="your-key"

distill serve --index my-index --port 8080

Query with automatic deduplication:

curl -X POST http://localhost:8080/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "how do I reset my password?"}'

3. MCP Integration (AI Assistants)

Works with Claude, Cursor, Amp, and other MCP-compatible assistants:

distill mcp

Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "distill": {
      "command": "/path/to/distill",
      "args": ["mcp"]
    }
  }
}

See mcp/README.md for more configuration options.

CLI Commands

distill api       # Start standalone API server
distill serve     # Start server with vector DB connection
distill mcp       # Start MCP server for AI assistants
distill analyze   # Analyze a file for duplicates
distill sync      # Upload vectors to Pinecone with dedup
distill query     # Test a query from command line

Configuration

Environment Variables

OPENAI_API_KEY      # For text → embedding conversion (see note below)
PINECONE_API_KEY    # For Pinecone backend
QDRANT_URL          # For Qdrant backend (default: localhost:6334)
DISTILL_API_KEYS    # Optional: protect your self-hosted instance (see below)

Protecting Your Self-Hosted Instance

If you're exposing Distill publicly, set DISTILL_API_KEYS to require authentication:

# Generate a random API key
export DISTILL_API_KEYS="sk-$(openssl rand -hex 32)"

# Or multiple keys (comma-separated)
export DISTILL_API_KEYS="sk-key1,sk-key2,sk-key3"

Then include the key in requests:

curl -X POST http://your-server:8080/v1/dedupe \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"chunks": [...]}'

If DISTILL_API_KEYS is not set, the API is open (suitable for local/internal use).

About OpenAI API Key

When you need it:

  • Sending text chunks without pre-computed embeddings
  • Using text queries with vector database retrieval
  • Using the MCP server with text-based tools

When you DON'T need it:

  • Sending chunks with pre-computed embeddings (include "embedding": [...] in your request)
  • Using Distill purely for clustering/deduplication on existing vectors

What it's used for:

  • Converts text to embeddings using the text-embedding-3-small model
  • ~$0.00002 per 1K tokens (very cheap)
  • Embeddings are used only for similarity comparison, never stored

Alternatives:

  • Bring your own embeddings - include "embedding" field in chunks
  • Self-host an embedding model - set EMBEDDING_API_URL to your endpoint
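
For example, a minimal sketch of the self-hosted option, assuming EMBEDDING_API_URL points at an OpenAI-compatible embeddings endpoint (the URL below is a placeholder for your own service):

# Use a self-hosted embedding service instead of OpenAI for text → embedding conversion
export EMBEDDING_API_URL="http://localhost:8000/v1/embeddings"  # placeholder
distill api --port 8080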

Parameters

Parameter        Description                                      Default
--threshold      Clustering distance (lower = stricter)           0.15
--lambda         MMR balance: 1.0 = relevance, 0.0 = diversity    0.5
--over-fetch-k   Chunks to retrieve initially                     50
--target-k       Chunks to return after dedup                     8
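
For example, a stricter clustering pass with more diversity in the final selection might look like the following (which command accepts these flags isn't shown above, so treat the placement on serve as an assumption and check the CLI help):

# Tighter clusters, more diversity-weighted re-ranking
distill serve --index my-index --port 8080 \
  --threshold 0.10 \
  --lambda 0.3 \
  --over-fetch-k 60 \
  --target-k 10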

Self-Hosting

Docker (Recommended)

Use the pre-built image from GitHub Container Registry:

# Pull and run
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill:latest

# Or with a specific version
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill:v0.1.0

Docker Compose

# Start Distill + Qdrant (local vector DB)
docker-compose up

Build from Source

docker build -t distill .
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key distill api

Fly.io

fly launch
fly secrets set OPENAI_API_KEY=your-key
fly deploy

Render

Deploy to Render

Or manually:

  1. Connect your GitHub repo
  2. Set environment variables (OPENAI_API_KEY)
  3. Deploy

Railway

Connect your repo and set OPENAI_API_KEY in environment variables.

Architecture

┌─────────────────────────────────────────────────────────┐
│                      Your App                           │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                      Distill                            │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │ Fetch   │→ │ Cluster │→ │ Select  │→ │  MMR    │    │
│  │  50     │  │   12    │  │   12    │  │   8     │    │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘    │
│       2ms         6ms         <1ms         3ms          │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                       LLM                               │
└─────────────────────────────────────────────────────────┘

Supported Backends

  • Pinecone - Fully supported
  • Qdrant - Fully supported
  • Weaviate - Coming soon

Use Cases

  • Code Assistants - Dedupe context from multiple files/repos
  • RAG Pipelines - Remove redundant chunks before LLM
  • Agent Workflows - Clean up tool outputs + memory + docs
  • Enterprise - Deterministic outputs for compliance

Why not just use an LLM?

LLMs are non-deterministic. Reliability requires deterministic preprocessing.

                 LLM Compression    Distill
Latency          ~500ms             ~12ms
Cost per call    $0.01+             $0.0001
Deterministic    No                 Yes
Lossless         No                 Yes
Auditable        No                 Yes
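
A rough sanity check on the cost row, assuming ~100 tokens per chunk and the default over-fetch of 50:

50 chunks × ~100 tokens        ≈ 5K tokens per request
5K tokens × $0.00002 per 1K    ≈ $0.0001 per call (and $0 with pre-computed embeddings)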

Use LLMs for reasoning. Use deterministic algorithms for reliability.

Integrations

Works with your existing AI stack:

  • LLM Providers: OpenAI, Anthropic
  • Frameworks: LangChain, LlamaIndex
  • Vector DBs: Pinecone, Qdrant, Weaviate, Chroma, pgvector
  • Tools: Cursor, Lovable, and more

Contributing

Contributions welcome! Please read the contributing guidelines first.

# Run tests
go test ./...

# Build
go build -o distill .

License

AGPL-3.0 - see LICENSE

For commercial licensing, contact: [email protected]
