The open-source alternative to Apollo, ZoomInfo, and Clearbit. Enriches business databases using free data sources and local AI.
Coverage is measured with branch coverage enabled (every conditional path counts). 46% branch-aware is roughly equivalent to 67% statement-only on the same 600-test suite.
```sh
pip install forge-enrichment

# enrich a CSV (no config needed)
forge enrich --file leads.csv --output enriched.csv

# or start the dashboard
forge dashboard

# or discover businesses by ZIP
forge discover --zip 33602 --enrich --output tampa.csv
```

No API keys, no accounts, no credit cards. Runs on your machine.
Point FORGE at a PostgreSQL table of businesses and it fills in the blanks: emails, tech stacks, SSL status, site speed, industry classification, health scores, and pain points. It scrapes websites, imports government records, verifies emails via SMTP, and optionally generates AI summaries with a local model.
Apollo, ZoomInfo, and Clearbit charge $10K-50K/year for data that's largely scraped from public sources. FORGE does the same thing for free.
Email extraction (6 layers): mailto links, regex, Cloudflare decode, JSON-LD, obfuscation decode, contact page crawl.
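The Cloudflare layer refers to Cloudflare's standard email obfuscation: a `data-cfemail` hex attribute whose first byte is an XOR key for the rest. A minimal sketch of that layer plus the mailto/regex sweep (function names are illustrative, not FORGE's actual API):

```python
import re

def decode_cloudflare_email(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email (data-cfemail hex string).

    The first byte is an XOR key; every following byte is one character
    of the address XORed with that key.
    """
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2)
    )

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def extract_emails(html: str) -> set[str]:
    """Two of the six layers: a plain regex sweep (which also catches
    mailto: links) and Cloudflare decode."""
    found = set(EMAIL_RE.findall(html))
    for enc in re.findall(r'data-cfemail="([0-9a-fA-F]+)"', html):
        found.add(decode_cloudflare_email(enc))
    return found
```

The XOR scheme is symmetric, so the same operation both encodes and decodes.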
SMTP verification: generates candidates (info@, contact@, admin@), verifies via RCPT TO without sending mail.
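In outline, the verifier builds role-based candidates and probes the mail server up to `RCPT TO`, then disconnects without ever sending a message. A hedged sketch using the standard library (the helper names and the exact candidate list are illustrative):

```python
import smtplib

ROLE_PREFIXES = ("info", "contact", "admin")

def candidate_emails(domain: str) -> list[str]:
    """Generate role-based candidate addresses for a domain."""
    return [f"{prefix}@{domain}" for prefix in ROLE_PREFIXES]

def smtp_verify(email: str, mx_host: str,
                helo: str = "yourdomain.com",
                mail_from: str = "verify@yourdomain.com") -> bool:
    """Probe with RCPT TO; no mail is sent.

    A 250 reply means the server claims the mailbox exists. Catch-all
    servers accept anything, so treat a positive result as a hint,
    not proof.
    """
    with smtplib.SMTP(mx_host, 25, timeout=10) as smtp:
        smtp.ehlo(helo)
        smtp.mail(mail_from)
        code, _ = smtp.rcpt(email)
        return code == 250
```

This is also where `FORGE_SMTP_FROM` and `FORGE_SMTP_EHLO` (see the configuration table below) come into play.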
Tech detection (30+): WordPress, Shopify, React, Next.js, Stripe, Intercom, Google Analytics, Tailwind, and more.
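Detection of this kind usually boils down to fingerprint patterns matched against page HTML. A small sketch with a handful of illustrative signatures (the patterns here are common public fingerprints, not FORGE's actual rule set):

```python
import re

# Illustrative fingerprints; real detection uses many more signals
# (headers, cookies, script paths, meta generators, ...).
TECH_SIGNATURES = {
    "WordPress": re.compile(r"wp-content|wp-includes", re.I),
    "Shopify": re.compile(r"cdn\.shopify\.com", re.I),
    "Next.js": re.compile(r"__NEXT_DATA__|/_next/", re.I),
    "Stripe": re.compile(r"js\.stripe\.com", re.I),
    "Google Analytics": re.compile(r"googletagmanager\.com|google-analytics\.com", re.I),
}

def detect_tech(html: str) -> list[str]:
    """Return the technologies whose fingerprints appear in the HTML."""
    return sorted(name for name, pat in TECH_SIGNATURES.items() if pat.search(html))
```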
Government data importers:
- FCC ULS -- telecom license holders (~3M records)
- NPI Registry -- healthcare providers (~7M records, free API)
- SAM.gov -- federal contractors (~900K entities, free API key)
Local AI enrichment (via Ollama): business summaries, industry classification, health scores, pain points. No API costs.
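Ollama exposes a local HTTP API on port 11434, so a summary call never leaves the machine. A minimal sketch against Ollama's `/api/generate` endpoint (the prompt wording and function names are illustrative; FORGE keeps its own templates under `enrichment/`):

```python
import json
import urllib.request

def build_summary_prompt(name: str, website_text: str) -> str:
    """Illustrative prompt template for a one-paragraph summary."""
    return (
        "Summarize this business in one short paragraph.\n"
        f"Name: {name}\n"
        f"Website text: {website_text[:4000]}"
    )

def ollama_generate(prompt: str, model: str = "gemma3:27b") -> str:
    """Call the local Ollama server; no API key, no per-token cost."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```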
Quality checks: Haiku audit agent for spot-checking, field validation, COALESCE writes (never clobber existing data), rollback support.
Ops: checkpoint-based resume, hourly watchdog, graceful degradation when individual layers fail.
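Checkpoint-based resume can be as simple as persisting the last processed row after each batch, so a crash loses at most one batch of work. A sketch of the idea (file format and function names are hypothetical, not FORGE's internals):

```python
import json
from pathlib import Path

def save_checkpoint(path: Path, last_id: str, processed: int) -> None:
    """Record progress after every batch."""
    path.write_text(json.dumps({"last_id": last_id, "processed": processed}))

def load_checkpoint(path: Path) -> dict:
    """Resume from the saved position, or start from scratch."""
    if path.exists():
        return json.loads(path.read_text())
    return {"last_id": None, "processed": 0}
```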
```
forge/
  core/        Agent loop, tool registry, context management, output parsing
  adapters/    LLM backends (Ollama local, Claude for audit)
  tools/       Database pool + async web scraper
  enrichment/  Pipeline orchestration + prompt templates
  safety/      Audit agent + error recovery + rollback
  importers/   FCC ULS, NPI Registry, SAM.gov, SMTP verifier
  monitor.py   Process health monitoring + auto-restart
```
Two parallel tracks write to the same table:
- Web scraping track -- emails, tech stack, SSL, CMS, site speed. No LLM needed. ~16K records/hr with 4 workers.
- AI track -- summaries, industry, health score, pain points. Runs Gemma via Ollama. ~7-20K/day depending on hardware.
Both use COALESCE writes (only update NULL fields) and checkpoint after every batch.
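The COALESCE pattern fits in one statement: `COALESCE(existing, new)` returns the existing value when it is non-NULL, so concurrent tracks can only fill blanks, never overwrite each other. A self-contained demo using SQLite (COALESCE behaves the same in PostgreSQL; the table here is a cut-down stand-in):

```python
import sqlite3

def coalesce_update(conn, biz_id, email=None, cms=None):
    """Write enrichment results without clobbering existing values."""
    conn.execute(
        """UPDATE businesses
           SET email = COALESCE(email, ?),
               cms_detected = COALESCE(cms_detected, ?)
           WHERE id = ?""",
        (email, cms, biz_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (id INTEGER PRIMARY KEY, email TEXT, cms_detected TEXT)")
conn.execute("INSERT INTO businesses VALUES (1, 'info@acme.com', NULL)")

# The scrape found a different email and a CMS: only the NULL field is filled.
coalesce_update(conn, 1, email="sales@acme.com", cms="WordPress")
```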
```sh
git clone https://github.com/ghealysr/forge.git && cd forge
pip install -e .
```

Create a PostgreSQL database and table:
```sql
CREATE TABLE businesses (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT, phone TEXT, email TEXT, website_url TEXT,
  city TEXT, state TEXT, zip TEXT,
  industry TEXT, sub_industry TEXT,
  tech_stack TEXT[], cms_detected TEXT,
  ssl_valid BOOLEAN, site_speed_ms INTEGER,
  ai_summary TEXT, health_score SMALLINT, pain_points TEXT[],
  npi_number TEXT, email_source TEXT,
  last_enriched_at TIMESTAMPTZ,
  enrichment_attempts INTEGER DEFAULT 0,
  updated_at TIMESTAMPTZ DEFAULT now()
);
```

```sh
cp .env.example .env
```
Fill in your database credentials. Optional: install Ollama for local AI.

```sh
brew install ollama && ollama pull gemma3:27b  # macOS
```

Run:
```sh
forge enrich --mode email --workers 50 --resume   # web scraping
forge enrich --mode ai                            # AI enrichment
forge enrich --mode both --workers 50             # both
```

| Variable | Required | Default | What it does |
|---|---|---|---|
| `FORGE_DB_HOST` | Yes | `localhost` | PostgreSQL host |
| `FORGE_DB_PORT` | Yes | `5432` | PostgreSQL port |
| `FORGE_DB_USER` | Yes | `forge` | PostgreSQL user |
| `FORGE_DB_PASSWORD` | Yes | -- | PostgreSQL password |
| `FORGE_DB_NAME` | Yes | `forge` | Database name |
| `ANTHROPIC_API_KEY` | No | -- | For Haiku audit agent only |
| `SAM_GOV_API_KEY` | No | -- | Free key from api.data.gov |
| `FORGE_SMTP_FROM` | No | `verify@yourdomain.com` | SMTP verification identity |
| `FORGE_SMTP_EHLO` | No | `yourdomain.com` | EHLO domain |
```sh
# web scraping only
forge enrich --mode email --workers 100

# AI only
forge enrich --mode ai

# both tracks
forge enrich --mode both --workers 50

# government data imports
python -m forge.importers.fcc_uls --data-dir data/fcc
python -m forge.importers.npi_registry --all-states
python -m forge.importers.sam_gov --api-key YOUR_KEY
python -m forge.importers.smtp_verifier --workers 10

# monitoring
python -m forge.monitor
```

Add to `~/.claude.json`:
```json
{
  "mcpServers": {
    "forge": {
      "command": "forge",
      "args": ["mcp-server"]
    }
  }
}
```

Available tools:
| Tool | Description |
|---|---|
| `forge_discover` | Find businesses by ZIP via Overture Maps |
| `forge_enrich_record` | Add and enrich a single record |
| `forge_stats` | Database statistics |
| `forge_search` | Search by name, state, industry |
| `forge_export` | Export results to CSV |
| | FORGE | Apollo | ZoomInfo | Clearbit |
|---|---|---|---|---|
| Price | Free | $49-119/mo | $14,995/yr | Custom |
| Local AI | Yes | No | No | No |
| Government data | Yes | No | No | No |
| Self-hosted | Yes | No | No | No |
| Open source | Yes | No | No | No |
| SMTP verification | Yes | Yes | Yes | No |
| Tech detection | Yes | No | Yes | Yes |
| CSV in/out | Yes | Yes | Yes | No |
| Dashboard | Yes | Yes | Yes | No |
| MCP support | Yes | No | No | No |
- Fork and branch
- Make changes, add tests (`pytest forge/tests/`)
- Lint (`ruff check .`)
- Open a PR
We could use help with: new tech detections, government data importers, alternative LLM backends, and scraping performance.
MIT. See LICENSE.
Built by Nuclear Marmalade. If it saves you money, star the repo.