A FastAPI-based application that implements the Model Context Protocol (MCP) for lead prospecting. The project follows Clean Architecture principles with a clear separation of concerns across domain, application, and infrastructure layers.
The application now includes persistent storage capabilities with PostgreSQL and pgvector integration, allowing leads data to be stored and managed efficiently.
This project implements Clean Architecture (also known as Hexagonal Architecture) with the following layers:
- Domain Layer: Core business entities and logic
- Application Layer: Use cases and API routes
- Infrastructure Layer: External services, APIs, and framework implementations
The application uses a sophisticated three-phase approach for contact enrichment that separates professional information discovery, LinkedIn profile URL discovery, and biographical information gathering:
Uses Perplexity's sonar model via OpenRouter for finding professional contact information:
- Names: Full names of professionals at target companies
- Email addresses: Professional and work emails
- Phone numbers: Direct contact numbers
- Job titles: Current positions and roles
- Professional background: Career information
The Perplexity search deliberately excludes LinkedIn keywords to avoid low-quality LinkedIn-focused results and instead focuses on finding verified contact details from various professional sources.
After contact information is gathered, the DuckDuckGoClient performs targeted LinkedIn profile URL discovery:
Dual Search Strategy:
- Primary Search (Name + Company): Searches for
site:linkedin.com/in "Person Name" "Company Name" - Fallback Search (Title + Company): If the name search yields no results, falls back to
site:linkedin.com/in "Job Title" "Company Name"
Key Features:
- Rate limiting to avoid being blocked (configurable delay between requests)
- URL deduplication and normalization
- Regex-based extraction of LinkedIn profile URLs from HTML
- Query sanitization to prevent injection
After LinkedIn URLs are discovered, a secondary Perplexity search gathers biographical information for each contact:
- Short Description: A concise one-line summary of the contact's professional profile
- Full Bio: A comprehensive biography including career history, achievements, and professional background
This phase enriches the Contact entity with two additional fields:
short_description: Brief professional summary (displayed in contacts list)full_bio: Detailed biographical information (displayed in contact detail view)
Add these environment variables to your .env file to customize the enrichment behavior:
# Perplexity Web Search (Phase 1 & Phase 3)
WEB_SEARCH_MODEL=perplexity/sonar # Model for web search
WEB_SEARCH_TIMEOUT=60.0 # Request timeout in seconds
WEB_SEARCH_CONCURRENT_REQUESTS=5 # Max concurrent search requests
# DuckDuckGo LinkedIn Search (Phase 2)
DUCKDUCKGO_TIMEOUT=30.0 # Request timeout in seconds
DUCKDUCKGO_MAX_RESULTS=10 # Max LinkedIn URLs per search
DUCKDUCKGO_DELAY_BETWEEN_REQUESTS=2.0 # Rate limiting delay in secondsIf you have an existing database, run the following migration to add the bio columns:
psql -d your_database -f /database/migrations/add_contact_bio_columns.sqlOr execute the SQL directly:
ALTER TABLE contacts ADD COLUMN IF NOT EXISTS short_description TEXT;
ALTER TABLE contacts ADD COLUMN IF NOT EXISTS full_bio TEXT;| Component | Location | Purpose |
|---|---|---|
WebSearchClient |
infrastructure/services/enrich_leads_agent/tools/web_search_client.py |
Perplexity-based contact info and bio search |
DuckDuckGoClient |
infrastructure/services/enrich_leads_agent/tools/duckduckgo_client.py |
LinkedIn URL discovery via HTML search |
DuckDuckGoConfig |
config.py |
Configuration for DuckDuckGo settings |
WebSearchConfig |
config.py |
Configuration for Perplexity settings |
EnrichLeadsNodes |
infrastructure/services/enrich_leads_agent/nodes.py |
Orchestrates the three-phase enrichment |
prospectio-api-mcp/
βββ Dockerfile
βββ README.md
βββ curls/
β βββ list.http
βββ database/
β βββ init.sql
βββ docker-compose.yml
βββ glama.json
βββ poetry.lock
βββ prospectio_api_mcp/
β βββ __pycache__/
β βββ application/
β β βββ api/
β β β βββ leads_routes.py
β β β βββ mcp_routes.py
β β β βββ profile_routes.py
β β β βββ __pycache__/
β β βββ use_cases/
β β βββ get_leads.py
β β βββ insert_leads.py
β β βββ profile.py
β β βββ __pycache__/
β βββ config.py
β βββ domain/
β β βββ entities/
β β β βββ company.py
β β β βββ compatibility_score.py
β β β βββ contact.py
β β β βββ job.py
β β β βββ leads.py
β β β βββ leads_result.py
β β β βββ profile.py
β β β βββ work_experience.py
β β β βββ __pycache__/
β β βββ ports/
β β β βββ compatibility_score.py
β β β βββ fetch_leads.py
β β β βββ leads_repository.py
β β β βββ profile_respository.py
β β β βββ __pycache__/
β β βββ prompts/
β β β βββ compatibility_score.md
β β βββ services/
β β βββ prompt_loader.py
β β βββ __pycache__/
β β βββ leads/
β β βββ active_jobs_db.py
β β βββ jsearch.py
β β βββ mantiks.py
β β βββ strategy.py
β βββ infrastructure/
β β βββ api/
β β β βββ client.py
β β β βββ llm_client_factory.py
β β β βββ llm_generic_client.py
β β β βββ __pycache__/
β β βββ dto/
β β β βββ database/
β β β βββ llm/
β β β βββ mantiks/
β β β βββ rapidapi/
β β βββ services/
β β βββ active_jobs_db.py
β β βββ compatibility_score.py
β β βββ jsearch.py
β β βββ leads_database.py
β β βββ mantiks.py
β β βββ profile_database.py
β βββ main.py
β βββ mcp.py
β βββ mcp_routes.py
β βββ __pycache__/
βββ pyproject.toml
βββ pyrightconfig.json
βββ tests/
β βββ ut/
β βββ test_1_profile_use_case.py
β βββ test_active_jobs_db_use_case.py
β βββ test_get_leads_use_case.py
β βββ test_jsearch_use_case.py
β βββ test_mantiks_use_case.py
β βββ __pycache__/
βββ uv.lock
Contact(contact.py): Represents a business contact (name, email, phone, title, linkedin_url, short_description, full_bio)Company(company.py): Represents a company (name, industry, size, location, description)Job(job.py): Represents a job posting (title, description, location, salary, requirements)Leads(leads.py): Aggregates companies, jobs, and contacts for lead dataLeadsResult(leads_result.py): Represents the result of a lead insertion operationProfile(profile.py): Represents a user profile with personal and professional informationWorkExperience(work_experience.py): Represents work experience entries for a profile
CompanyJobsPort(fetch_leads.py): Abstract interface for fetching company jobs from any data sourcefetch_company_jobs(location: str, job_title: list[str]) -> Leads: Abstract method for job search
LeadsRepositoryPort(leads_repository.py): Abstract interface for persisting leads datasave_leads(leads: Leads) -> None: Abstract method for saving leads to storage
ProfileRepositoryPort(profile_respository.py): Abstract interface for profile data management- Profile-related repository operations
CompanyJobsStrategy(strategy.py): Abstract base class for job retrieval strategies- Concrete Strategies: Implementations for each data source:
ActiveJobsDBStrategy,JsearchStrategy,MantiksStrategy
leads_routes.py: Defines FastAPI endpoints for leads managementprofile_routes.py: Defines FastAPI endpoints for profile management
InsertCompanyJobsUseCase(insert_leads.py): Orchestrates the process of retrieving and inserting company jobs from different sources- Accepts a strategy and repository, retrieves leads and persists them to the database
GetLeadsUseCase(get_leads.py): Handles retrieval of leads dataProfileUseCase(profile.py): Manages profile-related operations
BaseApiClient: Async HTTP client for external API calls
- Database DTOs:
base.py,company.py,job.py,contact.py,profile.py,work_experience.py- SQLAlchemy models for persistence - Mantiks DTOs:
company.py,company_response.py,job.py,location.py,salary.py- Data transfer objects for Mantiks API - RapidAPI DTOs:
active_jobs_db.py,jsearch.py- Data transfer objects for RapidAPI services
ActiveJobsDBAPI: Adapter for Active Jobs DB APIJsearchAPI: Adapter for Jsearch APIMantiksAPI: Adapter for Mantiks APILeadsDatabase: PostgreSQL repository implementation for leads persistenceProfileDatabase: PostgreSQL repository implementation for profile management
All API services implement the CompanyJobsPort interface, and the database service implements the LeadsRepositoryPort interface, allowing for easy swapping and extension.
The FastAPI application is configured to:
- Manage Application Lifespan: Handles startup and shutdown events, including MCP session lifecycle.
- Expose Multiple Protocols:
- REST API available at
/rest/v1/ - MCP protocol available at
/prospectio/(implemented inmcp_routes.py)
- REST API available at
- Integrate Routers: Includes leads insertion routes and profile routes for comprehensive lead and profile management via FastAPI's APIRouter.
- Load Configuration: Loads environment-based settings from
config.pyusing Pydantic. - Dependency Injection: Injects service implementations, strategies, and repository into endpoints for clean separation.
- Database Integration: Configures PostgreSQL connection for persistent storage of leads data and profiles.
To run the application, you need to configure your environment variables. This is done using a .env file at the root of the project.
-
Create the
.envfile: Copy the example file.env.exampleto a new file named.env.cp .env.example .env cp .env .env.docker
-
Edit the
.envfile: Open the.envfile and fill in the required values for the following variables:EXPOSE:stdioorhttpMASTER_KEY: Your master key.ALLOWED_ORIGINS: Comma-separated list of allowed origins.MANTIKS_API_URL: The base URL for the Mantiks API.MANTIKS_API_KEY: Your API key for Mantiks.RAPIDAPI_API_KEY: Your API key for RapidAPI.JSEARCH_API_URL: The base URL for the Jsearch API.ACTIVE_JOBS_DB_URL: The base URL for the Active Jobs DB API.DATABASE_URL: PostgreSQL connection string (e.g.,postgresql+asyncpg://user:password@host:port/database)
The application uses Pydantic Settings to load these variables from the .env file (see prospectio_api_mcp/config.py).
- FastAPI (0.115.14): Modern web framework with automatic API documentation
- MCP (1.10.1): Model Context Protocol implementation
- Pydantic (2.10.3): Data validation and serialization
- HTTPX (0.28.1): HTTP client for external API calls
- SQLAlchemy (2.0.41): Database ORM for PostgreSQL integration
- asyncpg (0.30.0): Async PostgreSQL driver
- psycopg (3.2.4): PostgreSQL adapter
- Pytest: Testing framework
- HTTP Request: Client makes a POST request to
/rest/v1/insert/leads/{source}with JSON body containing location and job_title parameters. - Route Handler: The FastAPI route in
application/api/routes.pyreceives the request and extracts parameters. - Strategy Mapping: The handler selects the appropriate strategy (e.g.,
ActiveJobsDBStrategy,JsearchStrategy, etc.) based on the source. - Use Case Execution:
InsertCompanyJobsUseCaseis instantiated with the selected strategy and repository. - Strategy Execution: The use case delegates to the strategy's
execute()method to fetch leads data. - Port Execution: The strategy calls the port's
fetch_company_jobs(location, job_title)method, which is implemented by the infrastructure adapter (e.g.,ActiveJobsDBAPI).
The project includes comprehensive unit tests following pytest best practices and Clean Architecture principles. Tests are located in the tests/ directory and use dependency injection for mocking external services.
tests/
βββ ut/ # Unit tests
βββ test_mantiks_use_case.py # Mantiks strategy tests
βββ test_jsearch_use_case.py # JSearch strategy tests
βββ test_active_jobs_db_use_case.py # Active Jobs DB strategy tests
βββ test_get_leads.py # Get leads use case tests
βββ test_profile.py # Profile use case tests
poetry install# Run all tests
poetry run pytest
# Run with verbose output
poetry run pytest -v# Run Mantiks tests only
poetry run pytest tests/ut/test_mantiks_use_case.py -v
# Run JSearch tests only
poetry run pytest tests/ut/test_jsearch_use_case.py -v
# Run Active Jobs DB tests only
poetry run pytest tests/ut/test_active_jobs_db_use_case.py -v
# Run Get Leads tests only
poetry run pytest tests/ut/test_get_leads.py -v
# Run Profile tests only
poetry run pytest tests/ut/test_profile.py -v# Run a specific test method
poetry run pytest tests/ut/test_mantiks_use_case.py::TestMantiksUseCase::test_get_leads_success -vTests require a .env file for configuration. Copy the example file:
cp .env.example .envThe CI pipeline automatically handles environment setup and database initialization.
Before running the application, make sure you have set up your environment variables as described in the Configuration section.
-
Install Dependencies:
poetry install
-
Run the Application:
poetry run fastapi run prospectio_api_mcp/main.py --reload --port <YOUR_PORT>
The Docker Compose setup includes both the application and PostgreSQL database with pgvector extension.
First build a network for prospectio :
docker network create prospectio-
Build and Run with Docker Compose:
# Build and start the container docker-compose up --build # Or run in background (detached mode) docker-compose up -d --build
-
Stop the Application:
# Stop the container docker-compose down # Stop and remove volumes (if needed) docker-compose down -v
-
View Logs:
# View real-time logs docker-compose logs -f # View logs for specific service docker-compose logs -f prospectio-api-mcp
### Accessing the APIs
Once the application is running (locally or via Docker), you can access:
- **REST API**: `http://localhost:<YOUR_PORT>/rest/v1/insert/leads/{source}`
- `source` can be: mantiks, active_jobs_db, jsearch
- Method: POST with JSON body containing `location` and `job_title` array
- Example: `http://localhost:<YOUR_PORT>/rest/v1/insert/leads/mantiks`
- **API Documentation**: `http://localhost:<YOUR_PORT>/docs`
- **MCP Endpoint**: `http://localhost:<YOUR_PORT>/prospectio/mcp/sse`
# Add to claude
change settings json to match your environment
```json
{
"mcpServers": {
"Prospectio-stdio": {
"command": "<ABSOLUTE_PATH>/uv",
"args": [
"--directory",
"<PROJECT_ABSOLUTE_PATH>",
"run",
"prospectio_api_mcp/main.py"
]
}
}
}
change settings json to match your environment
{
"mcpServers": {
"prospectio-http": {
"httpUrl": "http://localhost:<YOUR_PORT>/prospectio/mcp/sse",
"timeout": 30000
},
"Prospectio-stdio": {
"command": "<ABSOLUTE_PATH>/uv",
"args": [
"--directory",
"<PROJECT_ABSOLUTE_PATH>",
"run",
"prospectio_api_mcp/main.py"
]
}
}
}Built with β€οΈ by the Prospectio Team
