An agentic AI system for enriching missing data in police shooting databases
- Overview
- Architecture
- Quick Start
- Development
- Performance
- Responsible AI
- Roadmap
- License
- Acknowledgment
This project builds an agentic pipeline that automatically enriches missing data in two Texas Justice Initiative (TJI) databases through intelligent web search and extraction. The system's core purpose is data augmentation, not analysis.
Datasets:
- Civilians-Shot (1,674 records): civilians shot by police — 57% missing weapon info, 22.5% missing names
- Officers-Shot (282 records): officers shot by civilians — 40% missing officer names
- Total: 1,956 records needing enrichment
The Problem: TJI volunteers spend 15–30 minutes per record manually searching news articles and extracting details.
The Solution: An agentic AI system that automates the enrichment workflow while keeping humans in the loop, reducing volunteer time by 75%.
The system runs 7 nodes in LangGraph, with a Coordinator node gating every transition:
```mermaid
flowchart TD
    Start([Start]) --> Load
    Load[Load<br/><i>DB → state fields</i>]
    Search[Search<br/><i>Tavily API</i>]
    Validate[Validate<br/><i>date + location + name</i>]
    Synthesize[Synthesize<br/><i>LLM extraction</i>]
    Coord{Coordinator}
    Complete([Complete<br/><i>write JSON</i>])
    Escalate([Escalate<br/><i>human review</i>])
    Load --> Coord
    Coord -- "fields OK" --> Search
    Search --> Coord
    Coord -- "results > 0" --> Validate
    Coord -- "retry: next strategy" --> Search
    Validate --> Coord
    Coord -- "articles valid" --> Synthesize
    Synthesize --> Coord
    Coord -- "fields extracted" --> Complete
    Coord -- "error / max retries / zero extractions" --> Escalate
```
Each node accepts and returns `EnrichmentState` (defined in `src/agents/state.py`). The Coordinator reads `current_stage` to decide routing; nodes update state fields, and graph edges handle transitions.
| Node | Type | Purpose |
|---|---|---|
| Load | Deterministic | Reads incident record from PostgreSQL, populates state fields |
| Search | Deterministic | Constructs query from incident fields, calls Tavily API for news articles |
| Validate | Rule-based | Checks date proximity (±5 days), location match, and optional name match |
| Synthesize | LLM-powered | Extracts structured fields from articles, checks cross-article consistency |
| Coordinator | Rule-based | Gates after each stage — decides retry, proceed, or escalate |
| Complete | Terminal | Writes enrichment results to JSON |
| Escalate | Terminal | Writes escalation report to JSON for human review |
The Coordinator implements an escalating retry strategy:
| Retry | Strategy | Description |
|---|---|---|
| 0 | exact_match | All fields, exact date |
| 1 | temporal_expanded | Month + year format, keep both names |
| 2 | name_partial | Drop officer name, keep civilian name + month-year |
| 3 | entity_dropped | Drop both names, keep location + date range |
| 4 | Escalate | Flag for human review |
The Coordinator routes to human review when:
- Max retries are reached without sufficient validated articles
- No articles pass validation after all strategies
- Synthesize detects conflicts and zero agreed fields (if some fields agree while others conflict, the pipeline completes with the agreed fields and flags `requires_human_review = True` for the conflicts)
- Synthesize encounters an error
Articles pass validation using three-tier logic:
| Condition | Criteria | Rationale |
|---|---|---|
| Has `published_date` | date + location | Standard check |
| No date, has `civilian_name` | location + name | Compensates for missing date |
| No date, no name | location only | Last-resort fallback |
This prevents false positives from articles about different incidents that happen to match on location alone, while still handling Tavily results that lack parsed dates. Aggregation sites (e.g., Wikipedia, fatalencounters.org) and compilation documents (.pdf, .csv) are excluded at search or validation level — see EVALUATION.md — Appendix.
The synthesize node only processes validated articles (those that passed validation), filtering out unrelated articles before extraction.
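A minimal sketch of the three-tier logic, assuming simple substring matching in place of the real node's rapidfuzz-scored name matching (article and incident shapes are illustrative):

```python
from datetime import date

DATE_PROXIMITY_DAYS = 5  # mirrors ENRICHMENT_DATE_PROXIMITY_DAYS

def passes_validation(article: dict, incident: dict) -> bool:
    """Pick validation criteria based on what the article and record contain."""
    text = article.get("text", "").lower()
    loc_ok = incident["city"].lower() in text
    pub = article.get("published_date")
    if pub is not None:
        # Tier 1: date proximity + location
        return loc_ok and abs((pub - incident["date"]).days) <= DATE_PROXIMITY_DAYS
    name = incident.get("civilian_name")
    if name:
        # Tier 2: no parsed date — require the civilian's name as well as location
        return loc_ok and name.lower() in text
    # Tier 3: no date, no name — location only (last resort)
    return loc_ok
```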
For each field extracted from validated articles:
- Articles agree → add to `extracted_fields` with confidence level
- Articles disagree → add `FieldConflict` to `conflicting_fields`
- All articles return null → skip (no data, not a conflict)
- Articles agree but conflict with database → add to both lists
After synthesize, the Coordinator applies partial completion logic: if any fields
were successfully extracted (extracted_fields non-empty), route to COMPLETE —
even if conflicts exist on other fields. Only escalate on conflict when zero
fields were extracted. Partial completions set requires_human_review = True so
conflicts are still surfaced for review.
Before comparing values, the synthesize node normalizes names, race terms, and weapon categories to reduce spurious conflicts (see EVALUATION.md — Fix 2 for details).
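A sketch of that normalization step; the alias tables below are assumed examples, not the actual Fix 2 mappings:

```python
# Illustrative alias tables — assumed categories, not the project's real mappings
RACE_ALIASES = {"w": "white", "b": "black", "h": "hispanic", "african american": "black"}
WEAPON_ALIASES = {"handgun": "firearm", "pistol": "firearm", "rifle": "firearm"}

def normalize(field_name: str, value: str) -> str:
    """Canonicalize a value before cross-article comparison so that
    'Handgun' vs 'pistol' does not register as a conflict."""
    v = value.strip().lower()
    if field_name == "civilian_race":
        return RACE_ALIASES.get(v, v)
    if field_name == "weapon":
        return WEAPON_ALIASES.get(v, v)
    if field_name.endswith("_name"):
        return " ".join(v.split())       # collapse whitespace; the real code does more
    return v
```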
Each FieldConflict captures the field name, conflict type (articles_disagree
or reference_mismatch), the conflicting values with source URLs, and the
database reference value when applicable.
The database is treated as immutable ground truth (official government data).
- Python 3.11+
- PostgreSQL with TJI data loaded
- Anthropic API key
- Tavily API key
```bash
# Install dependencies
pip install -r requirements.txt

# Install the package (enables the `enrich` CLI command)
pip install -e .

# Configure environment
cp .env.example .env
# Edit .env with your API keys and database credentials
```

```bash
# Enrich a single incident
enrich <incident_id> <dataset_type>

# Examples
enrich 10 civilians_shot
enrich 42 officers_shot

# Or without installing the package
python -m src.run 10 civilians_shot
```

Results are written to `output/enrichment/` as pretty-printed JSON files:

- `civilians_shot_10_complete.json` — successful enrichment
- `civilians_shot_10_escalate.json` — flagged for human review
Successful enrichment (`civilians_shot_792_complete.json`):

```jsonc
{
  "incident_id": "792",
  "dataset_type": "civilians_shot",
  "extracted_fields": [
    {
      "field_name": "weapon",
      "value": "Knife (possessed by civilian ...)",
      "confidence": "medium",
      "sources": ["https://www.click2houston.com/news/local/2020/02/19/..."],
      "extraction_method": "llm"
    }
    // ... 6 more fields (time_of_day, circumstance, officer_name, civilian_name, location_detail, outcome)
  ],
  "validation_results": [
    {
      "article": { "url": "...", "title": "Authorities identify the man ..." },
      "date_match": false,
      "location_match": true,
      "victim_name_match": true,
      "passed": true
    }
    // ... 4 more (4 failed, 1 passed)
  ],
  "search_strategy": "entity_dropped",
  "retry_count": 2,
  "outcome_summary": "Enriched 7 fields for incident 792 (civilians_shot)"
}
```

Escalated for human review (`civilians_shot_10_escalate.json`):
```jsonc
{
  "incident_id": "10",
  "dataset_type": "civilians_shot",
  "escalation_reason": "conflict",
  "current_stage": "synthesize",
  "search_strategy": "exact_match",
  "retry_count": 0,
  "retrieved_articles": [
    {
      "url": "https://www.nbcdfw.com/...",
      "title": "Officers Shoot Armed Man ..."
    },
    { "url": "https://www.cbsnews.com/...", "title": "Police Kill Suspect ..." }
    // ... 3 more articles
  ],
  "extracted_fields": [
    {
      "field_name": "officer_name",
      "value": "Rob Sherwin",
      "confidence": "high",
      "sources": ["https://www.nbcdfw.com/..."],
      "extraction_method": "llm"
    }
  ],
  "conflicting_fields": [
    {
      "field_name": "civilian_name",
      "conflict_type": "articles_disagree",
      "values": [
        "Gerardo Ramirez",
        "Gerardo Ramirez (plus unrelated names ...)"
      ],
      "sources": [
        ["https://www.nbcdfw.com/..."],
        ["https://www.dallasnews.com/..."]
      ]
    }
    // ... 7 more conflicting fields
  ],
  "outcome_summary": "Escalated incident 10: conflict after 0 retries"
}
```

The holdout evaluation measures pipeline accuracy by comparing extracted fields against ground-truth values already in the database (age, race, weapon, location, time, outcome). These fields exist in the DB but are never seen by the pipeline during enrichment, creating a natural holdout.
```bash
python -m src.eval.run_eval civilians_shot --limit 100 --stratified
```

Holdout results (N=100, civilians-shot):
| Metric | Value |
|---|---|
| Completion rate | 70% (70/100) |
| Escalation rate | 30% (30/100) |
| Reached extraction | 71% (71/100) |
| Field | Coverage | Exact match | Fuzzy match |
|---|---|---|---|
| civilian_age | 49% | 90% | 90% |
| time_of_day | 32% | 94% | 94% |
| location_detail | 38% | 18% | 97% |
| outcome | 68% | 84% | 84% |
| weapon | 50% | 79% | 79% |
| civilian_race | 17% | 65% | 65% |
Aggregate precision: 72% exact / 84% fuzzy across 245 extracted values. Age and
time-of-day are the strongest fields; location is 97% correct by fuzzy match
(exact gap is formatting only). Most escalations (97%) are retrieval gaps where
no articles were found. Reports are saved to output/eval/.
Adversarial evaluation (N=20): 20 fabricated incidents (fake names, real Texas cities and dates) were run through the live pipeline. 19/20 escalated correctly; 1 completed with `requires_human_review=True` and 6 field conflicts. Zero hallucinations — fabricated names never appeared in extracted fields.
See EVALUATION.md for full methodology, error analysis, fairness metrics, adversarial evaluation, and discussion.
Environment variables (see .env.example):
| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | (required) | Anthropic API key |
| `ANTHROPIC_MODEL` | `claude-sonnet-4-6` | Model for LLM-powered nodes |
| `TAVILY_API_KEY` | (required) | Tavily API key for news search |
| `LOG_LEVEL` | `INFO` | Logging level |
| `ENRICHMENT_OUTPUT_DIR` | `output/enrichment` | Output directory for JSON results |
| `ENRICHMENT_MAX_SEARCH_RESULTS` | `5` | Max articles per search |
| `ENRICHMENT_SEARCH_DEPTH` | `advanced` | Tavily search depth |
| `ENRICHMENT_FUZZY_MATCH_THRESHOLD` | `80` | Min rapidfuzz score for name matching |
| `ENRICHMENT_DATE_PROXIMITY_DAYS` | `5` | Max days between article and incident |
PostgreSQL connection variables (DB_HOST, DB_PORT, etc.) are configured in
.env.example and used by the ETL pipeline (data/).
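A stdlib sketch of how these variables might be read, standing in for `src/config.py` (which uses pydantic-settings); the attribute names are illustrative, but the defaults mirror the table above:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Stdlib stand-in for the pydantic-settings model in src/config.py."""
    anthropic_api_key: str
    tavily_api_key: str
    anthropic_model: str = "claude-sonnet-4-6"
    log_level: str = "INFO"
    output_dir: str = "output/enrichment"
    max_search_results: int = 5
    search_depth: str = "advanced"
    fuzzy_match_threshold: int = 80
    date_proximity_days: int = 5

def load_settings() -> Settings:
    env = os.environ
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],   # required -> KeyError if unset
        tavily_api_key=env["TAVILY_API_KEY"],
        anthropic_model=env.get("ANTHROPIC_MODEL", "claude-sonnet-4-6"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        output_dir=env.get("ENRICHMENT_OUTPUT_DIR", "output/enrichment"),
        max_search_results=int(env.get("ENRICHMENT_MAX_SEARCH_RESULTS", "5")),
        search_depth=env.get("ENRICHMENT_SEARCH_DEPTH", "advanced"),
        fuzzy_match_threshold=int(env.get("ENRICHMENT_FUZZY_MATCH_THRESHOLD", "80")),
        date_proximity_days=int(env.get("ENRICHMENT_DATE_PROXIMITY_DAYS", "5")),
    )
```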
```
police-data-intelligence/
├── src/
│   ├── agents/
│   │   ├── state.py             # EnrichmentState, Article, FieldExtraction models
│   │   ├── graph.py             # LangGraph wiring, complete/escalate terminal nodes
│   │   ├── coordinate_node.py   # Coordinator gates (search/validate/synthesize checks)
│   │   └── load_node.py         # Load node (PostgreSQL → state)
│   ├── retrieval/
│   │   └── search_node.py       # Search node (Tavily API)
│   ├── validation/
│   │   └── validate_node.py     # Validate node (date/location/name matching)
│   ├── synthesize/
│   │   └── synthesize_node.py   # Synthesize node (LLM extraction + consistency)
│   ├── database/
│   │   └── connection.py        # PostgreSQL connection
│   ├── eval/
│   │   ├── holdout.py           # Holdout evaluation (compare vs DB ground truth)
│   │   └── run_eval.py          # Eval CLI entrypoint
│   ├── config.py                # Settings (pydantic-settings, from env vars)
│   └── run.py                   # CLI entrypoint
├── data/
│   └── etl/                     # ETL pipeline (CSV → PostgreSQL), separate from agents
├── tests/
│   ├── test_load_node.py
│   ├── test_search_node.py
│   ├── test_validate_node.py
│   ├── test_synthesize_node.py
│   ├── test_coordinate_node.py
│   ├── test_graph.py            # Graph wiring + terminal node tests
│   ├── test_run.py
│   ├── test_holdout.py
│   └── ...                      # ETL tests (cleaners, loaders, schemas)
├── output/
│   ├── enrichment/              # Pipeline JSON output
│   └── eval/                    # Holdout evaluation reports
├── .env.example
└── requirements.txt
```
```bash
# Lint
ruff check src/ tests/

# Test (unit only — no PostgreSQL needed)
pytest tests/ -v -m "not integration"

# Test single module
pytest tests/test_validate_node.py -v

# Integration tests (requires PostgreSQL)
pytest tests/ -v -m "integration"
```

Testing conventions:

- Unit tests mock external dependencies (Tavily, PostgreSQL, LLM)
- Integration tests use `@pytest.mark.integration`
- Mock the LLM via `MagicMock` + dependency injection, not `@patch`
- Use `model_copy()` when fixtures are mutated by functions under test
Measured across 23 incidents (claude-sonnet-4-6). See EVALUATION.md — Cost and Latency for holdout timing.
| Metric | Mean | Range | Note |
|---|---|---|---|
| Total per incident | 7.0s | 2.3s – 13.5s | ~93% is Tavily search |
| Bottleneck | Search | 3–5s per call | Each retry adds one search call |
| Projected (1,956 seq) | ~3.5h | — | ~20 min with 10 concurrent workers |
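The ~20-minute projection assumes roughly 10 incidents in flight at once (1,956 × 7.0s ÷ 10 ≈ 23 min). A sketch of how the planned batch mode could cap concurrency with `asyncio`; `enrich_one` here is a placeholder for the real per-incident pipeline, which is currently run one incident at a time via the CLI:

```python
import asyncio

async def enrich_one(incident_id: int) -> str:
    """Placeholder for the real per-incident pipeline (~7s each in practice)."""
    await asyncio.sleep(0.01)          # stand-in for search + LLM latency
    return f"{incident_id}: complete"

async def enrich_batch(ids: list[int], concurrency: int = 10) -> list[str]:
    """Run many incidents with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)   # caps concurrent Tavily/LLM calls
    async def bounded(i: int) -> str:
        async with sem:
            return await enrich_one(i)
    return await asyncio.gather(*(bounded(i) for i in ids))
```

The semaphore bounds pressure on the Tavily and Anthropic rate limits, which matters more than raw parallelism since ~93% of per-incident time is the search call.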
Estimated per-record API cost using Claude Sonnet 4.6 and Tavily advanced search (PAYGO pricing):
| Component | Per Record | 1,956 Records |
|---|---|---|
| Anthropic (LLM) | ~$0.11 | ~$210 |
| Tavily (search) | ~$0.04 | ~$78 |
| Total | ~$0.15 | ~$290 |
Cost varies with retry count and article length. See EVALUATION.md for methodology.
This system operates in a sensitive domain (police accountability). Key design principles:
- Human-in-the-loop: System never auto-updates the database; humans approve all changes
- Transparency: Shows article excerpts, confidence scores, and conflict details
- Traceability: Links suggestions to source articles with verbatim quotes
- Accuracy over automation: Conservative thresholds, escalation on conflicts
- Immutability: Never overwrites official government data without human approval
Built:
- ETL pipeline (CSV → PostgreSQL)
- 7-node LangGraph pipeline with conditional routing and retry strategies
- Partial completion on synthesize conflicts (accept agreed fields, flag conflicts)
- 4-tier search strategy (exact → temporal → name_partial → entity_dropped)
- CLI entrypoint for single-incident enrichment
- Holdout evaluation framework (precision, coverage against DB ground truth)
- N=100 holdout eval: 70% completion, 72% exact / 84% fuzzy precision across 6 fields (age 90%, time 94%, location 97% fuzzy, outcome 84%)
- Adversarial evaluation: 20 fabricated incidents, 0 hallucinations
Next:
- Batch processing across all records
- Evaluation of the officers-shot dataset
- Human review UI
MIT License — see LICENSE for details.
The author appreciates Texas Justice Initiative (TJI) for collecting, analyzing, and publishing criminal justice data in Texas. TJI maintains publicly available databases on officer-involved shootings and deaths in law enforcement custody, making this data accessible to reporters, researchers, policymakers, and the public. The author contributed to TJI's Officer-Involved Shootings in Texas report (covering 2016–2019). This project extends that work using TJI's updated datasets (2014–2024, 1,956 records) to automate the labor-intensive process of enriching incident records with information from news sources.