Expose local PDFs to MCP-compatible agents or run the standalone pdf-reader CLI with deterministic chunking, semantic search, and configurable defaults.
- FastMCP/STDIO server ready for Cursor, VS Code, Claude, and other MCP clients.
- Typer/Click/Rich CLI (`pdf-reader`) prints JSON for easy piping.
- `read_pdf`: extracts ordered text with page-window controls for quick inspection.
- `search_pdf`: runs semantic similarity search over cached embeddings with custom `top_k`, score threshold, and chunk parameters.
- `describe_pdf_sections`: emits deterministic chunks for classic RAG flows or, with `--mode tables`, returns structured tables (bbox, headers, cells) detected straight from the pages.
- `configure_pdf_defaults`: adjusts chunk size/overlap, page windows, and the default embedding model at runtime.
- Strict `.pdf` validation, sandboxed base path, and aggressive caching.
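
Because the CLI prints JSON, its output can be consumed from a script as well as from a shell pipe. The following is a minimal sketch, assuming `pdf-reader` is on `PATH` and that each command writes a single JSON document to stdout. The helper name `run_json` is hypothetical, and since the output schema is not described here, the result is treated as opaque data.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(cmd: Sequence[str]) -> Any:
    """Run a command and parse its stdout as one JSON document.

    Hypothetical helper: raises CalledProcessError on a non-zero exit
    code and json.JSONDecodeError if stdout is not valid JSON.
    """
    result = subprocess.run(
        list(cmd),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # fail loudly on a non-zero exit status
    )
    return json.loads(result.stdout)
```

For example, `run_json(["pdf-reader", "read-pdf", "reports/Q1.pdf", "--start-page", "3", "--end-page", "5"])` would run the `read-pdf` example from the command table and return the parsed result.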
# Run the MCP server directly
uvx --from git+https://github.com/patriciomartinns/pdf-toolbox -- pdf-toolbox --quiet
# Install/run the CLI
uv tool install --from git+https://github.com/patriciomartinns/pdf-toolbox pdf-reader
pdf-reader --help

Note: If you had the old `mcp-pdf-reader` CLI installed via `uv tool install`, run `uv tool uninstall mcp-pdf-reader` before installing `pdf-reader` to avoid conflicts.
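
Several MCP clients register STDIO servers through a JSON file containing an `mcpServers` map. The sketch below generates such an entry from the server command shown above. The server key `pdf-toolbox` and the helper name `mcp_server_entry` are assumptions, and the file location and exact schema vary by client, so check your client's documentation before copying the output.

```python
import json


def mcp_server_entry(name: str = "pdf-toolbox") -> dict:
    """Build an `mcpServers`-style entry (assumed schema) for the STDIO server."""
    return {
        "mcpServers": {
            name: {
                "command": "uvx",
                # Arguments mirror the "Run the MCP server directly" command.
                "args": [
                    "--from",
                    "git+https://github.com/patriciomartinns/pdf-toolbox",
                    "--",
                    "pdf-toolbox",
                    "--quiet",
                ],
            }
        }
    }


# Print the entry so it can be pasted into a client configuration file.
print(json.dumps(mcp_server_entry(), indent=2))
```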
| Command | Purpose | Example |
|---|---|---|
| `pdf-reader read-pdf` | Extract ordered text for a bounded page range. | `pdf-reader read-pdf reports/Q1.pdf --start-page 3 --end-page 5` |
| `pdf-reader search-pdf` | Run semantic similarity search over cached embeddings. | `pdf-reader search-pdf reports/Q1.pdf "rate limiting" --top-k 8` |
| `pdf-reader describe-pdf-sections` | List deterministic chunks with offsets for RAG pipelines. | `pdf-reader describe-pdf-sections reports/Q1.pdf --max-chunks 5` |
| `pdf-reader configure-pdf-defaults` | Update runtime defaults for chunk size/overlap/page window/model. | `pdf-reader configure-pdf-defaults --chunk-size 600 --chunk-overlap 120 --max-pages 10` |
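
To illustrate what chunk size and overlap control, here is a sketch of fixed-size character chunking that records offsets. It shows the general technique only and is not taken from pdf-toolbox, which may split on tokens, sentences, or pages instead. The names `Chunk` and `chunk_text` are hypothetical, and the defaults simply reuse the values from the `configure-pdf-defaults` example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int  # position of the chunk in the sequence
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text
    text: str


def chunk_text(text: str, chunk_size: int = 600, chunk_overlap: int = 120) -> list[Chunk]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(Chunk(len(chunks), start, end, text[start:end]))
        if end == len(text):
            break  # the final window reached the end of the text
        start += step
    return chunks
```

The same input and parameters always produce the same offsets, which is the property that makes chunks safe to cache and to reference from a RAG index.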
Tip: the first `search-pdf` invocation on a new document downloads the SentenceTransformers model and builds embeddings, so it can take longer once per model/PDF combo. Subsequent searches reuse the cache.
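
Once embeddings exist, a search with `top_k` and a score threshold reduces to ranking chunk vectors by similarity to the query vector. The sketch below uses cosine similarity over plain Python lists. It is an assumption about the general approach rather than the project's code: the real tool obtains its vectors from a SentenceTransformers model, and `cosine`, `top_matches`, and `min_score` are hypothetical names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    top_k: int = 8,
    min_score: float = 0.0,
) -> list[tuple[int, float]]:
    """Return up to `top_k` (chunk_index, score) pairs with score >= min_score."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    kept = [(i, score) for i, score in scored if score >= min_score]
    # Highest score first; ties broken by chunk index to keep results deterministic.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:top_k]
```

Caching pays off here because only the query needs to be embedded on later searches; the chunk vectors are reused as they are.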
See the docs/ folder for full recipes covering both CLI commands and MCP client configuration. Questions or ideas? Open an issue on github.com/patriciomartinns/pdf-toolbox.