PDF Toolbox

Expose local PDFs to MCP-compatible agents or run the standalone pdf-reader CLI with deterministic chunking, semantic search, and configurable defaults.

Highlights

FastMCP/STDIO server ready for Cursor, VS Code, Claude, and other MCP clients.
Typer/Click/Rich CLI (pdf-reader) prints JSON for easy piping.
read_pdf – extracts ordered text with page-window controls for quick inspection.
search_pdf – runs semantic similarity search over cached embeddings with custom top_k, score threshold, and chunk parameters.
describe_pdf_sections – emits deterministic chunks for classic RAG flows or, with --mode tables, returns structured tables (bbox, headers, cells) detected straight from the pages.
configure_pdf_defaults – adjusts chunk size/overlap, page windows, and the default embedding model at runtime.
Strict .pdf validation, sandboxed base path, and aggressive caching.
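
To make "deterministic chunks" concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not the project's implementation: the `Chunk` type and `chunk_text` helper are hypothetical names, and the defaults simply mirror the values in the configure-pdf-defaults example. It illustrates why the same text and parameters always produce the same offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int  # position of the chunk in the sequence
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text
    text: str


def chunk_text(text: str, chunk_size: int = 600, chunk_overlap: int = 120) -> list[Chunk]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Hypothetical helper: the output depends only on the arguments,
    so repeated calls yield identical chunks and offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(Chunk(len(chunks), start, end, text[start:end]))
        if end == len(text):
            break  # the final window reached the end of the text
        start += step
    return chunks
```

With a chunk size of 600 and an overlap of 120, each window starts 480 characters after the previous one, so adjacent chunks share 120 characters of context.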

Documentation

Quick install (uv)

Run the MCP server directly

# Run the MCP server directly
uvx --from git+https://github.com/patriciomartinns/pdf-toolbox -- pdf-toolbox --quiet

# Install/run the CLI
uv tool install --from git+https://github.com/patriciomartinns/pdf-toolbox pdf-reader
pdf-reader --help

Note: If you had the old mcp-pdf-reader CLI installed via uv tool install, run uv tool uninstall mcp-pdf-reader before installing pdf-reader to avoid conflicts.

CLI quick tour

Command	Purpose	Example
`pdf-reader read-pdf`	Extract ordered text for a bounded page range.	`pdf-reader read-pdf reports/Q1.pdf --start-page 3 --end-page 5`
`pdf-reader search-pdf`	Run semantic similarity search over cached embeddings.	`pdf-reader search-pdf reports/Q1.pdf "rate limiting" --top-k 8`
`pdf-reader describe-pdf-sections`	List deterministic chunks with offsets for RAG pipelines.	`pdf-reader describe-pdf-sections reports/Q1.pdf --max-chunks 5`
`pdf-reader configure-pdf-defaults`	Update runtime defaults for chunk size/overlap/page window/model.	`pdf-reader configure-pdf-defaults --chunk-size 600 --chunk-overlap 120 --max-pages 10`
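
Because the CLI prints JSON, it can be driven from a script. The sketch below wraps the read-pdf example from the table with Python's `subprocess` module. It assumes `pdf-reader` is on the PATH and that stdout holds a single JSON document; the helper names are hypothetical, and nothing is assumed about the JSON schema.

```python
import json
import subprocess


def build_read_pdf_command(pdf_path: str, start_page: int, end_page: int) -> list[str]:
    """Build the argv for `pdf-reader read-pdf` using the flags shown in the table."""
    return [
        "pdf-reader",
        "read-pdf",
        pdf_path,
        "--start-page",
        str(start_page),
        "--end-page",
        str(end_page),
    ]


def read_pdf_json(pdf_path: str, start_page: int, end_page: int):
    """Run the CLI and parse its stdout as JSON (assumed to be one JSON document)."""
    result = subprocess.run(
        build_read_pdf_command(pdf_path, start_page, end_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `read_pdf_json("reports/Q1.pdf", 3, 5)` would run the same command as the first table row and return whatever structure the CLI emits.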

Tip: the first search-pdf invocation on a new document may need to download the SentenceTransformers model and must build embeddings, so it is slower than later runs. This cost is paid once per model/PDF combination; subsequent searches reuse the cache.
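
One way to picture a per-model, per-PDF cache is a key derived from the file contents and the embedding settings. The sketch below is an assumption for illustration, not the project's actual cache layout: `embedding_cache_key` is a hypothetical helper, and the model names in the usage note are only examples.

```python
import hashlib


def embedding_cache_key(
    pdf_bytes: bytes, model_name: str, chunk_size: int, chunk_overlap: int
) -> str:
    """Return a stable key for one (PDF contents, model, chunk settings) combination.

    Hypothetical scheme: changing the file, the model, or the chunk
    parameters yields a different key, so stale embeddings are never reused.
    """
    digest = hashlib.sha256()
    digest.update(pdf_bytes)
    digest.update(b"\0")  # separator between file contents and settings
    digest.update(f"{model_name}|{chunk_size}|{chunk_overlap}".encode("utf-8"))
    return digest.hexdigest()
```

For example, `embedding_cache_key(data, "all-MiniLM-L6-v2", 600, 120)` returns the same key on every call for the same file, while switching to another model name produces a new key and therefore a fresh embedding build.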

See the docs/ folder for full recipes covering both CLI commands and MCP client configuration. Questions or ideas? Open an issue on github.com/patriciomartinns/pdf-toolbox.
