| title | Getting Started with fastmcp-lro |
|---|---|
| diataxis_type | tutorial |
In this tutorial you will build a Python script that generates a large search
result, offloads it to a JSONL file using fastmcp-lro, and inspects the output.
By the end you will understand the three interfaces the library provides:
the `@lro_offload` decorator, the explicit `LROOffloader` call, and the
`LROContext` context manager.
- Python 3.10 or later
- `jq` installed (for inspecting JSONL output)
- A terminal with `tail` available (macOS, Linux, or WSL)
A standalone script that simulates a search tool returning 1,000 results. When the response exceeds the character threshold, fastmcp-lro writes the full result set to a JSONL file and returns a compact descriptor with file paths, schemas, jq recipes, and a summary.
Create a virtual environment and install the package.

```bash
python -m venv .venv
source .venv/bin/activate
pip install fastmcp-lro
```

Verify the installation:

```bash
python -c "import fastmcp_lro; print(fastmcp_lro.__version__)"
```

You should see `0.1.0` (or the current version).
Create a file called `search_demo.py`. Start with a plain function that
builds a response containing 1,000 search hits. Each hit has an `id`,
`title`, `score`, and `snippet` field. The combined JSON easily exceeds
50,000 characters.
```python
# search_demo.py -- Step 2
import json


def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    data = search("python async patterns")
    size = len(json.dumps(data))
    print(f"Response size: {size:,} characters")
    print(f"Number of results: {data['total_count']}")
```

Run it to confirm the size exceeds the default threshold:

```bash
python search_demo.py
```

Expected output (approximate):

```
Response size: 155,000 characters
Number of results: 1000
```
The response is well above the 50,000-character default threshold. In a real MCP tool call this entire payload would be serialized into the LLM's context window, wasting tokens on data the model may never read.
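The waste is easy to quantify. As a back-of-the-envelope check, here is a small sketch using the common heuristic of roughly 4 characters per token (the `estimate_tokens` helper and the 4:1 ratio are illustrative assumptions, not part of the library):

```python
import json


def estimate_tokens(payload: dict, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: serialized length divided by ~4 chars/token.

    The 4-chars-per-token ratio is a heuristic, not an exact tokenizer count.
    """
    return int(len(json.dumps(payload)) / chars_per_token)


# A ~155,000-character response costs on the order of 38,000+ tokens if inlined.
print(estimate_tokens({"text": "x" * 155_000}))
```

At that size, every tool call that inlines the payload spends tens of thousands of tokens before the model reads a single result.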
An `OffloadSection` tells the offloader which key in your response dict to
write to JSONL, what schema each line follows, and how to summarize it.
Add this to `search_demo.py`, replacing the `if __name__` block:
```python
# search_demo.py -- Step 3
import json

from fastmcp_lro import OffloadSection

RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string",
    },
    schema_description="One search hit per line. Line 1 is header metadata.",
)


def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    data = search("python async patterns")
    size = len(json.dumps(data))
    print(f"Response size: {size:,} characters")
    print(f"Section key: {RESULTS_SECTION.key}")
    print(f"Schema fields: {list(RESULTS_SECTION.schema.keys())}")
```

Run it again:

```bash
python search_demo.py
```

```
Response size: 155,000 characters
Section key: results
Schema fields: ['id', 'title', 'score', 'snippet']
```
The section defines what to offload. The offloader handles when and how.
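To make that division of labor concrete, here is a minimal pure-Python sketch of the pattern. This is illustrative, not fastmcp-lro's actual implementation; the `offload_if_large` name is ours, while the 50,000-character threshold and the header-then-records JSONL layout mirror what this tutorial documents:

```python
import json
from pathlib import Path

THRESHOLD = 50_000  # characters; mirrors the library's documented default


def offload_if_large(data: dict, key: str, out_dir: str) -> dict:
    """Return data unchanged when small; otherwise write data[key] to a
    JSONL file (header on line 1, records from line 2) and return a
    compact descriptor."""
    if len(json.dumps(data)) <= THRESHOLD:
        return data  # below threshold: pass through unchanged
    records = data[key]
    path = Path(out_dir) / f"{key}.jsonl"
    with path.open("w") as f:
        # Line 1: header metadata; data objects start at line 2.
        f.write(json.dumps({"type": "mcp_lro", "content": key, "count": len(records)}) + "\n")
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return {
        "offloaded": True,
        "files": {key: str(path)},
        "summary": {key: {"count": len(records)}},
    }
```

A real offloader also handles multiple sections, inline keys, TTLs, and unique filenames; the point here is only the threshold check and the file layout.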
The `@lro_offload` decorator wraps any function that returns a dict. When the
response exceeds the threshold, it writes the declared sections to JSONL and
returns a compact descriptor. Below-threshold responses pass through unchanged.
Replace `search_demo.py` with the full version:
```python
# search_demo.py -- Step 4
import json

from fastmcp_lro import lro_offload, OffloadSection

RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string",
    },
    schema_description="One search hit per line. Line 1 is header metadata.",
)


@lro_offload(
    sections=[RESULTS_SECTION],
    inline_keys=["query", "total_count"],
    session_id_key="request_id",
)
def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    result = search("python async patterns")
    print(json.dumps(result, indent=2))
```

Three things to note about the decorator arguments:
- `sections` -- the list of `OffloadSection` objects declaring which keys to offload.
- `inline_keys` -- keys copied verbatim into the compact response (`query` and `total_count` survive offloading so the LLM still sees them).
- `session_id_key` -- the key in the returned dict whose value is embedded in filenames. Here `"request_id"` maps to `"req-abc-001"`.
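The wrapper mechanics themselves are ordinary Python. A stripped-down sketch of the pattern (a hypothetical `size_gate` decorator, not the library's code) shows how a decorator can return the result untouched below a threshold and a compact stand-in that preserves inline keys above it:

```python
import functools
import json


def size_gate(threshold: int = 50_000, inline_keys: tuple[str, ...] = ()):
    """Illustrative decorator: pass small dict results through unchanged,
    replace large ones with a compact marker that keeps inline_keys."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if len(json.dumps(result)) <= threshold:
                return result  # below threshold: unchanged
            compact = {"offloaded": True}
            for k in inline_keys:
                compact[k] = result.get(k)  # inline keys survive offloading
            return compact
        return wrapper
    return decorator


@size_gate(threshold=100, inline_keys=("total_count",))
def tiny(n: int) -> dict:
    return {"total_count": n, "rows": ["row"] * n}
```

The real decorator additionally writes the offloaded sections to JSONL and builds the full descriptor, but the gate-then-transform shape is the same.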
Run it:

```bash
python search_demo.py
```

You will see output like this (paths will differ on your machine):
```json
{
  "offloaded": true,
  "query": "python async patterns",
  "total_count": 1000,
  "summary": {
    "results": {
      "count": 1000
    }
  },
  "files": {
    "results": "/tmp/mcp-lro/search-results-req-abc-001-20260325T140000Z-a1b2c3d4.jsonl"
  },
  "schemas": {
    "results": {
      "description": "One search hit per line. Line 1 is header metadata.",
      "line_schema": {
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string"
      }
    }
  },
  "jq_recipes": {
    "list_all_results": "tail -n +2 '{results_file}' | jq '.'",
    "count_results": "tail -n +2 '{results_file}' | jq -s 'length'",
    "first_10_results": "tail -n +2 '{results_file}' | head -10 | jq '.'"
  },
  "guidance": "Results offloaded to JSONL (1 section(s): results).\nFiles:\n - results: /tmp/mcp-lro/search-results-req-abc-001-20260325T140000Z-a1b2c3d4.jsonl\n\nUse the jq_recipes above to extract specific data, or read files directly.\nHeader line (line 1) contains metadata; data objects start at line 2."
}
```

The 155,000-character response has been replaced by a compact descriptor under 1,000 characters. The full data lives in the JSONL file.
Copy the file path from the `files.results` field in the output and use
`tail` and `jq` to read it. Line 1 is a metadata header; data starts at
line 2.
```bash
# Replace the path with the actual path from your output
FILE="/tmp/mcp-lro/search-results-req-abc-001-20260325T140000Z-a1b2c3d4.jsonl"

# View the header (line 1)
head -1 "$FILE" | jq '.'

# List all data records (skip the header)
tail -n +2 "$FILE" | jq '.' | head -20

# Count records
tail -n +2 "$FILE" | jq -s 'length'

# Filter: results with score above 0.9
tail -n +2 "$FILE" | jq 'select(.score > 0.9)'

# Project specific fields
tail -n +2 "$FILE" | jq '{id, score}'

# Top 5 by score
tail -n +2 "$FILE" | jq -s 'sort_by(-.score) | .[:5] | .[] | {id, score}'
```

The header line looks like this:
```json
{
  "type": "mcp_lro",
  "content": "results",
  "session_id": "req-abc-001",
  "count": 1000,
  "estimated_tokens": 38750,
  "created_at": "2026-03-25T14:00:00+00:00"
}
```

Each subsequent line is a single JSON object:
```json
{"id":"doc-0000","title":"Result 0 for 'python async patterns'","score":1.0,"snippet":"This is the snippet for document 0. This is the snippet for document 0. This is the snippet for document 0. This is the snippet for document 0. This is the snippet for document 0. "}
```

The decorator is convenient, but sometimes you need direct control. The
`LROOffloader` class exposes the same logic through `offload_if_needed()`.
Create a file called `search_explicit.py`:
```python
# search_explicit.py
import json

from fastmcp_lro import LROOffloader, OffloadSection


def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string",
    },
    schema_description="One search hit per line. Line 1 is header metadata.",
)

offloader = LROOffloader(
    threshold=50000,
    output_dir="/tmp/my-lro-output",
    ttl_seconds=3600,
)

if __name__ == "__main__":
    data = search("python async patterns")

    # Check size before offloading
    size = offloader.estimate_size(data)
    print(f"Estimated size: {size:,} characters")
    print(f"Threshold: {offloader.threshold:,} characters")
    print(f"Will offload: {size > offloader.threshold}")
    print()

    result = offloader.offload_if_needed(
        data=data,
        sections=[RESULTS_SECTION],
        inline_keys=["query", "total_count"],
        session_id="req-abc-001",
    )
    print(json.dumps(result, indent=2))
```

Run it:
```bash
python search_explicit.py
```

```
Estimated size: 155,000 characters
Threshold: 50,000 characters
Will offload: True
{
  "offloaded": true,
  "query": "python async patterns",
  "total_count": 1000,
  ...
}
```
The explicit interface gives you access to `estimate_size()` for pre-flight
checks and `cleanup()` for removing expired files:

```python
# Clean up files older than 1 hour
deleted = offloader.cleanup(max_age_seconds=3600)
print(f"Cleaned up {len(deleted)} expired file(s)")
```

The `LROContext` class wraps the same offload logic in a context manager.
It is useful when the data is assembled incrementally or when you want
offloading to happen automatically on block exit.
```python
# search_context.py
import json

from fastmcp_lro import LROContext, OffloadSection

RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={"id": "string", "title": "string", "score": "number"},
    schema_description="One search hit per line.",
)


def search(query: str) -> dict:
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    ctx = LROContext(
        sections=[RESULTS_SECTION],
        inline_keys=["query", "total_count"],
        session_id="ctx-demo-001",
        threshold=50000,
    )
    with ctx:
        data = search("python async patterns")
        ctx.set_data(data)

    # result is available after the block exits
    print(json.dumps(ctx.result, indent=2))
```

`LROContext` also works as an async context manager (`async with`) for use
in async code paths.
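The async form follows Python's standard `__aenter__`/`__aexit__` protocol. A self-contained sketch of that pattern (an illustrative `AsyncCollector`, not the library's `LROContext`) shows the shape: data is set inside the block, and the result is computed on exit:

```python
import asyncio
import json


class AsyncCollector:
    """Illustrative stand-in for the async context-manager pattern:
    data set inside the block, a result computed on exit."""

    def __init__(self, threshold: int = 50_000):
        self.threshold = threshold
        self.data = None
        self.result = None

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if self.data is not None:
            size = len(json.dumps(self.data))
            self.result = {"offloaded": size > self.threshold, "size": size}
        return False  # never swallow exceptions


async def main():
    async with AsyncCollector() as ctx:
        ctx.data = {"results": list(range(10))}
    # result is available after the block exits, as with LROContext
    print(ctx.result["offloaded"])


asyncio.run(main())
```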
- `OffloadSection` declares which response key to offload, its schema, and how to summarize it.
- `@lro_offload` wraps a function so that large responses are automatically offloaded to JSONL. It works with both sync and async functions.
- `LROOffloader.offload_if_needed()` provides the same behavior through an explicit call, giving you access to `estimate_size()` and `cleanup()`.
- `LROContext` wraps the logic in a context manager for incremental data assembly.
- The JSONL output has a header on line 1 (metadata) and data on lines 2+. Always skip the header with `tail -n +2` when processing.
- The compact descriptor includes `jq_recipes`, schemas, and a summary so the consuming LLM can extract exactly what it needs without reading the entire file.
- Integrating LRO into a FastMCP Server -- apply LRO to a real MCP server with `server_instructions()`.
- Configuration Reference -- threshold tuning, environment variable overrides, and TTL settings.