---
title: Getting Started with fastmcp-lro
diataxis_type: tutorial
---

Getting Started with fastmcp-lro

In this tutorial you will build a Python script that generates a large search result, offload the result to a JSONL file using fastmcp-lro, and inspect the output. By the end you will understand the three interfaces the library provides: the @lro_offload decorator, the explicit LROOffloader call, and the LROContext context manager.

Prerequisites

  • Python 3.10 or later
  • jq installed (for inspecting JSONL output)
  • A terminal with tail available (macOS, Linux, or WSL)

What you will build

A standalone script that simulates a search tool returning 1,000 results. When the response exceeds the character threshold, fastmcp-lro writes the full result set to a JSONL file and returns a compact descriptor with file paths, schemas, jq recipes, and a summary.

Step 1 -- Install fastmcp-lro

Create a virtual environment and install the package.

python -m venv .venv
source .venv/bin/activate
pip install fastmcp-lro

Verify the installation:

python -c "import fastmcp_lro; print(fastmcp_lro.__version__)"

You should see 0.1.0 (or the current version).

Step 2 -- Write a function that returns a large dict

Create a file called search_demo.py. Start with a plain function that builds a response containing 1,000 search hits. Each hit has an id, title, score, and snippet field. The combined JSON easily exceeds 50,000 characters.

# search_demo.py  --  Step 2
import json


def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    data = search("python async patterns")
    size = len(json.dumps(data))
    print(f"Response size: {size:,} characters")
    print(f"Number of results: {data['total_count']}")

Run it to confirm the size exceeds the default threshold:

python search_demo.py

Expected output (approximate):

Response size: 155,000 characters
Number of results: 1000

The response is well above the 50,000-character default threshold. In a real MCP tool call this entire payload would be serialized into the LLM's context window, wasting tokens on data the model may never read.
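To get a rough sense of that token cost, you can apply the common 4-characters-per-token heuristic. This snippet is illustrative only (real tokenizer counts vary); it rebuilds the Step 2 response with the plain json module:

```python
import json

# Rebuild the Step 2 response and estimate its context-window cost using
# the rough 4-characters-per-token heuristic (real tokenizer counts vary).
results = [
    {
        "id": f"doc-{i:04d}",
        "title": f"Result {i} for 'python async patterns'",
        "score": round(1.0 - i * 0.001, 4),
        "snippet": f"This is the snippet for document {i}. " * 5,
    }
    for i in range(1000)
]
data = {
    "request_id": "req-abc-001",
    "query": "python async patterns",
    "total_count": len(results),
    "results": results,
}

size = len(json.dumps(data))
print(f"Characters: {size:,}  estimated tokens: ~{size // 4:,}")
```

Even by this crude estimate, tens of thousands of tokens would be spent on data the model may never read.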

Step 3 -- Define an OffloadSection

An OffloadSection tells the offloader which key in your response dict to write to JSONL, what schema each line follows, and how to summarize it.

Update search_demo.py to match the listing below. The OffloadSection definition is new, and the if __name__ block now prints the section details:

# search_demo.py  --  Step 3
import json
from fastmcp_lro import OffloadSection


RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string",
    },
    schema_description="One search hit per line. Line 1 is header metadata.",
)


def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    data = search("python async patterns")
    size = len(json.dumps(data))
    print(f"Response size: {size:,} characters")
    print(f"Section key: {RESULTS_SECTION.key}")
    print(f"Schema fields: {list(RESULTS_SECTION.schema.keys())}")

Run it again:

python search_demo.py
Response size: 155,000 characters
Section key: results
Schema fields: ['id', 'title', 'score', 'snippet']

The section defines what to offload. The offloader handles when and how.

Step 4 -- Apply the @lro_offload decorator

The @lro_offload decorator wraps any function that returns a dict. When the response exceeds the threshold, it writes the declared sections to JSONL and returns a compact descriptor. Below-threshold responses pass through unchanged.

Replace search_demo.py with the full version:

# search_demo.py  --  Step 4
import json
from fastmcp_lro import lro_offload, OffloadSection


RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string",
    },
    schema_description="One search hit per line. Line 1 is header metadata.",
)


@lro_offload(
    sections=[RESULTS_SECTION],
    inline_keys=["query", "total_count"],
    session_id_key="request_id",
)
def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    result = search("python async patterns")
    print(json.dumps(result, indent=2))

Three things to note about the decorator arguments:

  • sections -- the list of OffloadSection objects declaring which keys to offload.
  • inline_keys -- keys copied verbatim into the compact response (query and total_count survive offloading so the LLM still sees them).
  • session_id_key -- the key in the returned dict whose value is embedded in filenames. Here "request_id" maps to "req-abc-001".
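To make the transformation concrete, here is a stdlib-only sketch of what the decorator conceptually does. This is not fastmcp-lro's implementation, just the shape of the pass-through-or-offload logic; offload_sketch and its temp-file naming are invented for illustration:

```python
import json
import os
import tempfile


def offload_sketch(data, section_keys, inline_keys, session_id_key,
                   threshold=50_000):
    """Illustrative pass-through-or-offload logic (NOT the library's code).

    Below the threshold the dict is returned unchanged; above it, each
    section key is written to a JSONL file (header line first) and the
    compact descriptor carries file paths plus the inline keys.
    """
    if len(json.dumps(data)) <= threshold:
        return data  # below threshold: pass through unchanged

    session_id = data[session_id_key]
    files = {}
    for key in section_keys:
        # File naming here is invented for illustration only.
        fd, path = tempfile.mkstemp(prefix=f"{key}-{session_id}-",
                                    suffix=".jsonl")
        with os.fdopen(fd, "w") as f:
            f.write(json.dumps({"type": "mcp_lro", "content": key,
                                "count": len(data[key])}) + "\n")
            for record in data[key]:
                f.write(json.dumps(record) + "\n")
        files[key] = path

    descriptor = {
        "offloaded": True,
        "files": files,
        "summary": {k: {"count": len(data[k])} for k in section_keys},
    }
    descriptor.update({k: data[k] for k in inline_keys})  # inline keys survive
    return descriptor
```

The real library adds schemas, jq recipes, and guidance to the descriptor, as you will see in the output below.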

Run it:

python search_demo.py

You will see output like this (paths will differ on your machine):

{
  "offloaded": true,
  "query": "python async patterns",
  "total_count": 1000,
  "summary": {
    "results": {
      "count": 1000
    }
  },
  "files": {
    "results": "/tmp/mcp-lro/search-results-req-abc-001-20260325T140000Z-a1b2c3d4.jsonl"
  },
  "schemas": {
    "results": {
      "description": "One search hit per line. Line 1 is header metadata.",
      "line_schema": {
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string"
      }
    }
  },
  "jq_recipes": {
    "list_all_results": "tail -n +2 '{results_file}' | jq '.'",
    "count_results": "tail -n +2 '{results_file}' | jq -s 'length'",
    "first_10_results": "tail -n +2 '{results_file}' | head -10 | jq '.'"
  },
  "guidance": "Results offloaded to JSONL (1 section(s): results).\nFiles:\n  - results: /tmp/mcp-lro/search-results-req-abc-001-20260325T140000Z-a1b2c3d4.jsonl\n\nUse the jq_recipes above to extract specific data, or read files directly.\nHeader line (line 1) contains metadata; data objects start at line 2."
}

The 155,000-character response has been replaced by a compact descriptor under 1,000 characters. The full data lives in the JSONL file.

Step 5 -- Inspect the JSONL file

Copy the file path from the files.results field in the output and use tail and jq to read it. Line 1 is a metadata header; data starts at line 2.

# Replace the path with the actual path from your output
FILE="/tmp/mcp-lro/search-results-req-abc-001-20260325T140000Z-a1b2c3d4.jsonl"

# View the header (line 1)
head -1 "$FILE" | jq '.'

# Pretty-print the data records (skip the header; head keeps output short)
tail -n +2 "$FILE" | jq '.' | head -20

# Count records
tail -n +2 "$FILE" | jq -s 'length'

# Filter: results with score above 0.9
tail -n +2 "$FILE" | jq 'select(.score > 0.9)'

# Project specific fields
tail -n +2 "$FILE" | jq '{id, score}'

# Top 5 by score
tail -n +2 "$FILE" | jq -s 'sort_by(-.score) | .[:5] | .[] | {id, score}'

The header line looks like this:

{
  "type": "mcp_lro",
  "content": "results",
  "session_id": "req-abc-001",
  "count": 1000,
  "estimated_tokens": 38750,
  "created_at": "2026-03-25T14:00:00+00:00"
}

Each subsequent line is a single JSON object:

{"id":"doc-0000","title":"Result 0 for 'python async patterns'","score":1.0,"snippet":"This is the snippet for document 0. This is the snippet for document 0. This is the snippet for document 0. This is the snippet for document 0. This is the snippet for document 0. "}

Step 6 -- Use the explicit LROOffloader call

The decorator is convenient, but sometimes you need direct control. The LROOffloader class exposes the same logic through offload_if_needed().

Create a file called search_explicit.py:

# search_explicit.py
import json
from fastmcp_lro import LROOffloader, OffloadSection


def search(query: str) -> dict:
    """Simulate a search returning 1,000 results."""
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "request_id": "req-abc-001",
        "query": query,
        "total_count": len(results),
        "results": results,
    }


RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={
        "id": "string",
        "title": "string",
        "score": "number",
        "snippet": "string",
    },
    schema_description="One search hit per line. Line 1 is header metadata.",
)

offloader = LROOffloader(
    threshold=50000,
    output_dir="/tmp/my-lro-output",
    ttl_seconds=3600,
)

if __name__ == "__main__":
    data = search("python async patterns")

    # Check size before offloading
    size = offloader.estimate_size(data)
    print(f"Estimated size: {size:,} characters")
    print(f"Threshold: {offloader.threshold:,} characters")
    print(f"Will offload: {size > offloader.threshold}")
    print()

    result = offloader.offload_if_needed(
        data=data,
        sections=[RESULTS_SECTION],
        inline_keys=["query", "total_count"],
        session_id="req-abc-001",
    )

    print(json.dumps(result, indent=2))

Run it:

python search_explicit.py
Estimated size: 155,000 characters
Threshold: 50,000 characters
Will offload: True

{
  "offloaded": true,
  "query": "python async patterns",
  "total_count": 1000,
  ...
}

The explicit interface gives you access to estimate_size() for pre-flight checks and cleanup() for removing expired files:

# Clean up files older than 1 hour
deleted = offloader.cleanup(max_age_seconds=3600)
print(f"Cleaned up {len(deleted)} expired file(s)")

Step 7 -- A brief look at LROContext

The LROContext class wraps the same offload logic in a context manager. It is useful when the data is assembled incrementally or when you want offloading to happen automatically on block exit.

# search_context.py
import json
from fastmcp_lro import LROContext, OffloadSection


RESULTS_SECTION = OffloadSection(
    key="results",
    filename_prefix="search-results",
    schema={"id": "string", "title": "string", "score": "number"},
    schema_description="One search hit per line.",
)


def search(query: str) -> dict:
    results = [
        {
            "id": f"doc-{i:04d}",
            "title": f"Result {i} for '{query}'",
            "score": round(1.0 - i * 0.001, 4),
            "snippet": f"This is the snippet for document {i}. " * 5,
        }
        for i in range(1000)
    ]
    return {
        "query": query,
        "total_count": len(results),
        "results": results,
    }


if __name__ == "__main__":
    ctx = LROContext(
        sections=[RESULTS_SECTION],
        inline_keys=["query", "total_count"],
        session_id="ctx-demo-001",
        threshold=50000,
    )

    with ctx:
        data = search("python async patterns")
        ctx.set_data(data)

    # result is available after the block exits
    print(json.dumps(ctx.result, indent=2))

LROContext also works as an async context manager (async with) for use in async code paths.

What you learned

  1. OffloadSection declares which response key to offload, its schema, and how to summarize it.
  2. @lro_offload wraps a function so that large responses are automatically offloaded to JSONL. It works with both sync and async functions.
  3. LROOffloader.offload_if_needed() provides the same behavior through an explicit call, giving you access to estimate_size() and cleanup().
  4. LROContext wraps the logic in a context manager for incremental data assembly.
  5. The JSONL output has a header on line 1 (metadata) and data on lines 2+. Always skip the header with tail -n +2 when processing.
  6. The compact descriptor includes jq_recipes, schemas, and a summary so the consuming LLM can extract exactly what it needs without reading the entire file.

Next steps