
Commit b2eae3b

Enhance documentation and CLI functionality
- Added a new getting-started.md file for end-to-end walkthroughs of the OpenAI Batch API and Ollama setup.
- Load `.env` variables automatically.
- Improved output extraction logic.
- Added tests for CLI and validation.
1 parent e74c3d6 commit b2eae3b

6 files changed: 366 additions & 6 deletions


README.md

Lines changed: 20 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 `llm-batch-pipeline` is a generic LLM batch processing pipeline. It discovers and parses input files via a plugin system, renders OpenAI Batch API (or Ollama) requests, validates structured JSON outputs with a Pydantic schema, evaluates predictions against ground truth, and exports results to XLSX/JSON.
 
-See the **Admin Guide** for installation/deployment, and the **Developer Guide** for how to extend the pipeline with custom plugins, prompts, schemas, and evaluation.
+See the **Getting Started Guide** for a tested end-to-end walkthrough with OpenAI Batch API and a 3-way sharded Ollama setup ([docs/getting-started.md](docs/getting-started.md)). See the **User Guide** for installation and CLI reference ([docs/user-guide.md](docs/user-guide.md)). See the **Admin Guide** for installation/deployment, and the **Developer Guide** for how to extend the pipeline with custom plugins, prompts, schemas, and evaluation.
 
 ## Workflow
 ```mermaid
@@ -45,6 +45,25 @@ flowchart TD
 - Local LLM via Ollama: run an Ollama server (pull the model), then use `--backend ollama --base-url http://HOST:11434` (repeat `--base-url` for multi-server sharding).
 - OpenAI API compatible local server (if supported by your server): use `--backend openai` and configure the OpenAI SDK base URL (commonly via `OPENAI_BASE_URL`).
 
+## Getting Started
+- End-to-end walkthrough: [`docs/getting-started.md`](docs/getting-started.md)
+- The getting-started guide was tested against live OpenAI Batch and Ollama services.
+
+## Quick Test (offline)
+Run the unit test suite (no external LLM services):
+```bash
+uv sync --group dev
+uv run pytest -q
+```
+
+## Plugins
+List registered plugins:
+```bash
+uv run llm-batch-pipeline list
+```
+
+The built-in examples include `spam_detection` and `gdpr_detection`.
+
 ## Test / Benchmark
 - SpamAssassin corpus reference run: [`docs/benchmark-run.md`](docs/benchmark-run.md)

docs/getting-started.md

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@

# Getting Started

This is the shortest end-to-end walkthrough for a new user who wants to see what `llm-batch-pipeline` does in practice.

It covers two real workflows:

- OpenAI Batch API with `gpt-4o-mini`
- A 3-way sharded Ollama setup at `http://nanu:11435`, `http://nanu:11436`, and `http://nanu:11437`

These instructions were tested against the live services on 2026-04-09.

## What You Will Do

You will:

- create a batch job with the built-in `spam_detection` plugin
- add two sample `.eml` files
- render a batch JSONL file
- submit it to a backend
- validate the model output against a Pydantic schema
- evaluate the predictions against ground truth

## Prerequisites

- Python 3.13+
- `uv`
- dependencies installed:

```bash
uv sync
```

- for OpenAI: a `.env` file in the repo root with `OPENAI_API_KEY=...`

The CLI now auto-loads `.env` from the repository root.
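For example, a minimal `.env` (the key name is the one this guide uses; the value is a placeholder):

```bash
cat > .env <<'EOF'
OPENAI_API_KEY=sk-your-key-here
EOF
```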
## Offline Sanity Check

Before using any backend, verify the install:

```bash
uv run llm-batch-pipeline list
uv sync --group dev
uv run pytest -q
```
## OpenAI Batch Walkthrough

### 1. Create a batch directory

```bash
uv run llm-batch-pipeline init getting_started_openai --plugin spam_detection --model gpt-4o-mini
```

This creates a directory like `batches/batch_001_getting_started_openai`.
Use that path in the commands below as `<openai-batch-dir>`.
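If you would rather not copy the path by hand, you can capture it in a shell variable (the glob assumes the default `batches/` naming shown above):

```bash
openai_batch_dir=$(ls -d batches/batch_*_getting_started_openai | tail -n 1)
echo "$openai_batch_dir"
```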
### 2. Copy the built-in prompt and schema into the batch

```bash
cp src/llm_batch_pipeline/examples/spam_detection/prompt.txt <openai-batch-dir>/prompt.txt
cp src/llm_batch_pipeline/examples/spam_detection/schema.py <openai-batch-dir>/schema.py
```

### 3. Add two sample emails

```bash
cat > <openai-batch-dir>/input/ham__team_sync.eml <<'EOF'
From: alice@example.com
To: bob@example.com
Subject: Team sync tomorrow
Date: Mon, 1 Jan 2024 10:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hi Bob,

Can we meet tomorrow at 3pm to review the release checklist and assign the last two action items?

Thanks,
Alice
EOF

cat > <openai-batch-dir>/input/spam__million_prize.eml <<'EOF'
From: prizes@claim-now.biz
To: victim@example.com
Subject: URGENT!! Claim your 1000000 dollar prize now
Date: Mon, 1 Jan 2024 11:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Congratulations!

You have been selected to receive a 1000000 dollar cash prize. Click http://claim-prize-now.example.com immediately and send your bank details today to avoid losing your winnings.
EOF
```

### 4. Add a category map for evaluation

```bash
cat > <openai-batch-dir>/evaluation/category-map.json <<'EOF'
{
  "ham": "ham",
  "spam": "spam"
}
EOF
```
The evaluator infers ground truth from the `ham__...` and `spam__...` filename prefixes; this file maps each prefix to a final category label.
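As a quick illustration of that convention, the inferred label is everything before the double underscore (assuming the evaluator splits on `__`, which the examples above suggest):

```bash
f=ham__team_sync.eml
echo "${f%%__*}"   # prints "ham", which category-map.json maps to the final label
```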
### 5. Render the batch JSONL

```bash
uv run llm-batch-pipeline render --batch-dir <openai-batch-dir> --plugin spam_detection
```

This writes the request payload to `<openai-batch-dir>/job/batch-00001.jsonl`.
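Before submitting, it is worth eyeballing the rendered requests. For example, with `jq` installed, list the top-level keys of each request line:

```bash
jq -c 'keys' <openai-batch-dir>/job/batch-00001.jsonl
```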
### 6. Submit to OpenAI Batch API

```bash
uv run llm-batch-pipeline submit --batch-dir <openai-batch-dir> --backend openai
```

Notes:

- this command waits for the batch to complete by default
- in the live test for this guide, a 2-request batch took about 45 minutes to finish
- batch metadata is saved to `<openai-batch-dir>/output/submission.json`

If you do not want to keep the terminal open:

```bash
uv run llm-batch-pipeline submit --batch-dir <openai-batch-dir> --backend openai --no-wait
uv run llm-batch-pipeline submit --batch-dir <openai-batch-dir> --backend openai --resume-batch-id <batch-id>
```
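The `<batch-id>` for `--resume-batch-id` comes from the saved submission metadata; for example, assuming the id is stored under a top-level `id` key (check the file if your layout differs):

```bash
jq -r '.id' <openai-batch-dir>/output/submission.json
```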
### 7. Validate the output

```bash
uv run llm-batch-pipeline validate --batch-dir <openai-batch-dir>
```

This reads `<openai-batch-dir>/output/output.jsonl` and writes validated rows to `<openai-batch-dir>/results/validated.json`.
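For orientation, each validated row should carry at least the schema fields that `evaluate` reads (`classification` and `confidence`); the layout below is illustrative, not verified output:

```json
{
  "custom_id": "spam__million_prize.eml",
  "classification": "spam",
  "confidence": 0.97
}
```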
### 8. Evaluate the predictions

```bash
uv run llm-batch-pipeline evaluate \
  --batch-dir <openai-batch-dir> \
  --label-field classification \
  --confidence-field confidence \
  --positive-class spam
```

This prints accuracy, macro F1, per-class metrics, and the confusion matrix to the terminal.

In the tested run, the OpenAI batch classified both sample emails correctly.
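For this two-email run the arithmetic is trivial: both predictions match ground truth, so accuracy is 2/2 = 1.0 and macro F1 is 1.0, with a clean diagonal in the confusion matrix (layout illustrative; the CLI's formatting may differ):

```text
            pred: ham   pred: spam
true: ham        1           0
true: spam       0           1
```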
## Ollama Walkthrough

### 1. Create a batch directory

```bash
uv run llm-batch-pipeline init getting_started_ollama --plugin spam_detection --model gemma4:latest
```

This creates a directory like `batches/batch_002_getting_started_ollama`.
Use that path in the commands below as `<ollama-batch-dir>`.

### 2. Copy the built-in prompt and schema into the batch

```bash
cp src/llm_batch_pipeline/examples/spam_detection/prompt.txt <ollama-batch-dir>/prompt.txt
cp src/llm_batch_pipeline/examples/spam_detection/schema.py <ollama-batch-dir>/schema.py
```

### 3. Add the same sample inputs and evaluation map

```bash
cat > <ollama-batch-dir>/input/ham__team_sync.eml <<'EOF'
From: alice@example.com
To: bob@example.com
Subject: Team sync tomorrow
Date: Mon, 1 Jan 2024 10:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hi Bob,

Can we meet tomorrow at 3pm to review the release checklist and assign the last two action items?

Thanks,
Alice
EOF

cat > <ollama-batch-dir>/input/spam__million_prize.eml <<'EOF'
From: prizes@claim-now.biz
To: victim@example.com
Subject: URGENT!! Claim your 1000000 dollar prize now
Date: Mon, 1 Jan 2024 11:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Congratulations!

You have been selected to receive a 1000000 dollar cash prize. Click http://claim-prize-now.example.com immediately and send your bank details today to avoid losing your winnings.
EOF

cat > <ollama-batch-dir>/evaluation/category-map.json <<'EOF'
{
  "ham": "ham",
  "spam": "spam"
}
EOF
```

### 4. Render the batch JSONL

```bash
uv run llm-batch-pipeline render --batch-dir <ollama-batch-dir> --plugin spam_detection
```
### 5. Submit to the 3-way sharded Ollama cluster

```bash
uv run llm-batch-pipeline submit \
  --batch-dir <ollama-batch-dir> \
  --backend ollama \
  --model gemma4:latest \
  --base-url http://nanu:11435 \
  --base-url http://nanu:11436 \
  --base-url http://nanu:11437 \
  --num-shards 3 \
  --num-parallel-jobs 1
```
Notes:

- these exact three URLs were verified for this guide (see the reachability check below)
- a bare `http://11436` is not a valid endpoint; include the hostname, as in `http://nanu:11436`
- in the live test for this guide, the full 2-request Ollama submission finished in about 6 seconds
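Before submitting, you can confirm each shard is reachable (this assumes the standard Ollama HTTP API, which exposes a `/api/version` endpoint):

```bash
for url in http://nanu:11435 http://nanu:11436 http://nanu:11437; do
  printf '%s -> ' "$url"
  curl -s "$url/api/version" || printf 'unreachable'
  echo
done
```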
### 6. Validate the output

```bash
uv run llm-batch-pipeline validate --batch-dir <ollama-batch-dir>
```

### 7. Evaluate the predictions

```bash
uv run llm-batch-pipeline evaluate \
  --batch-dir <ollama-batch-dir> \
  --label-field classification \
  --confidence-field confidence \
  --positive-class spam
```

In the tested run, the Ollama batch also classified both sample emails correctly.
## Output You Should Expect

After `render`:

- `<batch-dir>/job/batch-00001.jsonl`

After `submit`:

- `<batch-dir>/output/output.jsonl`
- `<batch-dir>/output/summary.json`

After `validate`:

- `<batch-dir>/results/validated.json`

After `evaluate`:

- metrics printed to stdout
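A quick way to confirm the artifacts landed where expected:

```bash
ls <batch-dir>/job <batch-dir>/output <batch-dir>/results
```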
## When To Use `run` Instead

If you already trust your prompt, schema, and backend settings, you can collapse the whole pipeline into one command:

```bash
uv run llm-batch-pipeline run --batch-dir <batch-dir> --plugin spam_detection --auto-approve ...
```

For a first pass, the staged workflow above is easier to debug because you can inspect the rendered JSONL, raw model output, validated JSON, and evaluation step separately.

src/llm_batch_pipeline/cli.py

Lines changed: 14 additions & 3 deletions
@@ -519,9 +519,13 @@ def _cmd_validate(args: argparse.Namespace) -> int:
     # Auto-discover from output dir
     output_dir = config.output_dir
     if output_dir.is_dir():
-        output_files = sorted(output_dir.glob("*.jsonl"))
-        if output_files:
-            ctx.artifacts["output_files"] = [str(f) for f in output_files]
+        preferred = output_dir / "output.jsonl"
+        if preferred.is_file():
+            ctx.artifacts["output_files"] = [str(preferred)]
+        else:
+            output_files = sorted(output_dir.glob("*.jsonl"))
+            if output_files:
+                ctx.artifacts["output_files"] = [str(f) for f in output_files]
 
     if not ctx.artifacts.get("output_files"):
         console.print("[red]No output JSONL files found. Run 'submit' first or specify --output-jsonl.[/red]")

@@ -655,6 +659,13 @@ def _cmd_list(_args: argparse.Namespace) -> int:
 
 def main(argv: Sequence[str] | None = None) -> int:
     """Entry point for the llm-batch-pipeline CLI."""
+    try:
+        from dotenv import load_dotenv  # noqa: PLC0415
+
+        load_dotenv()
+    except ImportError:
+        pass
+
     parser = build_parser()
     args = parser.parse_args(argv if argv is not None else sys.argv[1:])
src/llm_batch_pipeline/validation.py

Lines changed: 7 additions & 2 deletions
@@ -163,13 +163,18 @@ def validate_batch_output(
 
 def _extract_output_text(record: dict[str, Any]) -> str | None:
     """Extract the output text from an OpenAI-compatible response record."""
+    candidate: str | None = None
     try:
         body = record["response"]["body"]
         for output_item in body.get("output", []):
             if output_item.get("type") == "message":
                 for content_item in output_item.get("content", []):
                     if content_item.get("type") == "output_text":
-                        return content_item.get("text")
+                        text = content_item.get("text")
+                        if text:
+                            return text
+                        if text == "":
+                            candidate = text
     except (KeyError, TypeError):
         pass
-    return None
+    return candidate
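A minimal sketch of the new fallback behavior, using the record shape the function parses above (`_extract_output_text` is the private helper from this diff, imported here only for illustration): a non-empty `output_text` still returns immediately, while an empty string now comes back as `""` instead of `None`.

```python
from llm_batch_pipeline.validation import _extract_output_text

record = {
    "response": {
        "body": {
            "output": [
                {
                    "type": "message",
                    "content": [{"type": "output_text", "text": ""}],
                }
            ]
        }
    }
}

# Previously this returned None; with the candidate fallback it returns "",
# letting callers distinguish an empty completion from a missing one.
assert _extract_output_text(record) == ""
```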
