
Commit b2eae3b

Enhance documentation and CLI functionality
- Added a new getting-started.md file for end-to-end walkthroughs of the OpenAI Batch API and Ollama setup.
- Load `.env` variables automatically.
- Improved output extraction logic.
- Added tests for CLI and validation.
1 parent e74c3d6 commit b2eae3b

6 files changed: 366 additions & 6 deletions


README.md

Lines changed: 20 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 `llm-batch-pipeline` is a generic LLM batch processing pipeline. It discovers and parses input files via a plugin system, renders OpenAI Batch API (or Ollama) requests, validates structured JSON outputs with a Pydantic schema, evaluates predictions against ground truth, and exports results to XLSX/JSON.
 
-See the **Admin Guide** for installation/deployment, and the **Developer Guide** for how to extend the pipeline with custom plugins, prompts, schemas, and evaluation.
+See the **Getting Started Guide** for a tested end-to-end walkthrough with OpenAI Batch API and a 3-way sharded Ollama setup ([docs/getting-started.md](docs/getting-started.md)). See the **User Guide** for installation and CLI reference ([docs/user-guide.md](docs/user-guide.md)). See the **Admin Guide** for installation/deployment, and the **Developer Guide** for how to extend the pipeline with custom plugins, prompts, schemas, and evaluation.
 
 ## Workflow
 ```mermaid
@@ -45,6 +45,25 @@ flowchart TD
 - Local LLM via Ollama: run an Ollama server (pull the model), then use `--backend ollama --base-url http://HOST:11434` (repeat `--base-url` for multi-server sharding).
 - OpenAI API compatible local server (if supported by your server): use `--backend openai` and configure the OpenAI SDK base URL (commonly via `OPENAI_BASE_URL`).
 
+## Getting Started
+- End-to-end walkthrough: [`docs/getting-started.md`](docs/getting-started.md)
+- The getting-started guide was tested against live OpenAI Batch and Ollama services.
+
+## Quick Test (offline)
+Run the unit test suite (no external LLM services):
+```bash
+uv sync --group dev
+uv run pytest -q
+```
+
+## Plugins
+List registered plugins:
+```bash
+uv run llm-batch-pipeline list
+```
+
+The built-in examples include `spam_detection` and `gdpr_detection`.
+
 ## Test / Benchmark
 - SpamAssassin corpus reference run: [`docs/benchmark-run.md`](docs/benchmark-run.md)

docs/getting-started.md

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@

# Getting Started

This is the shortest end-to-end walkthrough for a new user who wants to see what `llm-batch-pipeline` does in practice.

It covers two real workflows:

- OpenAI Batch API with `gpt-4o-mini`
- A 3-way sharded Ollama setup at `http://nanu:11435`, `http://nanu:11436`, and `http://nanu:11437`

These instructions were tested against the live services on 2026-04-09.

## What You Will Do

You will:

- create a batch job with the built-in `spam_detection` plugin
- add two sample `.eml` files
- render a batch JSONL file
- submit it to a backend
- validate the model output against a Pydantic schema
- evaluate the predictions against ground truth

## Prerequisites

- Python 3.13+
- `uv`
- dependencies installed:

```bash
uv sync
```

- for OpenAI: a `.env` file in the repo root with `OPENAI_API_KEY=...`

The CLI now auto-loads `.env` from the repository root.
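For example, a minimal `.env` (the key name is the one this guide uses; the value is a placeholder):

```bash
cat > .env <<'EOF'
OPENAI_API_KEY=sk-your-key-here
EOF
```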
## Offline Sanity Check

Before using any backend, verify the install:

```bash
uv run llm-batch-pipeline list
uv sync --group dev
uv run pytest -q
```
## OpenAI Batch Walkthrough

### 1. Create a batch directory

```bash
uv run llm-batch-pipeline init getting_started_openai --plugin spam_detection --model gpt-4o-mini
```

This creates a directory like `batches/batch_001_getting_started_openai`.
Use that path in the commands below as `<openai-batch-dir>`.
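If you would rather not copy the path by hand, you can capture it in a shell variable (the glob assumes the default `batches/` naming shown above):

```bash
openai_batch_dir=$(ls -d batches/batch_*_getting_started_openai | tail -n 1)
echo "$openai_batch_dir"
```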
### 2. Copy the built-in prompt and schema into the batch

```bash
cp src/llm_batch_pipeline/examples/spam_detection/prompt.txt <openai-batch-dir>/prompt.txt
cp src/llm_batch_pipeline/examples/spam_detection/schema.py <openai-batch-dir>/schema.py
```

### 3. Add two sample emails

```bash
cat > <openai-batch-dir>/input/ham__team_sync.eml <<'EOF'
From: alice@example.com
To: bob@example.com
Subject: Team sync tomorrow
Date: Mon, 1 Jan 2024 10:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hi Bob,

Can we meet tomorrow at 3pm to review the release checklist and assign the last two action items?

Thanks,
Alice
EOF

cat > <openai-batch-dir>/input/spam__million_prize.eml <<'EOF'
From: prizes@claim-now.biz
To: victim@example.com
Subject: URGENT!! Claim your 1000000 dollar prize now
Date: Mon, 1 Jan 2024 11:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Congratulations!

You have been selected to receive a 1000000 dollar cash prize. Click http://claim-prize-now.example.com immediately and send your bank details today to avoid losing your winnings.
EOF
```

### 4. Add a category map for evaluation

```bash
cat > <openai-batch-dir>/evaluation/category-map.json <<'EOF'
{
  "ham": "ham",
  "spam": "spam"
}
EOF
```
The evaluator infers ground truth from the `ham__...` and `spam__...` filename prefixes; this file maps each prefix to a final category label.
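As a quick illustration of that convention, the inferred label is everything before the double underscore (assuming the evaluator splits on `__`, which the examples above suggest):

```bash
f=ham__team_sync.eml
echo "${f%%__*}"   # prints "ham", which category-map.json maps to the final label
```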
### 5. Render the batch JSONL

```bash
uv run llm-batch-pipeline render --batch-dir <openai-batch-dir> --plugin spam_detection
```

This writes the request payload to `<openai-batch-dir>/job/batch-00001.jsonl`.
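Before submitting, it is worth eyeballing the rendered requests. For example, with `jq` installed, list the top-level keys of each request line:

```bash
jq -c 'keys' <openai-batch-dir>/job/batch-00001.jsonl
```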
### 6. Submit to OpenAI Batch API

```bash
uv run llm-batch-pipeline submit --batch-dir <openai-batch-dir> --backend openai
```

Notes:

- this command waits for the batch to complete by default
- in the live test for this guide, a 2-request batch took about 45 minutes to finish
- batch metadata is saved to `<openai-batch-dir>/output/submission.json`

If you do not want to keep the terminal open:

```bash
uv run llm-batch-pipeline submit --batch-dir <openai-batch-dir> --backend openai --no-wait
uv run llm-batch-pipeline submit --batch-dir <openai-batch-dir> --backend openai --resume-batch-id <batch-id>
```
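The `<batch-id>` for `--resume-batch-id` comes from the saved submission metadata; for example, assuming the id is stored under a top-level `id` key (check the file if your layout differs):

```bash
jq -r '.id' <openai-batch-dir>/output/submission.json
```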
### 7. Validate the output

```bash
uv run llm-batch-pipeline validate --batch-dir <openai-batch-dir>
```

This reads `<openai-batch-dir>/output/output.jsonl` and writes validated rows to `<openai-batch-dir>/results/validated.json`.
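For orientation, each validated row should carry at least the schema fields that `evaluate` reads (`classification` and `confidence`); the layout below is illustrative, not verified output:

```json
{
  "custom_id": "spam__million_prize.eml",
  "classification": "spam",
  "confidence": 0.97
}
```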
### 8. Evaluate the predictions

```bash
uv run llm-batch-pipeline evaluate \
  --batch-dir <openai-batch-dir> \
  --label-field classification \
  --confidence-field confidence \
  --positive-class spam
```

This prints accuracy, macro F1, per-class metrics, and the confusion matrix to the terminal.

In the tested run, the OpenAI batch classified both sample emails correctly.
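For this two-email run the arithmetic is trivial: both predictions match ground truth, so accuracy is 2/2 = 1.0 and macro F1 is 1.0, with a clean diagonal in the confusion matrix (layout illustrative; the CLI's formatting may differ):

```text
            pred: ham   pred: spam
true: ham        1           0
true: spam       0           1
```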
## Ollama Walkthrough

### 1. Create a batch directory

```bash
uv run llm-batch-pipeline init getting_started_ollama --plugin spam_detection --model gemma4:latest
```

This creates a directory like `batches/batch_002_getting_started_ollama`.
Use that path in the commands below as `<ollama-batch-dir>`.

### 2. Copy the built-in prompt and schema into the batch

```bash
cp src/llm_batch_pipeline/examples/spam_detection/prompt.txt <ollama-batch-dir>/prompt.txt
cp src/llm_batch_pipeline/examples/spam_detection/schema.py <ollama-batch-dir>/schema.py
```

### 3. Add the same sample inputs and evaluation map

```bash
cat > <ollama-batch-dir>/input/ham__team_sync.eml <<'EOF'
From: alice@example.com
To: bob@example.com
Subject: Team sync tomorrow
Date: Mon, 1 Jan 2024 10:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hi Bob,

Can we meet tomorrow at 3pm to review the release checklist and assign the last two action items?

Thanks,
Alice
EOF

cat > <ollama-batch-dir>/input/spam__million_prize.eml <<'EOF'
From: prizes@claim-now.biz
To: victim@example.com
Subject: URGENT!! Claim your 1000000 dollar prize now
Date: Mon, 1 Jan 2024 11:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Congratulations!

You have been selected to receive a 1000000 dollar cash prize. Click http://claim-prize-now.example.com immediately and send your bank details today to avoid losing your winnings.
EOF

cat > <ollama-batch-dir>/evaluation/category-map.json <<'EOF'
{
  "ham": "ham",
  "spam": "spam"
}
EOF
```

### 4. Render the batch JSONL

```bash
uv run llm-batch-pipeline render --batch-dir <ollama-batch-dir> --plugin spam_detection
```
### 5. Submit to the 3-way sharded Ollama cluster

```bash
uv run llm-batch-pipeline submit \
  --batch-dir <ollama-batch-dir> \
  --backend ollama \
  --model gemma4:latest \
  --base-url http://nanu:11435 \
  --base-url http://nanu:11436 \
  --base-url http://nanu:11437 \
  --num-shards 3 \
  --num-parallel-jobs 1
```
Notes:

- these exact three URLs were verified for this guide (see the reachability check below)
- a bare `http://11436` is not a valid endpoint; include the hostname, as in `http://nanu:11436`
- in the live test for this guide, the full 2-request Ollama submission finished in about 6 seconds
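Before submitting, you can confirm each shard is reachable (this assumes the standard Ollama HTTP API, which exposes a `/api/version` endpoint):

```bash
for url in http://nanu:11435 http://nanu:11436 http://nanu:11437; do
  printf '%s -> ' "$url"
  curl -s "$url/api/version" || printf 'unreachable'
  echo
done
```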
### 6. Validate the output

```bash
uv run llm-batch-pipeline validate --batch-dir <ollama-batch-dir>
```

### 7. Evaluate the predictions

```bash
uv run llm-batch-pipeline evaluate \
  --batch-dir <ollama-batch-dir> \
  --label-field classification \
  --confidence-field confidence \
  --positive-class spam
```

In the tested run, the Ollama batch also classified both sample emails correctly.
## Output You Should Expect

After `render`:

- `<batch-dir>/job/batch-00001.jsonl`

After `submit`:

- `<batch-dir>/output/output.jsonl`
- `<batch-dir>/output/summary.json`

After `validate`:

- `<batch-dir>/results/validated.json`

After `evaluate`:

- metrics printed to stdout
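A quick way to confirm the artifacts landed where expected:

```bash
ls <batch-dir>/job <batch-dir>/output <batch-dir>/results
```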
## When To Use `run` Instead

If you already trust your prompt, schema, and backend settings, you can collapse the whole pipeline into one command:

```bash
uv run llm-batch-pipeline run --batch-dir <batch-dir> --plugin spam_detection --auto-approve ...
```

For a first pass, the staged workflow above is easier to debug because you can inspect the rendered JSONL, raw model output, validated JSON, and evaluation step separately.

src/llm_batch_pipeline/cli.py

Lines changed: 14 additions & 3 deletions
@@ -519,9 +519,13 @@ def _cmd_validate(args: argparse.Namespace) -> int:
     # Auto-discover from output dir
     output_dir = config.output_dir
     if output_dir.is_dir():
-        output_files = sorted(output_dir.glob("*.jsonl"))
-        if output_files:
-            ctx.artifacts["output_files"] = [str(f) for f in output_files]
+        preferred = output_dir / "output.jsonl"
+        if preferred.is_file():
+            ctx.artifacts["output_files"] = [str(preferred)]
+        else:
+            output_files = sorted(output_dir.glob("*.jsonl"))
+            if output_files:
+                ctx.artifacts["output_files"] = [str(f) for f in output_files]
 
     if not ctx.artifacts.get("output_files"):
         console.print("[red]No output JSONL files found. Run 'submit' first or specify --output-jsonl.[/red]")

@@ -655,6 +659,13 @@ def _cmd_list(_args: argparse.Namespace) -> int:
 
 def main(argv: Sequence[str] | None = None) -> int:
     """Entry point for the llm-batch-pipeline CLI."""
+    try:
+        from dotenv import load_dotenv  # noqa: PLC0415
+
+        load_dotenv()
+    except ImportError:
+        pass
+
     parser = build_parser()
     args = parser.parse_args(argv if argv is not None else sys.argv[1:])
src/llm_batch_pipeline/validation.py

Lines changed: 7 additions & 2 deletions
@@ -163,13 +163,18 @@ def validate_batch_output(
 
 def _extract_output_text(record: dict[str, Any]) -> str | None:
     """Extract the output text from an OpenAI-compatible response record."""
+    candidate: str | None = None
     try:
         body = record["response"]["body"]
         for output_item in body.get("output", []):
             if output_item.get("type") == "message":
                 for content_item in output_item.get("content", []):
                     if content_item.get("type") == "output_text":
-                        return content_item.get("text")
+                        text = content_item.get("text")
+                        if text:
+                            return text
+                        if text == "":
+                            candidate = text
     except (KeyError, TypeError):
         pass
-    return None
+    return candidate
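A minimal sketch of the new fallback behavior, using the record shape the function parses above (`_extract_output_text` is the private helper from this diff, imported here only for illustration): a non-empty `output_text` still returns immediately, while an empty string now comes back as `""` instead of `None`.

```python
from llm_batch_pipeline.validation import _extract_output_text

record = {
    "response": {
        "body": {
            "output": [
                {
                    "type": "message",
                    "content": [{"type": "output_text", "text": ""}],
                }
            ]
        }
    }
}

# Previously this returned None; with the candidate fallback it returns "",
# letting callers distinguish an empty completion from a missing one.
assert _extract_output_text(record) == ""
```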
