Commit bbe0873

preparing for public release
1 parent 9b312b2 commit bbe0873

31 files changed

Lines changed: 1234 additions & 2444 deletions

.github/workflows/ci.yml

Lines changed: 10 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -94,8 +94,8 @@ jobs:
9494
- uses: actions/setup-python@v5
9595
with:
9696
python-version: "3.11"
97-
- name: Install package
98-
run: pip install -e ".[dev]"
97+
- name: Install package (dev + env for policy/schema code that may touch env)
98+
run: pip install -e ".[dev,env]"
9999
- name: Validate policy
100100
run: labtrust validate-policy
101101
- name: Validate policy (partner overlay hsl_like)
@@ -118,8 +118,8 @@ jobs:
118118
- uses: actions/setup-python@v5
119119
with:
120120
python-version: "3.11"
121-
- name: Install package
122-
run: pip install -e ".[dev]"
121+
- name: Install package (dev + env so export/lab_design and pytest collection succeed)
122+
run: pip install -e ".[dev,env]"
123123
- name: Verify ui_fixtures evidence bundle
124124
run: labtrust verify-bundle --bundle tests/fixtures/ui_fixtures/evidence_bundle/EvidenceBundle.v0.1
125125
- name: Export bundle from ui_fixtures (for artifact inspection; tests build bundle in memory)
@@ -134,8 +134,8 @@ jobs:
134134
- uses: actions/setup-python@v5
135135
with:
136136
python-version: "3.11"
137-
- name: Install package
138-
run: pip install -e ".[dev]"
137+
- name: Install package (dev + env for export-risk-register)
138+
run: pip install -e ".[dev,env]"
139139
# Plan completeness checked on every PR so required_bench_plan is runnable.
140140
- name: Required bench plan completeness
141141
run: python scripts/required_bench_plan_runs.py > /dev/null
@@ -309,8 +309,8 @@ jobs:
309309
- uses: actions/setup-python@v5
310310
with:
311311
python-version: "3.11"
312-
- name: Install package
313-
run: pip install -e ".[dev]"
312+
- name: Install package (dev + env for CLI imports)
313+
run: pip install -e ".[dev,env]"
314314
- name: Create minimal artifact and run transparency-log
315315
run: |
316316
mkdir -p artifact/_repr/throughput_sla
@@ -328,8 +328,8 @@ jobs:
328328
- uses: actions/setup-python@v5
329329
with:
330330
python-version: "3.11"
331-
- name: Install package with docs extra
332-
run: pip install -e ".[docs]"
331+
- name: Install package (docs + env so mkdocstrings can import labtrust_gym)
332+
run: pip install -e ".[docs,env]"
333333
- name: Build MkDocs
334334
run: mkdocs build --strict
335335

.github/workflows/docs.yml

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@ jobs:
2525
with:
2626
python-version: "3.11"
2727

28-
- name: Install package and docs extra
29-
run: pip install -e ".[docs]"
28+
- name: Install package (docs + env so mkdocstrings can import labtrust_gym)
29+
run: pip install -e ".[docs,env]"
3030

3131
- name: Build MkDocs
3232
run: mkdocs build --strict

.github/workflows/release-fixture-verify.yml

Lines changed: 2 additions & 2 deletions
@@ -17,8 +17,8 @@ jobs:
1717
- uses: actions/setup-python@v5
1818
with:
1919
python-version: "3.11"
20-
- name: Install package
21-
run: pip install -e ".[dev]"
20+
- name: Install package (dev + env for verify_release/export code paths)
21+
run: pip install -e ".[dev,env]"
2222
- name: Normalize release fixture and regenerate manifests
2323
run: python scripts/normalize_release_fixture_manifests.py
2424
- name: Run release fixture verify test

.github/workflows/risk-coverage-pr.yml

Lines changed: 2 additions & 2 deletions
@@ -45,8 +45,8 @@ jobs:
4545
- uses: actions/setup-python@v5
4646
with:
4747
python-version: "3.11"
48-
- name: Install package
49-
run: pip install -e ".[dev]"
48+
- name: Install package (dev + env for run-benchmark and export-risk-register)
49+
run: pip install -e ".[dev,env]"
5050
- name: Verify fixture evidence (when receipts or SECURITY present)
5151
run: |
5252
dirs=$(python scripts/risk_coverage_fixture_dirs.py --dirs-only)
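
Every workflow hunk above makes the same change: the editable install gains the `env` extra. The following is a minimal sketch of a check that could flag install steps still missing it. It assumes the commands keep the exact `pip install -e ".[...]"` form shown in this diff; the function name `missing_env_extra` and the regex are illustrative and not part of the repository.

```python
import re

# Matches editable installs written as in the workflows, e.g.
#   pip install -e ".[dev,env]"
# (assumption: the extras list is always quoted exactly this way)
INSTALL_RE = re.compile(r'pip install -e "\.\[([^\]]*)\]"')


def missing_env_extra(workflow_text: str) -> list[str]:
    """Return the install commands whose extras list lacks 'env'."""
    missing = []
    for match in INSTALL_RE.finditer(workflow_text):
        extras = [extra.strip() for extra in match.group(1).split(",")]
        if "env" not in extras:
            missing.append(match.group(0))
    return missing


# Hypothetical sample mixing the old and new install lines from this commit.
sample = """
run: pip install -e ".[dev]"
run: pip install -e ".[dev,env]"
run: pip install -e ".[docs,env]"
"""
print(missing_env_extra(sample))
```

In practice the text would come from reading each file under `.github/workflows/`; the sketch takes a string so it stays independent of the repository layout.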

README.md

Lines changed: 23 additions & 54 deletions
@@ -7,37 +7,30 @@
77
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
88
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-green.svg)](https://www.python.org/downloads/)
99

10-
**A multi-agent environment (PettingZoo/Gym) for hospital lab automation, with a reference trust skeleton.** The first instance models a pathology lab—specifically a blood sciences lane ([Glossary](docs/reference/glossary.md#lab-terminology-hospital-lab-pathology-lab-blood-sciences-lab)).
11-
12-
**What it provides:** RBAC, signed actions, append-only audit log, invariants, and anomaly throttles—all driven by versioned policy and golden scenarios.
13-
14-
**Trust skeleton (at a glance)**
15-
16-
```mermaid
17-
flowchart LR
18-
Policy["policy/ (YAML)"]
19-
Policy --> RBAC["RBAC"]
20-
Policy --> Sig["Signed\nactions"]
21-
Policy --> Audit["Audit log\n(hash-chained)"]
22-
Policy --> Inv["Invariants"]
23-
Policy --> Codes["Reason\ncodes"]
24-
```
10+
**A multi-agent environment (PettingZoo/Gym) for hospital lab automation, with a reference trust skeleton.**
2511

2612
---
2713

2814
## Contents
2915

30-
- [North star](#north-star)
31-
- [Who is this for?](#who-is-this-for--i-want-to)
32-
- [Installation](#installation-pip)
33-
- [Pipelines](#pipelines)
34-
- [Quick eval](#quick-eval)
35-
- [CLI](#cli)
36-
- [Repository structure](#repository-structure)
37-
- [Golden runner](#golden-runner)
38-
- [Reproducibility and citation](#reproducibility-and-citation)
39-
- [Release and contract freeze](#release-and-contract-freeze)
40-
- [Architecture diagrams](docs/architecture/diagrams.md) (full pipeline and lab topology)
16+
- [LabTrust-Gym](#labtrust-gym)
17+
- [Contents](#contents)
18+
- [North star](#north-star)
19+
- [Who is this for? / I want to...](#who-is-this-for--i-want-to)
20+
- [Installation (pip)](#installation-pip)
21+
- [Pipelines](#pipelines)
22+
- [Quick eval](#quick-eval)
23+
- [CLI](#cli)
24+
- [Policy and validation](#policy-and-validation)
25+
- [Benchmarking and evaluation](#benchmarking-and-evaluation)
26+
- [Export and verification](#export-and-verification)
27+
- [Security and safety](#security-and-safety)
28+
- [Risk register](#risk-register)
29+
- [Coordination and studies](#coordination-and-studies)
30+
- [Release and reproducibility](#release-and-reproducibility)
31+
- [Repository structure](#repository-structure)
32+
- [Reproducibility and citation](#reproducibility-and-citation)
33+
- [License](#license)
4134

4235
---
4336

@@ -67,7 +60,7 @@ System and threat model: [Systems and threat model](docs/architecture/systems_an
6760
| I want to... | First step |
6861
|--------------|------------|
6962
| Run benchmarks only | `pip install labtrust-gym[env,plots]` then `labtrust quick-eval` |
70-
| Add my coordination method (or task) | [Extension development](docs/agents/extension_development.md) + entry_points; see [examples/extension_example](examples/extension_example/) |
63+
| Add my coordination method (or task) | [Extension development](docs/agents/extension_development.md) + entry_points; see [examples/extension_example](https://github.com/fraware/LabTrust-Gym/tree/main/examples/extension_example) |
7164
| Fork and customize policy | [Forker guide](docs/getting-started/forkers.md) and `labtrust forker-quickstart` |
7265
| Use as a library without forking | [Extension development](docs/agents/extension_development.md) + `--profile` + `extension_packages` in a lab profile |
7366
| Run the full security suite | `labtrust run-security-suite`; needs `.[env]`; use `--skip-system-level` when env is not installed |
@@ -98,9 +91,6 @@ labtrust validate-policy
9891
pytest -q
9992
```
10093

101-
- **Live tests:** Run when `OPENAI_API_KEY` and `LABTRUST_RUN_LLM_LIVE=1` or `LABTRUST_RUN_LLM_ATTACKER=1` are set. Use **`pytest -m 'not slow'`**; avoid `-m 'not slow and not live'` if you want live tests to run.
102-
- **Policy path:** Run from repo root so `policy/` is found; otherwise **PolicyPathError**. Override with **LABTRUST_POLICY_DIR**. See [Installation](docs/getting-started/installation.md) and [Troubleshooting](docs/getting-started/troubleshooting.md#policy-directory-not-found-policypatherror).
103-
10494
**Full stack** (benchmarks, studies, plots)
10595

10696
```bash
@@ -121,7 +111,7 @@ labtrust reproduce --profile minimal
121111
| Extra | Purpose |
122112
|-------|---------|
123113
| `[env]` | PettingZoo/Gymnasium (benchmarks and full security suite including coord_pack_ref) |
124-
| `[plots]` | Matplotlib |
114+
| `[plots]` | Matplotlib and Pillow (study figures, data tables) |
125115
| `[llm_openai]` | OpenAI live backend (openai_live) |
126116
| `[llm_anthropic]` | Anthropic live backend (anthropic_live) |
127117
| `[marl]` | Stable-Baselines3 (PPO train/eval) |
@@ -139,9 +129,9 @@ Benchmarks run in one of three modes: **deterministic** | **llm_offline** | **ll
139129
```mermaid
140130
flowchart LR
141131
Run["Run benchmark"]
142-
Run --> D["deterministic\n(default)"]
132+
Run --> D["deterministic (default)"]
143133
Run --> O["llm_offline"]
144-
Run --> L["llm_live\n+ --allow-network"]
134+
Run --> L["llm_live + --allow-network"]
145135
D --> NoNet["No network"]
146136
O --> NoNet
147137
L --> Net["Network / API"]
@@ -155,11 +145,6 @@ flowchart LR
155145

156146
Set mode with `--pipeline-mode`; for live LLM add `--allow-network` or `LABTRUST_ALLOW_NETWORK=1`.
157147

158-
> **Why you saw no OpenAI calls**
159-
> Runs are **offline by default**. `quick-eval`, `run-benchmark`, `reproduce`, and `package-release` use `pipeline_mode=deterministic` unless you pass `--pipeline-mode llm_live` and `--allow-network`. The CLI loads `.env` (or `LABTRUST_DOTENV_PATH`); keys there are used for live LLM.
160-
> **Live LLM:** `--pipeline-mode llm_live --allow-network --llm-backend openai_live` (or `anthropic_live`, `ollama_live`). The CLI prints **WILL MAKE NETWORK CALLS / MAY INCUR COST**.
161-
> Every run records `pipeline_mode`, `llm_backend_id`, `llm_model_id`, and `allow_network` in **results.json** and UI **index.json**; result files also record **non_deterministic** for audit.
162-
163148
---
164149

165150
## Quick eval
@@ -270,12 +255,6 @@ Put CLI outputs in `labtrust_runs/` or `--out`. Exit codes, minimal smoke args,
270255

271256
---
272257

273-
## Golden runner
274-
275-
The golden runner (`labtrust_gym.runner`) runs scenarios from `policy/golden/golden_scenarios.v0.1.yaml` against an environment adapter implementing `LabTrustEnvAdapter` (reset, step, query). Step results must conform to the runner output contract (status, emits, violations, hashchain, etc.); unknown emits fail the suite. Full suite: `LABTRUST_RUN_GOLDEN=1 pytest tests/test_golden_suite.py`.
276-
277-
---
278-
279258
## Reproducibility and citation
280259

281260
Cite using [CITATION.cff](CITATION.cff).
@@ -291,16 +270,6 @@ Cite using [CITATION.cff](CITATION.cff).
291270

292271
---
293272

294-
## Release and contract freeze
295-
296-
- **Release** — E2E artifacts chain before tagging. [Trust verification](docs/risk-and-security/trust_verification.md), [CONTRIBUTING](CONTRIBUTING.md). **`make verify`** (full battery); **`make paper OUT=<dir>`** (paper artifact); **`labtrust audit-selfcheck --out <dir>`** (Phase A + doctor checks). Paper claims regression: [PAPER_CLAIMS](docs/benchmarks/PAPER_CLAIMS.md).
297-
- **Version**`labtrust --version` (version + git SHA). Tag from clean main after checklist.
298-
- **Contract freeze**[Frozen contracts](docs/contracts/frozen_contracts.md): runner output, queue, invariant registry, enforcement, receipt, evidence bundle, FHIR, results v0.2; v0.3 extensible only.
299-
- **Quickstart (paper)**`bash scripts/quickstart_paper_v0_1.sh` or `scripts/quickstart_paper_v0.1.ps1`: install, validate-policy, quick-eval, package-release paper_v0.1, verify-release. Full release: export-risk-register into release dir, build-release-manifest, verify-release --strict-fingerprints. [Trust verification](docs/risk-and-security/trust_verification.md).
300-
- **UI**[tests/fixtures/ui_fixtures/](tests/fixtures/ui_fixtures/). [UI data contract](docs/contracts/ui_data_contract.md).
301-
302-
---
303-
304273
## License
305274

306275
Apache-2.0.

docs/benchmarks/hospital_lab_full_pipeline_results_report.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
# Hospital lab full pipeline – results report
22

3-
This document summarizes the results from full-pipeline runs for the pathology lab (blood sciences) design: what was run, what succeeded, and how to interpret the artifacts.
3+
This document summarizes **example** results from full-pipeline runs for the pathology lab (blood sciences) design: what was run, what succeeded, and how to interpret the artifacts. The run directories cited (e.g. `runs/hospital_lab_full_pipeline_smoke`) are from representative runs; regenerate them with [Hospital lab full pipeline](hospital_lab_full_pipeline.md) if needed.
44

55
---
66

docs/benchmarks/index.md

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@ Tasks, benchmark cards, official pack, studies, and reproduction.
2121
| Document | Description |
2222
|----------|-------------|
2323
| [Official benchmark pack](official_benchmark_pack.md) | v0.1/v0.2 and run commands. |
24+
| [Hospital lab full pipeline](hospital_lab_full_pipeline.md) | Full-pipeline script and orchestration. |
25+
| [Hospital lab full pipeline results](hospital_lab_full_pipeline_results_report.md) | Example results report (regenerate runs as needed). |
2426
| [Studies and plots](studies.md) | Study runner, make-plots. |
2527
| [Coordination studies](../coordination/coordination_studies.md) | Coordination study runner and Pareto. |
2628
| [LLM Coordination Protocol](llm_coordination_protocol.md) | LLM coordination protocol. |

docs/benchmarks/paper/README.md

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
1-
# Paper figure and table provenance (v0.1.0)
1+
# Paper figure and table provenance (paper_v0.1 profile)
22

33
Figure/table to path, command, and seeds. Aligned with [PAPER_CLAIMS](../PAPER_CLAIMS.md). Update when the paper is written.
44

@@ -21,9 +21,9 @@ A single tarball (e.g. from GitHub Release or Zenodo) should contain or point to
2121

2222
- Wheel/sdist: `pip install labtrust-gym[env,plots]`
2323
- Policy: bundled in wheel or `policy/` in repo
24-
- This provenance map: `docs/paper/README.md`
25-
- CONTRACTS: `docs/frozen_contracts.md`
26-
- PAPER_CLAIMS: `docs/PAPER_CLAIMS.md`
24+
- This provenance map: `docs/benchmarks/paper/README.md`
25+
- CONTRACTS: `docs/contracts/frozen_contracts.md`
26+
- PAPER_CLAIMS: `docs/benchmarks/PAPER_CLAIMS.md`
2727

2828
Verification: run quick-eval, package-release paper_v0.1, verify-bundle on the produced bundle.
2929

docs/benchmarks/throughput_comparison.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ When the main metric of interest is **throughput** (number of specimen releases
77
1. **Run the benchmark** with the scripted baseline (default for throughput_sla in the baseline registry):
88

99
```bash
10-
labtrust run-benchmark --task throughput_sla --num-episodes 10 --out ./out/throughput_sla.json
10+
labtrust run-benchmark --task throughput_sla --episodes 10 --out ./out/throughput_sla.json
1111
```
1212

1313
The baseline registry maps `throughput_sla` to `scripted_ops_v1` (scripted agents that perform accept, process, and release). No coordination method is used; the task uses a fixed set of scripted agents and an initial state with specimens already in `accepted` status.
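
The hunk above replaces `--num-episodes` with `--episodes` in the documented command. As a small sketch, a script that launches the benchmark could build the argument list in one place so the flag name is not repeated across call sites. The helper `run_benchmark_argv` is hypothetical; only the flags that appear in this diff are used.

```python
import shlex


def run_benchmark_argv(task: str, episodes: int, out: str) -> list[str]:
    """Build the run-benchmark argument list using the --episodes flag."""
    return [
        "labtrust", "run-benchmark",
        "--task", task,
        "--episodes", str(episodes),
        "--out", out,
    ]


argv = run_benchmark_argv("throughput_sla", 10, "./out/throughput_sla.json")
# shlex.join renders the list as a shell-safe command line for logging.
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` in an environment where the `labtrust` CLI is installed.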

docs/contracts/cli_contract.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ This document defines the contract for all LabTrust-Gym CLI commands: exit codes
2626
| validate-fhir | `--bundle <path> --terminology <path>` [--strict] | 0 or 1 | (none; violations on stderr; exit 1 with --strict if any code outside value set) | Optional; not part of minimal benchmark. See fhir_export.md. |
2727
| verify-bundle | `--bundle <EvidenceBundle.v0.1 dir>` or `--strict-fingerprints` | 0 | (none; PASS on stderr) | frozen_contracts.md, trust_verification.md |
2828
| verify-release | `--release-dir <dir>` optional `--strict-fingerprints` | 0 | (none; summary on stderr; validates EvidenceBundles, risk register, RELEASE_MANIFEST hashes) | frozen_contracts.md, trust_verification.md |
29-
| build-release-manifest | `--release-dir <dir> --out <path>` | 0 | `<path>/RELEASE_MANIFEST.v0.1.json` (or into release-dir) | trust_verification.md |
29+
| build-release-manifest | `--release-dir <dir>` | 0 | `<release-dir>/RELEASE_MANIFEST.v0.1.json` | trust_verification.md |
3030
| run-security-suite | `--out <dir> --smoke` | 0 | `<dir>/SECURITY/attack_results.json` | security_attack_suite.md |
3131
| safety-case | `--out <dir>` | 0 | `<dir>/SAFETY_CASE/safety_case.json`, `safety_case.md` | risk_register.md, trust_verification.md |
3232
| run-official-pack | `--out <dir> --smoke` | 0 | `<dir>/pack_manifest.json`, `baselines/`, `baselines/results/`, `SECURITY/`, `SAFETY_CASE/` | official_benchmark_pack.md |
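
Per the updated `build-release-manifest` row, the command takes only `--release-dir` and writes the manifest into that directory. A hedged sketch of how a caller might locate the file afterwards is shown below. The helper names are hypothetical, and the filename is taken from the table.

```python
from pathlib import Path

# Filename as listed in the build-release-manifest row of the CLI contract.
MANIFEST_NAME = "RELEASE_MANIFEST.v0.1.json"


def expected_manifest_path(release_dir: str) -> Path:
    """Return where the manifest should live for a given release directory."""
    return Path(release_dir) / MANIFEST_NAME


def manifest_present(release_dir: str) -> bool:
    """True if the manifest file exists inside the release directory."""
    return expected_manifest_path(release_dir).is_file()


print(expected_manifest_path("release"))
```

A check like `manifest_present` only confirms that the file exists; validating its hashes is the job of `verify-release`, as the table notes.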
