**A multi-agent environment (PettingZoo/Gym) for hospital lab automation, with a reference trust skeleton.** The first instance models a pathology lab—specifically a blood sciences lane ([Glossary](docs/reference/glossary.md#lab-terminology-hospital-lab-pathology-lab-blood-sciences-lab)).
**What it provides:** RBAC, signed actions, append-only audit log, invariants, and anomaly throttles—all driven by versioned policy and golden scenarios.
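The "signed actions" idea can be sketched with a standard-library HMAC. This is an illustration only — the key handling, canonicalization, and algorithm below are assumptions, not the repository's actual signing scheme:

```python
import hashlib
import hmac
import json

def sign_action(action: dict, key: bytes) -> str:
    """Sign an action dict with HMAC-SHA256 over canonical JSON (illustrative)."""
    payload = json.dumps(action, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_action(action: dict, key: bytes, signature: str) -> bool:
    # Constant-time comparison avoids leaking match length via timing.
    return hmac.compare_digest(sign_action(action, key), signature)
```

Any mutation of the action dict (e.g. changing the operation) invalidates the signature.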
**Trust skeleton (at a glance)**
```mermaid
flowchart LR
Policy["policy/ (YAML)"]
Policy --> RBAC["RBAC"]
Policy --> Sig["Signed\nactions"]
Policy --> Audit["Audit log\n(hash-chained)"]
Policy --> Inv["Invariants"]
Policy --> Codes["Reason\ncodes"]
```
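The hash-chained audit log in the diagram can be illustrated in a few lines of Python: each entry commits to the previous entry's hash, so any retroactive edit breaks the chain. This is a sketch of the concept, not the project's actual log format:

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash for the first entry

def append_entry(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    log.append({"prev": prev, "event": event,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify_chain(log: list) -> bool:
    """Recompute every hash; a tampered entry invalidates the chain."""
    prev = GENESIS
    for entry in log:
        body = json.dumps({"prev": prev, "event": entry["event"]}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True
```

Appending is O(1); verification walks the whole log, which is why append-only logs are cheap to write and auditable after the fact.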
---
## Contents

- [North star](#north-star)
- [Who is this for? / I want to...](#who-is-this-for--i-want-to)
- [Installation (pip)](#installation-pip)
- [Pipelines](#pipelines)
- [Quick eval](#quick-eval)
- [CLI](#cli)
- [Policy and validation](#policy-and-validation)
- [Benchmarking and evaluation](#benchmarking-and-evaluation)
- [Export and verification](#export-and-verification)
- [Security and safety](#security-and-safety)
- [Risk register](#risk-register)
- [Coordination and studies](#coordination-and-studies)
- [Golden runner](#golden-runner)
- [Repository structure](#repository-structure)
- [Release and contract freeze](#release-and-contract-freeze)
- [Reproducibility and citation](#reproducibility-and-citation)
- [License](#license)
- [Architecture diagrams](docs/architecture/diagrams.md) (full pipeline and lab topology)
---
| I want to... | First step |
|--------------|------------|
| Run benchmarks only |`pip install labtrust-gym[env,plots]` then `labtrust quick-eval`|
| Add my coordination method (or task) |[Extension development](docs/agents/extension_development.md) + entry_points; see [examples/extension_example](https://github.com/fraware/LabTrust-Gym/tree/main/examples/extension_example)|
| Fork and customize policy |[Forker guide](docs/getting-started/forkers.md) and `labtrust forker-quickstart`|
| Use as a library without forking |[Extension development](docs/agents/extension_development.md) + `--profile` + `extension_packages` in a lab profile |
| Run the full security suite |`labtrust run-security-suite`; needs `.[env]`; use `--skip-system-level` when env is not installed |

```
labtrust validate-policy
pytest -q
```

**Live tests:** Live tests run only when `OPENAI_API_KEY` is set together with `LABTRUST_RUN_LLM_LIVE=1` or `LABTRUST_RUN_LLM_ATTACKER=1`. Use **`pytest -m 'not slow'`**; avoid `-m 'not slow and not live'` if you want live tests to run.
**Policy path:** Run from the repo root so `policy/` is found; otherwise you get a **PolicyPathError**. Override with **LABTRUST_POLICY_DIR**. See [Installation](docs/getting-started/installation.md) and [Troubleshooting](docs/getting-started/troubleshooting.md#policy-directory-not-found-policypatherror).
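The lookup described above amounts to: use `LABTRUST_POLICY_DIR` if set, otherwise expect `policy/` under the current directory. A minimal sketch of that resolution order (the project's real logic may differ):

```python
import os
from pathlib import Path

def resolve_policy_dir() -> Path:
    """Resolve the policy directory: env override first, then ./policy."""
    override = os.environ.get("LABTRUST_POLICY_DIR")
    if override:
        return Path(override)
    local = Path("policy")
    if local.is_dir():
        return local
    # Mirrors the PolicyPathError behavior described above.
    raise FileNotFoundError(
        "policy/ not found: run from the repo root or set LABTRUST_POLICY_DIR"
    )
```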
|`[env]`| PettingZoo/Gymnasium (benchmarks and full security suite including coord_pack_ref) |
|`[plots]`| Matplotlib and Pillow (study figures, data tables) |
|`[llm_openai]`| OpenAI live backend (openai_live) |
|`[llm_anthropic]`| Anthropic live backend (anthropic_live) |
|`[marl]`| Stable-Baselines3 (PPO train/eval) |

Benchmarks run in one of three modes: **deterministic** | **llm_offline** | **llm_live**.
```mermaid
flowchart LR
Run["Run benchmark"]
Run --> D["deterministic (default)"]
Run --> O["llm_offline"]
Run --> L["llm_live + --allow-network"]
D --> NoNet["No network"]
O --> NoNet
L --> Net["Network / API"]
```
Set mode with `--pipeline-mode`; for live LLM add `--allow-network` or `LABTRUST_ALLOW_NETWORK=1`.
> **Why you saw no OpenAI calls**
> Runs are **offline by default**. `quick-eval`, `run-benchmark`, `reproduce`, and `package-release` use `pipeline_mode=deterministic` unless you pass `--pipeline-mode llm_live` and `--allow-network`. The CLI loads `.env` (or `LABTRUST_DOTENV_PATH`); keys there are used for live LLM.
> **Live LLM:** `--pipeline-mode llm_live --allow-network --llm-backend openai_live` (or `anthropic_live`, `ollama_live`). The CLI prints **WILL MAKE NETWORK CALLS / MAY INCUR COST**.
> Every run records `pipeline_mode`, `llm_backend_id`, `llm_model_id`, and `allow_network` in **results.json** and UI **index.json**; result files also record **non_deterministic** for audit.
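A downstream audit script can assert that those provenance fields are present. The check below is a hypothetical sketch against the field names listed above, not part of the project's tooling:

```python
import json

# Field names recorded in results.json, per the note above.
REQUIRED_PROVENANCE = {"pipeline_mode", "llm_backend_id", "llm_model_id", "allow_network"}

def missing_provenance(results_json: str):
    """Return which audit-relevant fields a results.json payload lacks."""
    data = json.loads(results_json)
    return REQUIRED_PROVENANCE - data.keys()
```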
---
## Quick eval

Put CLI outputs in `labtrust_runs/` or `--out`.
---
## Golden runner
The golden runner (`labtrust_gym.runner`) runs scenarios from `policy/golden/golden_scenarios.v0.1.yaml` against an environment adapter implementing `LabTrustEnvAdapter` (reset, step, query). Step results must conform to the runner output contract (status, emits, violations, hashchain, etc.); unknown emits fail the suite. Full suite: `LABTRUST_RUN_GOLDEN=1 pytest tests/test_golden_suite.py`.
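The adapter surface can be written down as a structural type. Only the three method names (`reset`, `step`, `query`) and the interface name come from the text above; the signatures and return shapes below are assumptions for illustration:

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class LabTrustEnvAdapter(Protocol):
    """Structural sketch of a golden-runner adapter (signatures assumed)."""
    def reset(self, scenario: dict) -> dict: ...
    def step(self, action: dict) -> dict: ...
    def query(self, name: str) -> Any: ...

class NoopAdapter:
    """Trivially conforming adapter, useful for wiring tests."""
    def reset(self, scenario: dict) -> dict:
        return {"status": "ok", "emits": [], "violations": []}
    def step(self, action: dict) -> dict:
        # Runner output contract fields (status, emits, violations, hashchain).
        return {"status": "ok", "emits": [], "violations": [], "hashchain": None}
    def query(self, name: str) -> Any:
        return None
```

`runtime_checkable` lets `isinstance` confirm that an adapter exposes the three required methods before a suite runs.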
---
## Reproducibility and citation
Cite using [CITATION.cff](CITATION.cff).
---
## Release and contract freeze
**Release** — verify the end-to-end artifact chain before tagging. See [Trust verification](docs/risk-and-security/trust_verification.md) and [CONTRIBUTING](CONTRIBUTING.md). **`make verify`** runs the full battery; **`make paper OUT=<dir>`** builds the paper artifact; **`labtrust audit-selfcheck --out <dir>`** runs Phase A plus doctor checks. Paper claims regression: [PAPER_CLAIMS](docs/benchmarks/PAPER_CLAIMS.md).

**Version** — `labtrust --version` prints the version and git SHA. Tag from a clean `main` after completing the checklist.

---

`docs/benchmarks/hospital_lab_full_pipeline_results_report.md`:
# Hospital lab full pipeline – results report
This document summarizes **example** results from full-pipeline runs for the pathology lab (blood sciences) design: what was run, what succeeded, and how to interpret the artifacts. The run directories cited (e.g. `runs/hospital_lab_full_pipeline_smoke`) are from representative runs; regenerate them with [Hospital lab full pipeline](hospital_lab_full_pipeline.md) if needed.
The baseline registry maps `throughput_sla` to `scripted_ops_v1` (scripted agents that perform accept, process, and release). No coordination method is used; the task uses a fixed set of scripted agents and an initial state with specimens already in `accepted` status.
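The registry mapping described above can be pictured as a plain lookup table. The structure below is a sketch of the idea, not the project's actual registry code:

```python
# Hypothetical shape of the baseline registry: task id -> scripted baseline id,
# per the throughput_sla -> scripted_ops_v1 mapping described above.
BASELINE_REGISTRY = {
    "throughput_sla": "scripted_ops_v1",  # scripted accept/process/release agents
}

def resolve_baseline(task: str) -> str:
    """Look up the scripted baseline registered for a task."""
    try:
        return BASELINE_REGISTRY[task]
    except KeyError:
        raise KeyError(f"no baseline registered for task {task!r}") from None
```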

---

`docs/contracts/cli_contract.md`:

This document defines the contract for all LabTrust-Gym CLI commands, including exit codes.

| Command | Arguments | Exit codes | Outputs | Notes |
|---------|-----------|------------|---------|-------|
| validate-fhir |`--bundle <path> --terminology <path>` `[--strict]`| 0 or 1 | (none; violations on stderr; exit 1 with --strict if any code outside value set) | Optional; not part of minimal benchmark. See fhir_export.md. |
| verify-bundle |`--bundle <EvidenceBundle.v0.1 dir>` or `--strict-fingerprints`| 0 | (none; PASS on stderr) | frozen_contracts.md, trust_verification.md |