```bash
git clone https://github.com/fraware/LabTrust-Gym.git
cd LabTrust-Gym
pip install -e ".[dev]"
labtrust --version  # optional: check version + git SHA
```

For a full verification command sequence, see Evaluation checklist. To test and audit the repo (lint, format, typecheck, tests, benchmarks, quick-eval, coordination, reproduce, docs), run the steps there or use `make verify`.
Comments and docstrings should be clear and free of unexplained jargon. See Documentation standards for module/class/function docstrings, structure, and style. New or modified public modules, classes, or functions must have docstrings that meet those standards (module purpose, no unexplained jargon, and for functions: summary and Args/Returns where applicable). Existing code is being brought up to standard incrementally.
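For illustration only, here is a function documented in the Google style these standards describe: a one-line summary followed by `Args`, `Returns` and `Raises` sections. The module and `mean_reward` are hypothetical and are not part of LabTrust-Gym.

```python
"""Helpers for summarising episode results (hypothetical example module)."""

from collections.abc import Sequence


def mean_reward(rewards: Sequence[float]) -> float:
    """Return the arithmetic mean of per-episode rewards.

    Args:
        rewards: Total reward for each episode, in run order.

    Returns:
        The mean reward across all episodes.

    Raises:
        ValueError: If ``rewards`` is empty.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one episode")
    return sum(rewards) / len(rewards)


print(mean_reward([1.0, 2.0, 3.0]))  # prints 2.0
```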
Before opening a PR:
- `ruff format` and `ruff check` (lines must not exceed 120 characters; E501 is enforced). For naming exceptions (N802, N806), see Code style and lint.
- `mypy src/` (must pass; CI fails on type errors).
- `pytest -q -m "not slow"` (quick check; excludes slow tests). For the full environment-dependent tests (security, CLI smoke), run `pip install -e ".[dev,env]"` and then the security and CLI smoke steps from CI.
- `labtrust validate-policy`
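To run the pre-PR checks in one go, a local helper could execute them in order and stop at the first failure. The sketch below assumes that; `run_pre_pr_checks` is a hypothetical helper and not a project script. The command strings are copied from the list above. `make verify` remains the supported route.

```python
"""Hypothetical helper that runs the pre-PR checks in order and stops at the first failure."""

import shlex
import subprocess

# Commands copied from the contributing guide; formatting runs before linting.
PRE_PR_CHECKS = [
    "ruff format",
    "ruff check",
    "mypy src/",
    'pytest -q -m "not slow"',
    "labtrust validate-policy",
]


def run_pre_pr_checks(dry_run: bool = False) -> list[str]:
    """Run each check in sequence.

    Args:
        dry_run: If True, only return the commands that would run.

    Returns:
        The commands that were run (or would run), ending at the first failure.
    """
    executed = []
    for command in PRE_PR_CHECKS:
        executed.append(command)
        if dry_run:
            continue
        # shlex.split keeps the quoted "not slow" marker expression as one argument.
        result = subprocess.run(shlex.split(command), check=False)
        if result.returncode != 0:
            print(f"FAILED: {command}")
            break
    return executed


print(run_pre_pr_checks(dry_run=True))
```

Stopping early keeps the output focused on the first problem. Later checks, such as type checking, are often noisy until formatting and lint pass.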
Policy files under `policy/` must validate against the JSON schemas in `policy/schemas/`. `validate-policy` checks schema and structural validity only. It does not check logical correctness (e.g. zone connectivity, invariant feasibility, or whether controls match risks). After running `validate-policy`, review the policy for logical consistency and for whether its controls are appropriate: validation is necessary but not sufficient. New or modified policy files must pass validation. Legacy and design-only YAML (e.g. the override matrix and compiler contracts) lives under `docs/architecture/design/` and is not loaded by the runtime.
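The gap between structural and logical validity can be shown with a toy example. The policy shape below (a list of zones plus directed `connections`) is hypothetical and does not match the real LabTrust-Gym schema. A shape check stands in for schema validation, and a breadth-first search finds a zone that passes the "schema" yet can never be reached.

```python
"""Sketch: a policy can be structurally valid yet logically inconsistent (hypothetical shape)."""

from collections import deque

policy = {
    "zones": ["reception", "analyzer", "storage"],
    "connections": [["reception", "analyzer"]],  # nothing leads to "storage"
}


def is_structurally_valid(doc: dict) -> bool:
    """Mimic a schema check: correct keys and types only."""
    return (
        isinstance(doc.get("zones"), list)
        and all(isinstance(z, str) for z in doc["zones"])
        and isinstance(doc.get("connections"), list)
        and all(isinstance(c, list) and len(c) == 2 for c in doc["connections"])
    )


def unreachable_zones(doc: dict, start: str) -> set[str]:
    """Breadth-first search from ``start``; return the zones that cannot be reached."""
    neighbours: dict[str, list[str]] = {zone: [] for zone in doc["zones"]}
    for src, dst in doc["connections"]:
        neighbours[src].append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(doc["zones"]) - seen


print(is_structurally_valid(policy))           # True: passes the shape check
print(unreachable_zones(policy, "reception"))  # {'storage'}: a logical defect
```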
- Frozen contracts: do not weaken the runner output, queue contract, coordination interface, or risk register schema without a version bump and a documentation update. See Frozen contracts for the canonical list. For coordination: when N <= N_max, only `propose_actions` is used and `combine_submissions` is never called (backward-compatibility guarantee).
- Implementation audit: for what is covered by automated tests versus manual checklists, see Evaluation checklist and CI.
- Troubleshooting: for common failures (verify-bundle, policy validation, pack gate, E2E chain), see Troubleshooting.
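The N <= N_max coordination guarantee can be pictured with a toy dispatcher. This sketch rests on several assumptions. N is taken to be the number of observations passed in. `RecordingMethod` and `coordinate` are hypothetical. The sharding used when N > N_max is invented for illustration, because the guide does not describe that path. The guarantee itself is only that small N never reaches `combine_submissions`.

```python
"""Hypothetical sketch of the N <= N_max backward-compatibility rule."""

from dataclasses import dataclass, field


@dataclass
class RecordingMethod:
    """Toy coordination method that records which entry points were used."""

    calls: list[str] = field(default_factory=list)

    def propose_actions(self, observations: list[str]) -> list[str]:
        self.calls.append("propose_actions")
        return [f"act:{obs}" for obs in observations]

    def combine_submissions(self, submissions: list[list[str]]) -> list[str]:
        self.calls.append("combine_submissions")
        return [action for batch in submissions for action in batch]


def coordinate(method: RecordingMethod, observations: list[str], n_max: int) -> list[str]:
    """Use propose_actions alone when N <= N_max; only larger N reaches combine_submissions."""
    n = len(observations)  # assumption: N is the number of observations
    if n <= n_max:
        return method.propose_actions(observations)
    # Hypothetical sharding: chunks of at most n_max, merged afterwards.
    batches = [method.propose_actions(observations[i:i + n_max]) for i in range(0, n, n_max)]
    return method.combine_submissions(batches)


small = RecordingMethod()
print(coordinate(small, ["a", "b"], n_max=2), small.calls)  # ['act:a', 'act:b'] ['propose_actions']
```

A test that records calls, like the one above, is one way to pin down a "never called" guarantee so that a refactor cannot silently break it.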
Keep the repo root minimal: do not commit CLI or build artifacts (e.g. `results.json`, `out.json`, `bench_smoke_*.json`, `quick_eval_*/`, `site/`). Use `labtrust_runs/` or `--out <path>` for benchmark and study outputs. See Repository structure.
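Before committing, a quick local check can flag stray artifacts at the repo root. The sketch assumes the example patterns above are the ones worth flagging; the guide lists them as examples, so the set is not exhaustive. A trailing `/` marks a directory pattern. `is_root_artifact` and `find_root_artifacts` are hypothetical helpers.

```python
"""Sketch: flag entries at the repo root that look like CLI or build artifacts."""

from fnmatch import fnmatch
from pathlib import Path

# Example patterns from the contributing guide; a trailing "/" marks a directory.
ROOT_ARTIFACT_PATTERNS = ["results.json", "out.json", "bench_smoke_*.json", "quick_eval_*/", "site/"]


def is_root_artifact(name: str, is_dir: bool) -> bool:
    """Return True if ``name`` matches an artifact pattern of the same kind (file or directory)."""
    for pattern in ROOT_ARTIFACT_PATTERNS:
        wants_dir = pattern.endswith("/")
        if wants_dir == is_dir and fnmatch(name, pattern.rstrip("/")):
            return True
    return False


def find_root_artifacts(root: Path) -> list[str]:
    """List the artifact-like entries directly under ``root``."""
    return sorted(p.name for p in root.iterdir() if is_root_artifact(p.name, p.is_dir()))


print(is_root_artifact("bench_smoke_7.json", is_dir=False))  # True
print(is_root_artifact("quick_eval_abc", is_dir=True))       # True
print(is_root_artifact("pyproject.toml", is_dir=False))      # False
```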
The golden scenarios in `policy/golden/golden_scenarios.v0.1.yaml` define correctness. Do not weaken expectations. When adding engine behaviour, extend the suite only with new scenarios or new assertions; do not relax existing ones.
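The "extend, never relax" rule amounts to a superset check: every assertion in the old suite must still be present in the new one. The sketch below models each scenario as a set of assertion strings keyed by a scenario id. That model and the `GS-00x` ids are hypothetical and do not reflect the YAML layout of the real golden file.

```python
"""Sketch: check that a golden-suite change only adds scenarios or assertions (hypothetical model)."""


def relaxed_expectations(old: dict[str, set[str]], new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per scenario id, the assertions present in ``old`` but missing from ``new``.

    A removed scenario reports all of its assertions as missing.
    """
    problems = {}
    for scenario_id, old_assertions in old.items():
        missing = old_assertions - new.get(scenario_id, set())
        if missing:
            problems[scenario_id] = missing
    return problems


old_suite = {"GS-001": {"release blocked", "emit ACCEPT"}}
grown = {"GS-001": {"release blocked", "emit ACCEPT", "audit logged"}, "GS-002": {"emit HOLD"}}
relaxed = {"GS-001": {"emit ACCEPT"}}

print(relaxed_expectations(old_suite, grown))    # {}
print(relaxed_expectations(old_suite, relaxed))  # {'GS-001': {'release blocked'}}
```

An assertion edited in place shows up as one removal plus one addition, so this check also catches quiet rewording of an expectation.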
- New or modified policy files validated
- New emit types added to `policy/emits/emits_vocab.v0.1.yaml` (or none)
- Golden suite impact explained
- Tests added or updated
- New or modified public functions/methods have docstrings in Google style (summary + Args/Returns/Raises where applicable)
- New LLM coordination method or backend: confirm that it meets the LLM excellence checklist (schema-valid decisions, hard-fail to NOOP, metadata, integration) and add an entry for it to the table in that section.
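"Schema-valid decisions, hard-fail to NOOP" can be sketched as a parser that never lets a malformed model reply through. The action vocabulary, the decision shape and `parse_decision` below are hypothetical. The real schema and NOOP representation are defined by the project's LLM checklist, not by this example.

```python
"""Hypothetical sketch of 'schema-valid decisions, hard-fail to NOOP'."""

import json

NOOP = {"action": "NOOP"}
ALLOWED_ACTIONS = {"NOOP", "MOVE", "START_RUN"}  # hypothetical action vocabulary


def parse_decision(raw: str) -> tuple[dict, dict]:
    """Parse an LLM reply into a decision, falling back to NOOP on any defect.

    Args:
        raw: Raw text returned by the model.

    Returns:
        (decision, metadata), where metadata records whether a fallback happened and why.
    """
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError as exc:
        return dict(NOOP), {"fallback": True, "reason": f"invalid JSON: {exc.msg}"}
    action = decision.get("action") if isinstance(decision, dict) else None
    # The isinstance check guards against unhashable values such as lists.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return dict(NOOP), {"fallback": True, "reason": "schema violation"}
    return decision, {"fallback": False, "reason": None}


print(parse_decision('{"action": "MOVE", "to": "analyzer"}'))
print(parse_decision("sure, I'll move the sample"))
```

Returning the fallback reason alongside the NOOP keeps the failure visible in episode metadata instead of swallowing it.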
Preferred PR size: under 400 lines where practical.
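One rough way to check the size guideline is to sum added and deleted lines from `git diff --numstat`. Counting both is an interpretation; the guide does not define "lines". `git diff --numstat` reports `added<TAB>deleted<TAB>path` per file and `-` for binary files. The base branch `origin/main` and the helper names are assumptions.

```python
"""Sketch: estimate PR size from `git diff --numstat` output against the 400-line guideline."""

import subprocess

PREFERRED_MAX_LINES = 400


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines; binary files (reported as '-') count as zero."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        total += (int(added) if added.isdigit() else 0) + (int(deleted) if deleted.isdigit() else 0)
    return total


def pr_size_against(base: str = "origin/main") -> int:
    """Diff the current branch against ``base`` (the default branch name is an assumption)."""
    output = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return changed_lines(output)


sample = "120\t30\tsrc/example.py\n-\t-\tdocs/diagram.png\n10\t0\ttests/test_example.py\n"
size = changed_lines(sample)
print(size, size <= PREFERRED_MAX_LINES)  # 160 True
```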
Before tagging a release, run the full E2E artifacts chain and make sure it passes (package-release → export-risk-register into the release directory → build-release-manifest → verify-release --strict-fingerprints → schema/crosswalk). Also run `python scripts/validate_security_safety_refs.py` to check that risk_registry, security_attack_suite, and the safety case claims stay aligned. See CI and Trust verification.
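The release chain is order-dependent: presumably each step consumes the artifacts written by the one before it. A failure should therefore stop the chain rather than let later steps run on incomplete output. In the sketch below, the step names mirror the chain above, while `run_chain` and the callables are hypothetical placeholders rather than the project's release tooling.

```python
"""Hypothetical sketch: run the release artifacts chain in order, stopping at the first failure."""

from collections.abc import Callable

# Step names mirror the chain in the contributing guide; the callables are placeholders.
RELEASE_CHAIN = [
    "package-release",
    "export-risk-register",
    "build-release-manifest",
    "verify-release --strict-fingerprints",
    "schema/crosswalk",
]


def run_chain(steps: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Run the steps in RELEASE_CHAIN order.

    Args:
        steps: Maps each step name to a callable that returns True on success.

    Returns:
        (passed, completed), where ``completed`` lists the steps that succeeded.
    """
    completed = []
    for name in RELEASE_CHAIN:
        if not steps[name]():
            return False, completed
        completed.append(name)
    return True, completed


fake = {name: (lambda: True) for name in RELEASE_CHAIN}
fake["verify-release --strict-fingerprints"] = lambda: False  # simulate a fingerprint mismatch
print(run_chain(fake))
```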
- Quick-eval: run 1 episode each of throughput_sla, adversarial_disruption, multi_site_stat: `labtrust quick-eval --seed 42` (requires `.[env,plots]`). CI runs this on every push/PR.
- `LABTRUST_BENCH_SMOKE=1`: run benchmark smoke (1 episode per task): `labtrust bench-smoke --seed 42` (requires `.[env]`).
- `LABTRUST_REPRO_SMOKE=1`: run reproduce smoke: `labtrust reproduce --profile minimal --out runs/repro_smoke` (requires `.[env,plots]`).
- Coordination tests: run all coordination-related tests: `pytest -q tests/ -k coordination` (requires `.[env]`). The CI coordination-smoke job (when `LABTRUST_COORDINATION_SMOKE=1`) runs validate-policy, these tests, and one-episode coord_scale + coord_risk. See Coordination methods (Coordination done checklist section).
- `LABTRUST_PAPER_SMOKE=1`: run the package-release paper profile smoke (1 episode for baselines, 2 episodes for the insider_key_misuse study): `labtrust package-release --profile paper_v0.1 --seed-base 100 --out /tmp/paper_smoke` (requires `.[env,plots]`). Determinism: `pytest tests/test_package_release.py -v` (includes the paper_v0.1 smoke and CLI test).
- `LABTRUST_MARL_SMOKE=1`: run MARL smoke: `pytest tests/test_marl_smoke.py -v` (requires `.[marl]`).
- Package-release: `labtrust package-release --profile minimal --out /tmp/labtrust_release --seed-base 100` (requires `.[env,plots]`). For a paper-ready artifact, use `--profile paper_v0.1` (see Paper provenance). Determinism: `pytest tests/test_package_release.py -v` with `LABTRUST_REPRO_SMOKE=1` for minimal/full; the paper_v0.1 tests use `LABTRUST_PAPER_SMOKE=1`.
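The optional smoke checks above are opt-in through `LABTRUST_*` flags. A local wrapper could pick them out of the environment as sketched below. It treats each flag as a plain on/off switch that is enabled only by the value `"1"`, which is an assumption. `selected_commands` is a hypothetical helper. The coordination-smoke flag is left out because its CI job runs more than a single command.

```python
"""Sketch: choose which optional smoke checks to run from LABTRUST_* environment flags."""

import os
from collections.abc import Mapping

# Flag -> command, copied from the contributing guide. Quick-eval has no flag: CI always runs it.
SMOKE_COMMANDS = {
    "LABTRUST_BENCH_SMOKE": "labtrust bench-smoke --seed 42",
    "LABTRUST_REPRO_SMOKE": "labtrust reproduce --profile minimal --out runs/repro_smoke",
    "LABTRUST_PAPER_SMOKE": "labtrust package-release --profile paper_v0.1 --seed-base 100 --out /tmp/paper_smoke",
    "LABTRUST_MARL_SMOKE": "pytest tests/test_marl_smoke.py -v",
}
ALWAYS = ["labtrust quick-eval --seed 42"]


def selected_commands(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return quick-eval plus every smoke command whose flag is set to "1"."""
    return ALWAYS + [cmd for flag, cmd in SMOKE_COMMANDS.items() if env.get(flag) == "1"]


print(selected_commands({"LABTRUST_BENCH_SMOKE": "1", "LABTRUST_MARL_SMOKE": "0"}))
```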