Contributing to LabTrust-Gym

Development setup

git clone https://github.com/fraware/LabTrust-Gym.git
cd LabTrust-Gym
pip install -e ".[dev]"
labtrust --version   # optional: check version + git SHA

For a full verification command sequence, see Evaluation checklist. To test and audit the repo (lint, format, typecheck, tests, benchmarks, quick-eval, coordination, reproduce, docs), run the steps there or use make verify.

Documentation

Comments and docstrings should be clear and free of unexplained jargon. See Documentation standards for module/class/function docstrings, structure, and style. New or modified public modules, classes, or functions must have docstrings that meet those standards (module purpose, no unexplained jargon, and for functions: summary and Args/Returns where applicable). Existing code is being brought up to standard incrementally.
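As an illustration of the docstring shape described above (summary, then Args/Returns/Raises in the Google style named in the PR checklist), here is a minimal sketch. The function clamp is a hypothetical example and is not part of LabTrust-Gym; Documentation standards remains the authoritative reference.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Limit a value to a closed interval.

    Args:
        value: Number to limit.
        low: Smallest value that may be returned.
        high: Largest value that may be returned.

    Returns:
        ``value`` if it lies in ``[low, high]``, otherwise the nearer bound.

    Raises:
        ValueError: If ``low`` is greater than ``high``.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))
```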

Code quality

Before opening a PR:

ruff format and ruff check (lines must not exceed 120 characters; E501 is enforced). For naming exceptions (N802, N806), see Code style and lint.
mypy src/ (must pass; CI fails on type errors)
pytest -q -m "not slow" (quick check; excludes slow tests). For full env-dependent tests (security, CLI smoke): pip install -e ".[dev,env]" then run the security and CLI smoke steps from CI.
labtrust validate-policy
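
The checks above can be chained so that the first failure stops the run. This is a minimal sketch, assuming the commands are run from the repo root exactly as listed; the names PRE_PR_CHECKS and first_failure and the injectable runner argument are hypothetical helpers, not part of the labtrust CLI, and make verify remains the documented route.

```python
import shlex
import subprocess
from typing import Callable, List, Optional, Sequence

# Commands copied from the pre-PR list above.
PRE_PR_CHECKS = [
    "ruff format",
    "ruff check",
    "mypy src/",
    'pytest -q -m "not slow"',
    "labtrust validate-policy",
]


def _run(argv: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def first_failure(
    commands: Sequence[str],
    runner: Callable[[List[str]], int] = _run,
) -> Optional[str]:
    """Run commands in order; return the first one that fails, or None."""
    for command in commands:
        # shlex.split keeps the quoted "not slow" marker as one argument.
        if runner(shlex.split(command)) != 0:
            return command
    return None
```

Calling first_failure(PRE_PR_CHECKS) returns None when every check passes; the runner parameter exists only so the helper can be exercised without the tools installed.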

Policy files under policy/ must validate against the JSON schemas in policy/schemas/. validate-policy checks schema and structural validity only; it does not check logical correctness (e.g. zone connectivity, invariant feasibility, or that controls match risks). After validate-policy, review policy for logical consistency and appropriateness of controls; validation is necessary but not sufficient. New or modified policy files must pass validation. Legacy and design-only YAML (e.g. override matrix, compiler contracts) live under docs/architecture/design/ and are not loaded by the runtime.
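
To make the gap between schema validity and logical correctness concrete, the sketch below checks zone connectivity, one of the properties validate-policy does not cover. The data layout (a list of zone names plus undirected edges) and the function unreachable_zones are assumptions for illustration; the real policy files may represent zones differently.

```python
from collections import deque


def unreachable_zones(zones, edges, start):
    """Return zones that cannot be reached from `start` over undirected edges.

    Every endpoint in `edges` is assumed to appear in `zones`.
    """
    neighbours = {zone: set() for zone in zones}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search from the start zone.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return sorted(set(zones) - seen)


# Hypothetical layout: well-formed data, but "freezer" has no connecting edge.
zones = ["reception", "lab", "freezer"]
edges = [("reception", "lab")]
print(unreachable_zones(zones, edges, "reception"))  # ['freezer']
```

A layout like this can be structurally valid and still describe a zone no agent can enter, which is the kind of issue the manual review step is meant to catch.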

Testing and contracts

Frozen contracts: Do not weaken runner output, queue contract, coordination interface, or risk register schema without a version bump and doc update. See Frozen contracts for the canonical list. For coordination: when N <= N_max only propose_actions is used; combine_submissions is never called (backward-compat guarantee).
Implementation audit: For what is covered by automated tests versus manual checklists, see Evaluation checklist and CI.
Troubleshooting: For common failures (verify-bundle, policy validation, pack gate, E2E chain), see Troubleshooting.
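
The coordination backward-compat guarantee above (only propose_actions is used when N <= N_max) can be pinned down with a test double whose combine_submissions fails if it is ever called. This is a sketch under stated assumptions: the method signatures, the RecordingMethod class, and the step driver are hypothetical, and the N > N_max path is not described here, so it is left out.

```python
class RecordingMethod:
    """Test double: records propose_actions calls, rejects combine_submissions."""

    def __init__(self):
        self.proposed = []

    def propose_actions(self, observations):
        self.proposed.append(observations)
        # Return a NOOP per agent; the action format is an assumption.
        return {agent: "NOOP" for agent in observations}

    def combine_submissions(self, submissions):
        raise AssertionError("combine_submissions must not run when N <= N_max")


def step(method, observations, n, n_max):
    """Hypothetical driver that covers only the N <= N_max path."""
    if n > n_max:
        raise NotImplementedError("large-N path is outside this sketch")
    return method.propose_actions(observations)


method = RecordingMethod()
actions = step(method, {"agent_1": {}, "agent_2": {}}, n=2, n_max=4)
print(actions)  # {'agent_1': 'NOOP', 'agent_2': 'NOOP'}
```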

Keep the repo root minimal: do not commit CLI or build artifacts (e.g. results.json, out.json, bench_smoke_*.json, quick_eval_*/, site/). Use labtrust_runs/ or --out <path> for benchmark and study outputs. See Repository structure.
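
One way to catch stray artifacts before committing is to match changed paths against the patterns listed above. This is a minimal sketch: the function root_artifacts and the pattern list are hypothetical helpers built from the examples in this section, not an existing repo check.

```python
from fnmatch import fnmatch

# Patterns taken from the examples above; each applies to the top-level entry.
ROOT_ARTIFACT_PATTERNS = [
    "results.json",
    "out.json",
    "bench_smoke_*.json",
    "quick_eval_*",
    "site",
]


def root_artifacts(paths):
    """Return the paths whose top-level component matches an artifact pattern."""
    flagged = []
    for path in paths:
        top = path.strip("/").split("/")[0]
        if any(fnmatch(top, pattern) for pattern in ROOT_ARTIFACT_PATTERNS):
            flagged.append(path)
    return flagged


print(root_artifacts(["results.json", "labtrust_runs/results.json"]))
# ['results.json']
```

The input could come from git diff --cached --name-only; outputs written under labtrust_runs/ are not flagged because only the top-level component is checked.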

Golden suite

The golden scenarios in policy/golden/golden_scenarios.v0.1.yaml define correctness. Do not weaken expectations. When adding engine behaviour, extend the suite only with new scenarios or new assertions; do not relax existing ones.
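
The extend-only rule can be expressed as a comparison between the suite before and after a change: every existing assertion must still be present. The sketch below assumes a simplified, hypothetical representation (scenario ID mapped to a list of assertion strings); the actual schema of golden_scenarios.v0.1.yaml is not shown here and may differ.

```python
def relaxed_expectations(old, new):
    """Return (scenario_id, assertion) pairs present in `old` but missing from `new`."""
    missing = []
    for scenario_id, assertions in old.items():
        kept = set(new.get(scenario_id, []))
        for assertion in assertions:
            if assertion not in kept:
                missing.append((scenario_id, assertion))
    return missing


# Hypothetical scenario IDs and assertion labels.
before = {"scenario_a": ["check_1", "check_2"]}
extended = {"scenario_a": ["check_1", "check_2", "check_3"], "scenario_b": ["check_4"]}
weakened = {"scenario_a": ["check_1"]}

print(relaxed_expectations(before, extended))  # []
print(relaxed_expectations(before, weakened))  # [('scenario_a', 'check_2')]
```

An empty result means the change only added scenarios or assertions; any returned pair marks an expectation that was dropped.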

PR checklist

New or modified policy files validated
New emit types added to policy/emits/emits_vocab.v0.1.yaml (or none)
Golden suite impact explained
Tests added or updated
New or modified public functions/methods have docstrings in Google style (summary + Args/Returns/Raises where applicable)
New LLM coordination method or backend: Confirm the LLM excellence checklist (schema-valid decisions, hard-fail to NOOP, metadata, integration) and add the entry to the table in that section.

Preferred PR size: under 400 lines where practical.

Before tagging a release, run the full E2E artifacts chain and ensure it passes (package-release → export-risk-register into release dir → build-release-manifest → verify-release --strict-fingerprints → schema/crosswalk). Run python scripts/validate_security_safety_refs.py to ensure risk_registry, security_attack_suite, and safety case claims stay aligned. See CI and Trust verification.

Optional smoke tests (env vars)

Quick-eval — Run 1 episode each of throughput_sla, adversarial_disruption, multi_site_stat: labtrust quick-eval --seed 42 (requires .[env,plots]). CI runs this on every push/PR.
LABTRUST_BENCH_SMOKE=1 — Run benchmark smoke (1 episode per task): labtrust bench-smoke --seed 42 (requires .[env]).
LABTRUST_REPRO_SMOKE=1 — Run reproduce smoke: labtrust reproduce --profile minimal --out runs/repro_smoke (requires .[env,plots]).
Coordination tests — Run all coordination-related tests: pytest -q tests/ -k coordination (requires .[env]). CI coordination-smoke job (when LABTRUST_COORDINATION_SMOKE=1) runs validate-policy, these tests, and one-episode coord_scale + coord_risk. See Coordination methods (Coordination done checklist section).
LABTRUST_PAPER_SMOKE=1 — Run package-release paper profile smoke (1 episode baselines, 2 episodes insider_key_misuse study): labtrust package-release --profile paper_v0.1 --seed-base 100 --out /tmp/paper_smoke (requires .[env,plots]). Determinism: pytest tests/test_package_release.py -v (includes paper_v0.1 smoke and CLI test).
LABTRUST_MARL_SMOKE=1 — Run MARL smoke: pytest tests/test_marl_smoke.py -v (requires .[marl]).
Package-release: labtrust package-release --profile minimal --out /tmp/labtrust_release --seed-base 100 (requires .[env,plots]). For a paper-ready artifact, use --profile paper_v0.1 (see Paper provenance). Determinism: run pytest tests/test_package_release.py -v with LABTRUST_REPRO_SMOKE=1 for the minimal/full profiles; the paper_v0.1 tests use LABTRUST_PAPER_SMOKE=1.
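
The variables above act as opt-in switches set to 1. As a minimal sketch of that pattern, the helper below reports whether a given switch is on; smoke_enabled is a hypothetical name, and how the repo's own tests read these variables is not shown here.

```python
import os


def smoke_enabled(flag, environ=None):
    """Return True only when the named variable is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(flag) == "1"


print(smoke_enabled("LABTRUST_MARL_SMOKE", {"LABTRUST_MARL_SMOKE": "1"}))  # True
print(smoke_enabled("LABTRUST_MARL_SMOKE", {}))  # False
```

In a pytest module, a helper like this could feed pytest.mark.skipif so that an optional test is skipped unless its variable is set.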
