
feat: add relay-only Amaru bootstrap testnet #95

Open
paolino wants to merge 24 commits into main from feat/amaru-bootstrap-producer-cluster

paolino (Collaborator) commented Apr 30, 2026

Refs #72

Summary

Adds testnets/cardano_amaru, an Antithesis-style cardano-node 10.7.1 cluster that runs the published amaru-bootstrap-producer image inside the cluster and starts two relay-only Amaru nodes from the produced bootstrap bundle.

Amaru is relay-only here. The only stake-bearing block producers are p1, p2, and p3. amaru-relay-1 and amaru-relay-2 receive no KES key, VRF key, cold key, operational certificate, or stake-pool genesis assignment.

Runtime Shape

  • pins every cardano_amaru cardano-node producer and relay to upstream ghcr.io/intersectmbo/cardano-node:10.7.1-amd64 by digest: ghcr.io/intersectmbo/cardano-node@sha256:3275d357053d21f3220f74b0854fd584e1fe322dfa1bbb78effd760c3191d14c
  • uses digest-only spelling because Antithesis image parameters reject Docker's combined repo:tag@sha256:digest form
  • runs ghcr.io/lambdasistemi/amaru-bootstrap-producer:d81dd7d31e1c23b3223d3c4155294b82dc56ea0e
  • runs ghcr.io/cardano-foundation/cardano-node-antithesis/tx-generator:6808a14, rebuilt after merging the N2C reconnect supervisor in lambdasistemi/cardano-node-clients@898a2c470ced6a82fa5a32b18cbaf195e1cce927
  • treats transient tx-generator control-socket gaps as reachable telemetry instead of failing SDK assertions
  • makes bootstrap-producer own the p1 ChainDB snapshot-refresh loop; retryable readiness/copy failures refresh the snapshot inside the same container and keep Antithesis stdout quiet
  • writes the final bundle to amaru-bundle, then each Amaru relay copies it into a private state volume before amaru run
  • enables Mermaid rendering in MkDocs and documents the current architecture, image pins, compatibility limits, and local usage
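A minimal bash sketch of the digest-only rule, assuming bash 4 or later; `is_digest_only_ref` is a hypothetical helper, not part of this PR. It accepts `repo@sha256:<64 hex chars>` and rejects the combined `repo:tag@sha256:<digest>` spelling that Antithesis image parameters refuse, which is the same distinction the awk filter in the local checks draws.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): accept image references of the
# form repo@sha256:<64 hex chars> and reject repo:tag@sha256:<digest>.
is_digest_only_ref() {
  local ref=$1 repo last
  # Exactly one "@sha256:" separator followed by a full 64-character digest.
  [[ $ref =~ ^([^@]+)@sha256:[0-9a-f]{64}$ ]] || return 1
  repo=${BASH_REMATCH[1]}
  # A tag is a ':' in the last path component (cardano-node:10.7.1-amd64);
  # a ':' in an earlier component is a registry port (localhost:5000/...).
  last=${repo##*/}
  [[ $last != *:* ]]
}
```

Only references that contain `@sha256` should be passed to it: tag-pinned images such as the amaru-bootstrap-producer and tx-generator pins are not digest references, so the helper would reject them by design.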

Dependency PR merged before repinning:

lambdasistemi/cardano-node-clients#105

Docs preview:

https://cf-cna-pr-95.surge.sh

Latest Fix

The first post-merge one-hour run on da5022c63fb0a972620a9ab8766c9f1db676c6c1 failed because bootstrap-producer exited with code 2 (chain-not-era-ready). The root cause was that compose copied a single p1 ChainDB snapshot once and then waited on that static copy. Under fault scheduling, that copy could be taken too early, and because it was never refreshed, it could never become ready.

This branch now refreshes the snapshot inside bootstrap-producer and retries only the bootstrap producer's retryable readiness/copy/extract classes: exit 1, 2, 5, and 7. Non-retryable config/conversion/nonce/import/output errors still exit non-zero. Per-attempt producer logs are retained in /srv/amaru/.logs; container stdout prints only the final commit line or a bounded non-retryable failure tail.
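The retry policy above can be sketched in bash as follows. Only the retryable exit classes (1, 2, 5, 7) and the /srv/amaru/.logs location come from this PR; the function names, `MAX_ATTEMPTS`, `RETRY_DELAY`, the 20-line failure tail, and treating a failed snapshot refresh as retryable are assumptions, not the actual bootstrap-producer code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the retry policy. Only the exit classes (1, 2, 5, 7
# retryable) and the log directory come from the PR; names, MAX_ATTEMPTS,
# and RETRY_DELAY are illustrative assumptions.

LOG_DIR=${LOG_DIR:-/srv/amaru/.logs}
MAX_ATTEMPTS=${MAX_ATTEMPTS:-30}
RETRY_DELAY=${RETRY_DELAY:-10}

# Readiness/copy/extract failures are retryable; everything else is fatal.
is_retryable() {
  case $1 in
    1 | 2 | 5 | 7) return 0 ;;
    *) return 1 ;;
  esac
}

# run_with_refresh REFRESH_CMD PRODUCER_CMD [ARGS...]
# REFRESH_CMD re-copies the p1 ChainDB snapshot before every attempt, so a
# too-early copy is replaced instead of being waited on forever.
run_with_refresh() {
  local refresh=$1
  shift
  local attempt log rc=1
  mkdir -p "$LOG_DIR"
  for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
    log="$LOG_DIR/attempt-$attempt.log"
    # All per-attempt output goes to the log file, keeping stdout quiet.
    if "$refresh" >"$log" 2>&1; then
      rc=0
      "$@" >>"$log" 2>&1 || rc=$?
    else
      rc=1 # assumption: a failed snapshot copy is treated as retryable
    fi
    if ((rc == 0)); then
      echo "bootstrap committed on attempt $attempt"
      return 0
    fi
    if ! is_retryable "$rc"; then
      echo "bootstrap failed with non-retryable exit $rc:" >&2
      tail -n 20 "$log" >&2 # bounded failure tail
      return "$rc"
    fi
    sleep "$RETRY_DELAY"
  done
  echo "bootstrap gave up after $MAX_ATTEMPTS attempts (last exit $rc)" >&2
  return "$rc"
}
```

A call might look like `run_with_refresh copy_p1_snapshot produce_bundle`, where both commands are hypothetical stand-ins for the snapshot copy and the bundle build. Refreshing before every attempt, rather than once up front, is what keeps a too-early snapshot from pinning the producer at exit 2 indefinitely.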

The docs graph also now renders as SVG in the built MkDocs site via docs/js/mermaid-init.js.

Evidence On Current Head

Current branch head: 723a72a3c02ac932cd21eab99bd870e370ccd638

Local checks before push:

git diff --check
bash -n scripts/smoke-test.sh
INTERNAL_NETWORK=false docker compose -f testnets/cardano_amaru/docker-compose.yaml config --quiet
INTERNAL_NETWORK=false docker compose -f testnets/cardano_amaru/docker-compose.yaml config --images \
  | awk '/@sha256/ && /:[^/@]+@sha256:/ { bad=1; print } END { exit bad }'
nix develop github:paolino/dev-assets?dir=mkdocs -c mkdocs build --strict --site-dir /tmp/cardano-node-antithesis-pr95-site
AMARU_BOOTSTRAP_SMOKE_TIMEOUT=1800 ./scripts/smoke-test.sh cardano_amaru 600

Local cardano_amaru smoke result on 723a72a:

  • p1, p2, and p3 answered cardano-cli ping
  • sidecar converged on attempt 1
  • tx-generator reported ready
  • refill UTxO was observed with p50_lovelace=5000000000
  • 5/5 transacts landed
  • final populationSize=16
  • bootstrap-producer completed with exit code 0
  • both Amaru relays copied the bundle into private state and stayed running after amaru run
  • script exited PASS: all 3 nodes responding

Browser docs check on the built site:

  • docs/testnets/cardano-amaru.md rendered with one .mermaid svg
  • browser console had no warnings/errors after reload

CI on 723a72a:

Build docs:

https://github.com/cardano-foundation/cardano-node-antithesis/actions/runs/25203038173

PR preview:

https://github.com/cardano-foundation/cardano-node-antithesis/actions/runs/25203038201

Tracer-sidecar CI:

https://github.com/cardano-foundation/cardano-node-antithesis/actions/runs/25203038183

Build/publish plus compose smoke for cardano_node_master and cardano_amaru:

https://github.com/cardano-foundation/cardano-node-antithesis/actions/runs/25203038207

One-hour faulted cardano_amaru Antithesis on 723a72a: passed.

GitHub wrapper:

https://github.com/cardano-foundation/cardano-node-antithesis/actions/runs/25203289286

Moog test-run id:

99b9f291c26d557d4e2f72dc49cbe82f761e28d8b5313203acaf5092be4f2462

Report:

https://cardano.antithesis.com/report/93eR3S0vr1bSdXeK78ucW8U2/GB_HZ8s_Cql7B4MMJouI_80CJObD-OntDdTGjzVN9oM.html?auth=v2.public.eyJzY29wZSI6eyJSZXBvcnRTY29wZVYxIjp7ImFzc2V0IjoiR0JfSFo4c19DcWw3QjRNTUpvdUlfODBDSk9iRC1PbnREZFRHanpWTjlvTS5odG1sIiwicmVwb3J0X2lkIjoiOTNlUjNTMHZyMWJTZFhlSzc4dWNXOFUyIn19LCJuYmYiOiIyMDI2LTA1LTAxVDA1OjMxOjA4LjA1NDM1NjkyN1oifXD9y1P9y81PG0ce2xa9vxzcDxO65khF_CET7animLurC2tiyP2Ge-PFzQGNZu9OK62sghZmS0XctMQTWOZ3GA8

paolino self-assigned this Apr 30, 2026
paolino added the enhancement and documentation labels Apr 30, 2026
github-actions (bot) commented Apr 30, 2026

🚀 Documentation preview

Preview: https://cf-cna-pr-95.surge.sh

Built from 22562cb.
The preview updates on every push to this branch.

paolino (Collaborator, Author) commented Apr 30, 2026

Updated this PR to consume the merged/published amaru-bootstrap-producer image from lambdasistemi/amaru-bootstrap PR #30.

Pinned image:
ghcr.io/lambdasistemi/amaru-bootstrap-producer:d81dd7d31e1c23b3223d3c4155294b82dc56ea0e

Upstream evidence:
lambdasistemi/amaru-bootstrap#30
https://github.com/lambdasistemi/amaru-bootstrap/actions/runs/25172449567
https://github.com/lambdasistemi/amaru-bootstrap/actions/runs/25172636074

Local evidence before pushing commit d839481:

  • docker pull confirmed that the published tag resolves to digest sha256:213c1f454ba7db4a3927f7f8cea94778d84d1d698197b74cb121899c5bb40069.
  • INTERNAL_NETWORK=false docker compose -f testnets/cardano_amaru/docker-compose.yaml config passed.
  • AMARU_BOOTSTRAP_SMOKE_TIMEOUT=1800 ./scripts/smoke-test.sh cardano_amaru 600 passed.
  • In that smoke, bootstrap-producer satisfied readiness at target slot 250, emitted ledger states at 10/130/250, imported ledger state/headers/nonces, and wrote /srv/amaru/testnet_42.
  • amaru-relay-1 and amaru-relay-2 both consumed the bundle and stayed running.
  • nix develop --quiet --command mkdocs build --strict passed.
  • nix develop --quiet --command shellcheck scripts/smoke-test.sh passed.
  • git diff --check passed.

paolino (Collaborator, Author) commented Apr 30, 2026

Added the Antithesis log-budget fix in commit 4889d89.

What changed:

  • Amaru relay services now set AMARU_LOG=warn, AMARU_TRACE=warn, and AMARU_COLOR=never.
  • The relay shell wrapper no longer emits bundle-wait heartbeat logs.
  • scripts/smoke-test.sh cardano_amaru now fails unless both Amaru relay containers carry this warn-level log environment before it accepts the bootstrap load proof.
  • The testnet docs and handoff now document the log budget.
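The relay-side hand-off (wait without heartbeat output, then copy the shared bundle into a private state volume before `amaru run`) can be sketched as below. This is an illustration under stated assumptions, not the actual wrapper: it assumes the producer leaves a completion marker (the `.complete` name is hypothetical), and the `.staged` file, poll interval, and example paths are likewise invented for the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical relay-wrapper sketch: READY_MARKER, BUNDLE_POLL_SECONDS, the
# .staged file, and the paths in the example are illustrative assumptions.

# stage_bundle SHARED_DIR PRIVATE_DIR
stage_bundle() {
  local shared=$1 private=$2
  local marker=${READY_MARKER:-.complete}
  # Poll silently: no heartbeat lines are written while waiting.
  until [[ -e $shared/$marker ]]; do
    sleep "${BUNDLE_POLL_SECONDS:-5}"
  done
  # Copy into private state once, so each relay mutates only its own copy
  # and a container restart does not overwrite state it already advanced.
  if [[ ! -e $private/.staged ]]; then
    mkdir -p "$private"
    cp -a "$shared/." "$private/"
    touch "$private/.staged"
  fi
}

# Example wiring (hypothetical paths; actual `amaru run` flags not shown):
#   stage_bundle /srv/amaru-bundle /srv/amaru-state
#   exec amaru run ...
```

Copying instead of mounting the shared bundle directly means the two relays never write to the same files; whether the real wrapper re-copies on restart is not stated in the PR.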

Source check: upstream Amaru uses AMARU_LOG for its tracing EnvFilter; the default is more verbose, so this must be explicit in compose.
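The smoke-test env assertion could look roughly like this bash sketch; `has_quiet_amaru_env` and `check_relay_log_env` are hypothetical names, not the actual contents of scripts/smoke-test.sh. It relies only on standard `docker compose ps -q` and `docker inspect --format` usage to read each relay's environment.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the smoke-test assertion: succeed only if all three
# quiet-logging variables from this PR appear in KEY=VALUE lines on stdin.
has_quiet_amaru_env() {
  local line log=0 trace=0 color=0
  # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    case $line in
      AMARU_LOG=warn) log=1 ;;
      AMARU_TRACE=warn) trace=1 ;;
      AMARU_COLOR=never) color=1 ;;
    esac
  done
  ((log && trace && color))
}

# Checks both relay services of the compose project (not called here).
check_relay_log_env() {
  local compose=(docker compose -f testnets/cardano_amaru/docker-compose.yaml)
  local svc id
  for svc in amaru-relay-1 amaru-relay-2; do
    id=$(INTERNAL_NETWORK=false "${compose[@]}" ps -q "$svc")
    [[ -n $id ]] || { echo "$svc is not running" >&2; return 1; }
    docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$id" |
      has_quiet_amaru_env || { echo "$svc lacks quiet log env" >&2; return 1; }
  done
}
```

Resolving container IDs through `docker compose ps -q` avoids guessing the generated container names, which depend on the compose project name.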

Local evidence after the change:

  • INTERNAL_NETWORK=false docker compose -f testnets/cardano_amaru/docker-compose.yaml config passed and expanded the quiet env onto both Amaru relay services.
  • nix develop --quiet --command shellcheck scripts/smoke-test.sh passed.
  • nix develop --quiet --command mkdocs build --strict passed.
  • AMARU_BOOTSTRAP_SMOKE_TIMEOUT=1800 ./scripts/smoke-test.sh cardano_amaru 600 passed with the new env assertions.
  • During the smoke wait, docker compose logs --tail=50 amaru-relay-1 amaru-relay-2 produced no relay polling output, and docker inspect showed AMARU_LOG=warn, AMARU_TRACE=warn, and AMARU_COLOR=never on both relays.

paolino (Collaborator, Author) commented Apr 30, 2026

Launched a one-hour Antithesis run for the Amaru testnet from this PR branch.

Workflow run:
https://github.com/cardano-foundation/cardano-node-antithesis/actions/runs/25175759447

Launch parameters:

  • ref: feat/amaru-bootstrap-producer-cluster
  • commit: de6b2e07ce2970059490ac13f59650f5aafbff8d
  • test: cardano_amaru
  • duration: 1 hour
  • no-faults: false

The workflow bug that ignored the test input was fixed in commit de6b2e0, before this launch. The GitHub job has completed the Submit test step and is now in Wait for results.

paolino marked this pull request as ready for review May 1, 2026 06:36
paolino force-pushed the feat/amaru-bootstrap-producer-cluster branch from 2c113e3 to 3b7fb02 May 1, 2026 15:14
