feat(testnet): add yaci-store N2N consumer (#75) #76

Open
paolino wants to merge 5 commits into main from 004-yaci-store

paolino commented Apr 27, 2026

Summary

Adds bloxbean/yaci-store as a long-running N2N consumer in testnets/cardano_node_master/docker-compose.yaml, alongside the existing oura service. This is pure protocol coverage: yaci-store is a widely used Spring Boot indexer in the Cardano ecosystem, and running it under fault injection exercises a real consumer profile.

Refs #75. Independent of #70 (different role — additive coverage, not the tx-generator read path).

⚠️ Merge-blocked: 1 new finding on the 1h Antithesis run

Constitution hard gate (findings_new ≤ baseline) is NOT met. Antithesis run 25039550313 on commit 35bec7f reports:

| Findings | Count |
|----------|-------|
| new      | 1     |
| ongoing  | 0     |
| resolved | 0     |
| rare     | 0     |

Triage report: link (auth-scoped).

Compared with the most recent green 1h baseline on main (06c0a64f, yesterday 11:12), which had 0 new findings, this branch adds 1 new finding directly attributable to yaci-store.

The finding

Property: Correctness › No unexpected container exits › container: yaci-store, exit code: 1

  • failingCount: 18, passingCount: 0 (yaci-store dies in every timeline; restart: always causes a crash-loop)
  • 32 other properties pass.

Root cause (from log example index 0)

2026-04-28T07:33:24.665Z ERROR 1 --- [Yaci Store App] [main] o.s.boot.SpringApplication : Application run failed

java.lang.IllegalStateException: Genesis points not found. From point could not be decided.
    at com.bloxbean.cardano.yaci.store.core.service.StartService.start(StartService.java:103)

What happens:

  1. vtime 16.7s — yaci-store JVM starts, begins Spring init.
  2. vtime 17.8s — fault injector applies a network/clog (Jammed) on links between relay1.example and yaci-store.example, max_duration=2.75s.
  3. vtime 19.1s — fault injector restores network.
  4. vtime 37.7s — yaci-store's StartService.start throws because the chain-handshake with the relay (interrupted by the earlier clog) failed to populate the chain "from point". Container exits 1.
  5. Restart loop repeats; in faults_enabled timelines, the same race recurs.

This is a fault-injection-sensitive startup race in yaci-store: a cold start under adversarial network conditions. It is unrelated to the SIGTERM-143 finding from local T021, which concerns teardown.

Why it didn't show up locally: my T012/T013 verification used just up (no fault injection). yaci-store reaches the relay tip in ~30s when the network is clean. With a 2.75s clog at vtime 17.8s, the handshake never completes and the JVM exits.

Possible fixes (need decision before merge)

  1. Set explicit start-point env vars so yaci-store does not depend on the live relay handshake to find genesis points. Likely candidates: STORE_CARDANO_SYNC_START_SLOT=0 plus STORE_CARDANO_SYNC_START_BLOCKHASH=<computed_byron_genesis_hash>. Lowest-risk, closest to "fix-in-this-PR".
  2. Drop restart: always so a single crash kills yaci-store cleanly; combine with option 1 if the first attempt still races. Removes the crash-loop noise but leaves the underlying brittleness.
  3. Whitelist the property in the harness for yaci-store on startup. Wrong choice — masks a real signal that yaci-store will misbehave at any point a relay flickers, not just at boot.
  4. Revert the compose change and file a follow-up issue with these findings; land later, once option 1 is verified.
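
As a rough sketch of option 1, the compose stanza could pin the start point as below. The two variable names are the candidates listed in option 1 and have not been verified against yaci-store 2.0.0. The block hash is left as a placeholder because it has to be computed for this testnet's Byron genesis.

```yaml
services:
  yaci-store:
    environment:
      # Candidate names from option 1; unverified against yaci-store 2.0.0.
      STORE_CARDANO_SYNC_START_SLOT: "0"
      # Placeholder: compute the Byron genesis hash for this testnet first.
      STORE_CARDANO_SYNC_START_BLOCKHASH: "<computed_byron_genesis_hash>"
```

If these settings take effect, StartService should no longer need the live relay handshake to decide the "from point". A 1h Antithesis rerun would have to confirm that before merge.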

Design docs

Full speckit feature folder: specs/004-yaci-store/.

What changed

Single source code touch: testnets/cardano_node_master/docker-compose.yaml gains one service stanza:

  • Image: docker.io/bloxbean/yaci-store@sha256:aa67ee3c222eb2e6371b34dcda198ae9f2c32c90fcaf5ebdd1d8353e3205da87 (upstream 2.0.0, latest stable).
  • N2N target: relay1.example:3001, magic 42.
  • Genesis files: read-only mount of the existing p1-configs named volume.
  • Storage: SPRING_DATASOURCE_URL=jdbc:h2:mem:mydb.
  • Healthcheck: wget on /actuator/health.
  • depends_on: configurator (must complete) and relay1 (must be started).
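
For orientation, the bullets above map onto a stanza of roughly this shape. This is an illustrative reconstruction, not a copy of the diff. The STORE_CARDANO_HOST, STORE_CARDANO_PORT and STORE_CARDANO_PROTOCOL_MAGIC names, the /configs mount path, port 8080 and the healthcheck timings are all assumptions. The image digest, datasource URL, restart policy and dependency roles come from this description.

```yaml
services:
  yaci-store:
    image: docker.io/bloxbean/yaci-store@sha256:aa67ee3c222eb2e6371b34dcda198ae9f2c32c90fcaf5ebdd1d8353e3205da87
    restart: always
    environment:
      # Assumed env names for the N2N target; check against the actual diff.
      STORE_CARDANO_HOST: relay1.example
      STORE_CARDANO_PORT: "3001"
      STORE_CARDANO_PROTOCOL_MAGIC: "42"
      # H2 in-memory backing store, as stated above.
      SPRING_DATASOURCE_URL: jdbc:h2:mem:mydb
    volumes:
      # Read-only mount of the existing named volume; target path is hypothetical.
      - p1-configs:/configs:ro
    healthcheck:
      # Port 8080 and the timings are assumptions; the PR only says "wget on /actuator/health".
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/actuator/health"]
      interval: 10s
      start_period: 30s
    depends_on:
      configurator:
        condition: service_completed_successfully
      relay1:
        condition: service_started
```

Note that relay1 is only required to be started, not healthy, so nothing in the stanza shields the first handshake from an early fault on the relay link.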

No new component under components/. No publish-images change.

Local verification (T010..T015, T021)

| Check | Spec ref | Result |
|-------|----------|--------|
| Compose validates | T010 | |
| yaci-store healthy within 60s of cold up | SC-003, T012 | ✅ 25s |
| yaci-store cursor matches relay tip within 5 min | SC-001, T013 | ✅ ~30s, slot=28 hash match |
| Survives docker compose restart relay1 | US1 #2, T014 | ✅ no container restart, clean reconnect |
| Clean teardown | SC-004, T015 | |
| just smoke-test end-to-end | T041 | |
| SIGTERM exit code | FR-009, T021 | ⚠️ 143 (JVM signal-exit, separate issue from the Antithesis finding) |
| Cold docker pull of digest | SC-005, T030 | |
| publish-images PR check green | T031 | ✅ all 8 checks |
| 1h Antithesis findings ≤ baseline | constitution hard gate | +1 finding (genesis-points race), see above |

Findings (cumulative)

  1. Antithesis [merge-blocking]: the "No unexpected container exits" property fails 18×. yaci-store's startup chain handshake does not tolerate brief network faults on the relay link. Triage details above.
  2. Local [non-blocking]: SIGTERM exit code is 143, not 0. Standard JVM signal-exit pattern. Documented in research.md R-006. Same family as #49.

Constitution compliance

  • I. Composer-first workload: documented deviation (passive consumer, same precedent as oura).
  • VII. Image tag hygiene: upstream image, digest-pinned, exempt.
  • Hard gate (findings_new ≤ baseline): ❌ NOT met (1 > 0). Merge-blocked until resolved.

paolino added 5 commits April 27, 2026 19:03
Adds bloxbean/yaci-store as a long-running N2N consumer alongside oura,
pinned by content digest. Reuses p1-configs for genesis files; H2
in-memory backing store; 30s healthcheck warm-up.

Verified locally: yaci-store reaches relay tip in ~30s (slot=28,
block=2 hash match), survives a relay1 restart with a clean reconnect
(RestartCount=0), exits 143 on docker stop (JVM signal-exit, see
R-006).

Updates research.md R-006 and spec.md Assumptions to record the
SIGTERM-143 behaviour (not exit 0 as initially extrapolated).
@paolino paolino added the enhancement New feature or request label Apr 27, 2026
@paolino paolino self-assigned this Apr 27, 2026