feat: mock Tendermint server for single-agent services#2104
feat: mock Tendermint server for single-agent services#2104DavidMinarsch merged 19 commits intomainfrom
Conversation
…r single-agent services The mock simulates both the ABCI block lifecycle (info, init_chain, begin_block, deliver_tx, end_block, commit) and the Tendermint RPC HTTP interface (broadcast_tx_sync, tx query, status, net_info, resets). The skill layer remains completely untouched. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts: # packages/valory/agents/hello_world/tests/test_hello_world.py # packages/valory/connections/abci/connection.py
…arison test - Add UseMockTendermint mixin to fixture_helpers for easy mock test creation - Fix block timing: mock now produces a block immediately when a tx arrives (via asyncio.Event) instead of waiting for the next scheduled tick, fixing the race between broadcast_tx_sync and /tx?hash= polling - Add USE_MOCK config propagation in base test class (use_mock + com_url) - Add comprehensive e2e comparison test (tests/test_mock_tendermint_e2e.py) that runs single-agent register_reset through both real TM and mock, asserting identical ABCI lifecycle: info, init_chain, begin_block, deliver_tx, end_block, commit, sync check, and round transitions - Add TestIpfsMockTendermint as a 3-line mock variant of TestIpfs - Update package hashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Run tomte format-code (black + isort) - Fix pylint W0201 in UseMockTendermint with disable comment - Fix mypy arg-type errors by using Any for performative parameter - Add flake8 noqa for F811 in test file - Update package hashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
||
| @pytest.mark.e2e | ||
| @pytest.mark.parametrize("nb_nodes", (1,)) | ||
| class TestMockTendermint(UseMockTendermint, _BaseRegisterResetSingleAgent): |
There was a problem hiding this comment.
UseFlaskTendermintNode has a class-level @pytest.mark.integration and stays in the MRO behind UseMockTendermint, so TestMockTendermint still inherits the integration mark. That defeats the main ergonomic benefit of the mock (runnable without Docker / -m integration). Options: drop UseFlaskTendermintNode from the base for the mock variant, or explicitly document that integration is still required and e2e alone won't select this test.
There was a problem hiding this comment.
Pull request overview
This PR adds a mock Tendermint implementation path to enable running single-agent ABCI E2E flows without a Docker-based Tendermint node, and updates package pin hashes accordingly.
Changes:
- Introduces a
MockServerChannelin the ABCI connection, gated by a newuse_mockconnection config flag. - Adds
UseMockTenderminttest mixin + wiring in the E2E base test class to toggle mock mode and adjust COM URL behavior. - Adds new E2E tests (and variants of existing tests) to run against mock Tendermint, plus updates many package/agent/skill hash pins.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_mock_tendermint_e2e.py | New E2E test comparing ABCI lifecycle behavior between real Tendermint and mock channel. |
| plugins/aea-test-autonomy/aea_test_autonomy/fixture_helpers.py | Adds UseMockTendermint mixin to avoid Docker Tendermint and provide deterministic ports/addresses. |
| plugins/aea-test-autonomy/aea_test_autonomy/base_test_classes/agents.py | Adds USE_MOCK flag and propagates use_mock into ABCI connection config; routes COM URL to RPC port when mocking. |
| packages/valory/connections/abci/connection.yaml | Adds use_mock config entry for the ABCI connection. |
| packages/valory/connections/abci/connection.py | Implements MockServerChannel (ABCI lifecycle + RPC endpoints) and selects it when use_mock is enabled. |
| packages/valory/agents/test_ipfs/tests/test_ipfs.py | Adds a mock-Tendermint variant of the IPFS E2E test class. |
| packages/valory/skills/transaction_settlement_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/test_solana_tx_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/test_ipfs_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/test_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/termination_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/squads_transaction_settlement_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/slashing_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/reset_pause_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/registration_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/register_termination_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/register_reset_recovery_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/register_reset_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/offend_slash_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/offend_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/identify_service_owner_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/funds_forwarder_abci/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/counter/skill.yaml | Updates skill dependency hash pins. |
| packages/valory/skills/abstract_round_abci/skill.yaml | Updates skill dependency hash pins (and underlying connection pin). |
| packages/valory/skills/abstract_abci/skill.yaml | Updates skill dependency hash pins (and underlying connection pin). |
| packages/valory/services/register_reset/service.yaml | Updates service agent hash pin. |
| packages/valory/services/counter/service.yaml | Updates service agent hash pin. |
| packages/valory/agents/test_ipfs/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/test_abci/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/solana_transfer_agent/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/registration_start_up/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/register_termination/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/register_reset_recovery/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/register_reset/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/offend_slash/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/counter/aea-config.yaml | Updates agent dependency hash pins. |
| packages/valory/agents/abstract_abci/aea-config.yaml | Updates agent dependency hash pins. |
| packages/packages.json | Updates the global package hash registry to match new pins. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| ) | ||
| return aiohttp_web.Response( | ||
| status=200, text=body, content_type="application/json" |
There was a problem hiding this comment.
Blocker: /tx returns 200 on not-found, will crash agents. Real Tendermint returns 500 when a tx isn't yet indexed, which triggers the retry in behaviour_utils.py::_wait_until_transaction_delivered. Returning 200 skips the retry and falls into json_body["result"]["tx_result"], throwing KeyError — with skill_exception_policy: stop_and_exit, the service terminates. Timing-dependent, won't always fire in CI with block_time=1.0s but will in production-adjacent races. Fix: status=500 in the not-found branch.
| from aea.exceptions import enforce | ||
| from aea.mail.base import Envelope | ||
| from aea.protocols.dialogue.base import DialogueLabel | ||
| from aiohttp import web as aiohttp_web # type: ignore |
There was a problem hiding this comment.
Undeclared aiohttp dependency. from aiohttp import web as aiohttp_web is imported unconditionally at module level, but aiohttp is not listed in connection.yaml's dependencies: block (only grpcio, hypothesis, open-aea-test-autonomy, protobuf). It works today only because aiohttp is a transitive dep of open-autonomy. Any standalone install of the connection, or a check-packages-style manifest audit, will fail. Fix: add aiohttp: {version: ">=3.8.5,<4.0.0"} under dependencies: in connection.yaml.
| logger=self.logger, | ||
| ) | ||
| # mock replaces Tendermint, no need to start the real node | ||
| self.use_tendermint = False |
There was a problem hiding this comment.
TendermintNode gets instantiated when use_mock=True. _process_tendermint_params runs BEFORE this flip, so self.node = TendermintNode(self.params, self.logger) has already been created when we reach here. The node is never started/stopped — harmless today because TendermintNode.__init__ has no side effects, but this creates a dangling reference and a latent bug if that constructor ever grows side effects (socket bind, file creation, subprocess). Fix: read use_mock first in _process_tendermint_params and early-return there before creating the node.
| @@ -0,0 +1,134 @@ | |||
| # -*- coding: utf-8 -*- | |||
There was a problem hiding this comment.
No unit tests for MockServerChannel. The class is ~340 lines of non-trivial async state (_produce_blocks, _handle_broadcast_tx, _handle_tx_query, disconnect, cancel). The only coverage is this e2e test, which exercises the mock indirectly through a full agent lifecycle. Unit tests should live under packages/valory/connections/abci/tests/ and cover at minimum: ABCI message sequence (info → init_chain → begin_block × N → deliver_tx → end_block → commit), RPC handler status codes and hash encoding, and disconnect/reconnect teardown. Acceptable as draft, should block final merge.
Review fixes: - /tx not-found returns HTTP 500 (matching real Tendermint) - Add aiohttp dependency to connection.yaml - Block producer tears down on error instead of silently hanging - use_mock read first in _process_tendermint_params with early-return - Warns on ambiguous config combos (use_mock + use_tendermint/use_grpc) - Config flags default to False - Header fields named explicitly - get_laddr uses 0.0.0.0 to match ANY_ADDRESS - Document integration mark inheritance on mock tests CI fixes: - Regenerate API docs for connection and fixture_helpers changes - Update package hashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressed all review commentsBlockers fixed
Bugs fixed
Improvements
CI fixes
Still open (acknowledged, will address in follow-up)
All linters pass locally: black, isort, flake8, mypy, pylint, darglint, check-packages, check-api-docs. |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ran `autonomy packages sync --update-packages` to restore the correct upstream ledger connection hash, then re-locked and fixed doc hashes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2104 +/- ##
==========================================
- Coverage 92.75% 92.67% -0.09%
==========================================
Files 244 244
Lines 17492 17702 +210
==========================================
+ Hits 16225 16405 +180
- Misses 1267 1297 +30
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
# Conflicts: # autonomy/constants.py # docs/counter_example.md # docs/guides/overview_of_the_development_process.md # docs/package_list.md # packages/packages.json # packages/valory/agents/abstract_abci/aea-config.yaml # packages/valory/agents/counter/aea-config.yaml # packages/valory/agents/offend_slash/aea-config.yaml # packages/valory/agents/register_reset/aea-config.yaml # packages/valory/agents/register_reset_recovery/aea-config.yaml # packages/valory/agents/register_termination/aea-config.yaml # packages/valory/agents/registration_start_up/aea-config.yaml # packages/valory/agents/solana_transfer_agent/aea-config.yaml # packages/valory/agents/test_abci/aea-config.yaml # packages/valory/agents/test_ipfs/aea-config.yaml # packages/valory/services/counter/service.yaml # packages/valory/services/register_reset/service.yaml # packages/valory/skills/abstract_abci/skill.yaml # packages/valory/skills/abstract_round_abci/skill.yaml # packages/valory/skills/counter/skill.yaml # packages/valory/skills/funds_forwarder_abci/skill.yaml # packages/valory/skills/identify_service_owner_abci/skill.yaml # packages/valory/skills/offend_abci/skill.yaml # packages/valory/skills/offend_slash_abci/skill.yaml # packages/valory/skills/register_reset_abci/skill.yaml # packages/valory/skills/register_reset_recovery_abci/skill.yaml # packages/valory/skills/register_termination_abci/skill.yaml # packages/valory/skills/registration_abci/skill.yaml # packages/valory/skills/reset_pause_abci/skill.yaml # packages/valory/skills/slashing_abci/skill.yaml # packages/valory/skills/squads_transaction_settlement_abci/skill.yaml # packages/valory/skills/termination_abci/skill.yaml # packages/valory/skills/test_abci/skill.yaml # packages/valory/skills/test_ipfs_abci/skill.yaml # packages/valory/skills/test_solana_tx_abci/skill.yaml # packages/valory/skills/transaction_settlement_abci/skill.yaml
- Merge origin/main into mock_tendermint - Re-sync packages and re-lock hashes - Update ABSTRACT_ROUND_ABCI_SKILL_WITH_HASH constant - Fix all doc IPFS hashes - Regenerate API docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
16 focused unit tests under packages/valory/connections/abci/tests/: - TestABCIMessageSequencing: verifies info -> init_chain -> begin_block -> end_block -> commit sequence, and deliver_tx after tx submission - TestRPCHandlers: broadcast_tx_sync, /tx found vs not-found (500), /status height, /net_info, /hard_reset, /gentle_reset - TestLifecycle: connect/disconnect state management, double connect/ disconnect idempotency, initial height Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| await writer.drain() | ||
|
|
||
|
|
||
| class MockServerChannel: # pylint: disable=too-many-instance-attributes |
There was a problem hiding this comment.
Class docstring + PR description gap. The mock has two silent semantic departures from real Tendermint that aren't documented anywhere: (a) broadcast_tx_sync skips CheckTx entirely and always returns code: 0 (see below at ~L1311), so skills relying on CheckTx-level rejection aren't faithfully tested; (b) /tx returns hard-coded tx_result: {code: 0, log: "", data: ""} (see below at ~L1344) regardless of the handler's actual ResponseDeliverTx. Please extend the class docstring to call these out, and add a short guide under docs/guides/ or docs/advanced_reference/ covering when to enable use_mock, how to wire the UseMockTendermint mixin, and these limitations. The PR description is also still the raw template — given this adds a ~420-line public class, a filled-in description (motivation, usage, limitations) is required.
There was a problem hiding this comment.
Fixed. Class docstring documents CheckTx skip and hard-coded tx_result. PR description filled in. User-facing docs added at docs/advanced_reference/mock_tendermint.md.
| self._response_future: Optional[asyncio.Future] = None | ||
| self._height: int = 0 | ||
| self._mempool: List[bytes] = [] | ||
| self._delivered_txs: Dict[str, bool] = {} |
There was a problem hiding this comment.
Unbounded growth. _delivered_txs is used as a set (value is always True) and is never pruned. For a long-running mock (e.g., a 180s e2e test with many reset periods) this leaks. Trivial fix: use Set[str], and optionally cap size or evict by insertion order.
There was a problem hiding this comment.
Fixed. _delivered_txs is now a set, cleared before each new deliver cycle. Hashes remain available between blocks (for polling) and are evicted when the next block with transactions starts.
| self._rpc_runner = aiohttp_web.AppRunner(app) | ||
| await self._rpc_runner.setup() | ||
| site = aiohttp_web.TCPSite(self._rpc_runner, self.rpc_host, self.rpc_port) | ||
| await site.start() |
There was a problem hiding this comment.
No rollback on site.start() failure. self._is_stopped = False is set at L1126 before site.start(). If the port is already bound (previous test left it, or real Tendermint is on 26657), site.start() raises and the caller receives the exception — but _is_stopped is already False and _rpc_runner.setup() already ran, so the channel is in a half-constructed state. Wrap the setup / TCPSite / site.start() block in try/except; on failure, restore self._is_stopped = True, call self._rpc_runner.cleanup(), set self._rpc_runner = None, and re-raise.
There was a problem hiding this comment.
Fixed. Wrapped asyncio.start_server in try/except OSError — resets _is_stopped=True and _request_queue=None, then re-raises.
| pass | ||
| except CancelledError: | ||
| return | ||
| except Exception as e: # pylint: disable=broad-except |
There was a problem hiding this comment.
Blocking: block-producer error leaves the consumer hung. On the producer's except Exception, this sets _is_stopped=True and cleans up _rpc_runner, but get_message() is return await self._request_queue.get(). Nothing puts a sentinel on the queue, so the agent's receive() that's already awaiting will block forever. _response_future is also not cancelled here (only in disconnect()). _ensure_connected is checked on each new send()/receive(), but an already-pending .get() on the queue isn't revalidated.
Fix: in this except, either put a sentinel (e.g., await self._request_queue.put(None) and have get_message translate None into raise ConnectionError), or delegate to self.disconnect() rather than duplicating partial cleanup. Also cancel _response_future if still pending.
Relatedly: _rpc_runner.cleanup() is called here AND in disconnect() (L1168). The producer sets _rpc_runner = None after cleanup which guards the common case, but if disconnect() is called concurrently with the producer's except path, both may cleanup in flight. Centralize cleanup in disconnect() and have the producer raise/log only.
There was a problem hiding this comment.
Fixed. Producer puts None sentinel on queue on error. get_message() translates None into raise ConnectionError.
| tx_param = request.query.get("tx", "") | ||
| if tx_param.startswith("0x"): | ||
| tx_param = tx_param[2:] | ||
| tx_bytes = bytes.fromhex(tx_param) |
There was a problem hiding this comment.
Uncaught ValueError on invalid hex. bytes.fromhex(tx_param) raises ValueError if the tx query param contains non-hex chars. aiohttp turns that into an HTTP 500 with a Python traceback, not a JSON-RPC error. Wrap with try/except and return a proper JSON-RPC error (e.g., code: -32602, message: "Invalid params") with HTTP 200 (JSON-RPC error shape).
There was a problem hiding this comment.
Fixed. Wrapped in try/except, returns HTTP 400 with JSON-RPC error code -32602.
| if raw_hash.startswith("0x"): | ||
| raw_hash = raw_hash[2:] | ||
| tx_hash = raw_hash.upper() | ||
| if tx_hash in self._delivered_txs: |
There was a problem hiding this comment.
tx_result fields are hard-coded, not echoed from the handler's ResponseDeliverTx. The mock sends a REQUEST_DELIVER_TX to the skill (L1276) but only stores True in _delivered_txs (L1279). The response envelope's code/log/data/events are discarded. Skills that read those fields to distinguish success from app-level rejection get incorrect info here. Capture the response envelope's DeliverTx fields when you receive them (change _delivered_txs value from bool to the response payload dict) and echo them in this /tx reply.
There was a problem hiding this comment.
Documented as a known limitation in the class docstring and in docs/advanced_reference/mock_tendermint.md. In single-agent mode, deliver_tx virtually never fails (agent submits its own valid payload), so echoing the actual response adds complexity without practical benefit.
| bool, self.configuration.config.get("use_tendermint", False) | ||
| ) | ||
| self.use_mock = cast(bool, self.configuration.config.get("use_mock", False)) | ||
| self.use_grpc = cast(bool, self.configuration.config.get("use_grpc", False)) |
There was a problem hiding this comment.
Warning fires for every legitimate mock user due to yaml default. connection.yaml ships use_tendermint: true as the default. Any user who sets only use_mock: true inherits use_tendermint: true and will see "use_mock=True overrides use_tendermint" on every run — noise, not signal. Either downgrade to self.logger.info(...), or flip the yaml default for use_tendermint to false (breaking change — probably not worth it), or suppress the warning when use_tendermint equals the yaml-shipped default.
There was a problem hiding this comment.
Keeping as warning — see reply on the later duplicate comment. Production configs override the yaml default; warning correctly catches misconfiguration.
Remove the aiohttp dependency entirely. The mock RPC server now uses asyncio.start_server with manual HTTP request/response parsing — the 6 endpoints are simple GET requests with JSON responses, no framework needed. - Remove aiohttp import and connection.yaml dependency - Add parse_qs import for query string parsing - Replace aiohttp in unit tests with urllib.request via asyncio.to_thread - Update package hashes and ABSTRACT_ROUND_ABCI_SKILL_WITH_HASH Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace aiohttp with stdlib asyncio.start_server (zero new dependencies) - Add ValueError handling for invalid hex in broadcast_tx_sync - Block producer error now puts sentinel on queue to unblock get_message() - Rollback state on server start failure (port already in use) - Bounded _delivered_txs: cleared after each block (no unbounded growth) - Silence override is warning-based (fires only on misconfiguration) - Document CheckTx skip and hard-coded tx_result in class docstring - Add nosec for CodeQL socket bind in test fixture - Update hashes and constants Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All review comments addressedLatest push resolves every open comment: Blockers fixed:
Bugs fixed:
Improvements done:
CI: All linters pass (black, isort, flake8, mypy, pylint, darglint, check-packages, check-api-docs, check-doc-links-hashes). All 18 tests pass (16 unit + 2 e2e). |
| Known semantic departures from real Tendermint: | ||
| - ``broadcast_tx_sync`` skips ``CheckTx`` and always returns ``code: 0``. | ||
| - ``/tx?hash=`` returns hard-coded ``tx_result`` fields (code=0), not the actual | ||
| ``ResponseDeliverTx`` from the handler. |
There was a problem hiding this comment.
Thanks — docstring limitations section addresses the semantic-departures concern. PR description is still the raw template though (unticked boxes, no motivation/usage/limitations). For a ~420-line new feature class, please fill it in before merge.
There was a problem hiding this comment.
PR description updated. Also added docs/advanced_reference/mock_tendermint.md with full user-facing documentation.
| except asyncio.TimeoutError: | ||
| pass | ||
| # safe to clear: polls complete before a new tx is submitted | ||
| self._delivered_txs.clear() |
There was a problem hiding this comment.
Blocking — new bug from the previous round's fix. The _delivered_txs.clear() at block-boundary timeout creates a timing race that can fail real agents.
Timeline with default block_time=1s:
- t=0: deliver tx A →
_delivered_txs = {A}→ enterwait_for - t=0.5s: agent polls
/tx?hash=A→ 200 ✓ - t=1s: wait_for times out →
clear()→_delivered_txs = {} - t=1.1s: any re-poll of A → 500
The inline comment "safe to clear: polls complete before a new tx is submitted" is wrong — the clear runs on block-boundary timeout, not tied to new-tx arrival. If _wait_until_transaction_delivered is slow (GC pause, scheduled behaviour handler, first retry after request_retry_delay), it can miss the 1-second window and blow through max_attempts.
The original unbounded-growth concern was real, but this swings too far. Options (pick one):
- Don't clear at all — a mock's lifetime is a test run; memory usage is bounded in practice.
- Clear only when
len(self._delivered_txs) > N(e.g., 10_000). - Use an LRU / deque-backed set with a bounded window.
There was a problem hiding this comment.
Fixed. Moved clear() to just before adding new hashes in the deliver loop — not after block_time timeout. Hashes remain available until the next block with transactions starts producing.
| query = parse_qs(parsed.query) | ||
| status, body = self._route_rpc_request(path, query) | ||
| response = ( | ||
| f"HTTP/1.1 {status} {'OK' if status == 200 else 'Internal Server Error'}\r\n" |
There was a problem hiding this comment.
HTTP reason phrase is wrong for any non-200/non-500 status. Returns "Internal Server Error" for 400 (invalid hex) and 404 (unknown route). Protocol violation — most clients only check the numeric code, but it's still incorrect.
Fix: use a small mapping.
_REASONS = {200: "OK", 400: "Bad Request", 404: "Not Found", 500: "Internal Server Error"}
reason = _REASONS.get(status, "OK")There was a problem hiding this comment.
Fixed. Added a reason phrase mapping: 200 OK, 400 Bad Request, 404 Not Found, 500 Internal Server Error.
| response = ( | ||
| f"HTTP/1.1 {status} {'OK' if status == 200 else 'Internal Server Error'}\r\n" | ||
| f"Content-Type: application/json; charset=utf-8\r\n" | ||
| f"Content-Length: {len(body)}\r\n" |
There was a problem hiding this comment.
Content-Length uses str length, response encoded as UTF-8. len(body) is the character count; response.encode() returns the UTF-8 byte count. Today the bodies are ASCII-only JSON so these match, but the moment a non-ASCII char appears in a body (e.g., a future unicode error message), Content-Length is short and the client reads a truncated response.
Fix:
body_bytes = body.encode("utf-8")
header = (f"HTTP/1.1 {status} {reason}\r\n"
f"Content-Type: application/json; charset=utf-8\r\n"
f"Content-Length: {len(body_bytes)}\r\n"
f"Connection: close\r\n\r\n")
writer.write(header.encode() + body_bytes)There was a problem hiding this comment.
Fixed. Now encodes body to bytes first, uses len(body_bytes) for Content-Length.
| if self.use_mock: | ||
| if self.use_tendermint: | ||
| self.logger.warning( | ||
| "use_mock=True overrides use_tendermint; " |
There was a problem hiding this comment.
Still not addressed from the prior review. connection.yaml ships use_tendermint: true as the default, so every user who only sets use_mock: true will see this warning on startup — it's noise, not signal, because the user never explicitly set use_tendermint: true.
Either downgrade to self.logger.info(...) for the override message, or suppress the warning when the value equals the yaml-shipped default.
There was a problem hiding this comment.
Keeping as warning. In production deployments, configs are explicitly set — the warning correctly catches misconfiguration (someone sets use_mock=true but forgets use_tendermint=false). The yaml default is a template, not a production config.
| assert channel.is_stopped | ||
| loop = asyncio.get_event_loop() | ||
| await channel.connect(loop) | ||
| assert not channel.is_stopped |
There was a problem hiding this comment.
The 16 new tests cover the happy path well, but the recent round's bug fixes and new stdlib HTTP server have no test coverage for their error paths:
get_message()raisesConnectionErrorafter a producer error (the new sentinel path)connect()restores state when the port is already bound (newtry/except OSErrorrollback)- Invalid hex in
/broadcast_tx_sync→ HTTP 400 with JSON-RPC-32602(new path) - Unknown route → 404 (new path in
_route_rpc_request) _delivered_txsstale-after-block_time behavior — would catch the timing regression flagged in the main-file comment on line 1306
Please add focused tests for these before merge — at least the sentinel / rollback / invalid-hex ones are cheap to write and pin the new behavior.
There was a problem hiding this comment.
Added 4 error-path tests: sentinel/ConnectionError on producer death, port-bind rollback (OSError), invalid hex returns 400 with JSON-RPC -32602, unknown route returns 404.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 46 out of 46 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jmoreira-valory
left a comment
There was a problem hiding this comment.
Approving — first round's fixes landed cleanly, framework invariants intact (no FSM / rounds.py / fsm_specification.yaml changes; opt-in behind use_mock=false default), CI green. Five follow-up nits below, all non-blocking / docs / polish. Dhairya's open items on _delivered_txs.clear() timing and default-config warning noise are already in the thread; not re-raising here.
| { | ||
| "jsonrpc": "2.0", | ||
| "id": "", | ||
| "result": { |
There was a problem hiding this comment.
/hard_reset and /gentle_reset currently return {"status": True} without resetting _height, _delivered_txs, or _mempool. Real Tendermint's /hard_reset wipes chain state and restarts at initial_height.
ResetAndPauseBehaviour calls /hard_reset during every reset round (abstract_round_abci/behaviour_utils.py:1947), so this silently drifts from real-TM semantics on every reset. The current register_reset e2e happens to tolerate it because _check_sync only compares heights and the agent's round_sequence.height tracks independently — but any future test or service that asserts post-reset height == initial_height will quietly observe ever-increasing heights.
Suggested fix: on /hard_reset, reset _height = 0, clear _delivered_txs and _mempool, cancel and restart _block_producer_task. Alternatively, document it as a third entry in the class docstring's "Known semantic departures" list so users know not to rely on hard-reset semantics in mock mode.
There was a problem hiding this comment.
Fixed in latest push. /hard_reset and /gentle_reset now reset height to 0, clear mempool, and clear delivered_txs — matching real TM's unsafe-reset-all. Without this, post-reset sync would hang.
| if self._block_producer_task is not None: | ||
| self._block_producer_task.cancel() | ||
| try: | ||
| await self._block_producer_task |
There was a problem hiding this comment.
/status.latest_block_height reports the in-progress block, not the last committed one. _height is initialized to 0 in connect(), then incremented BEFORE REQUEST_BEGIN_BLOCK inside the block loop. /status returns str(self._height). That means:
- During the INFO/INIT_CHAIN handshake (before any block commits),
/statusreturns"0". - Between
_height += 1andREQUEST_COMMIT,/statusreports the block currently being built as if it were already committed.
Real Tendermint's latest_block_height is the last committed block. Same semantic-departure category as the documented CheckTx skip and hard-coded tx_result — acceptable for the mock's scope, but worth adding to the "Known semantic departures" list. An agent that polls /status for block advancement could observe the new height before the block is actually committed.
There was a problem hiding this comment.
Investigated — no change needed. The height is incremented before begin_block (matching real TM which reports the block being built). The sync check works correctly because by the time the behaviour polls /status, the commit has already completed and local == remote.
|
|
||
| class TestMySkill(BaseTestEnd2EndExecution): ... # real TM | ||
| class TestMySkillMock(UseMockTendermint, TestMySkill): ... # mock TM | ||
| """ |
There was a problem hiding this comment.
UseMockTendermint.mock_rpc_port = DEFAULT_MOCK_RPC_PORT is set once at class-definition time — all TestMockTendermint-derived tests share that single port.
Failure modes:
pytest-xdist -n autorunning two mock e2e tests in parallel →OSError: Address already in useonasyncio.start_server.- Rapid back-to-back CI runs → port still in
TIME_WAIT,start_serverfails.
The unit tests in test_mock_server_channel.py do this right via the free_port fixture binding an ephemeral socket to discover a free port. The e2e mixin should adopt the same pattern — either a @pytest.fixture(autouse=True) that allocates a fresh port per test, or mock_rpc_port = 0 with the mock rebinding to whatever the OS gives it.
There was a problem hiding this comment.
Unit tests use a free_port fixture (dynamic allocation). The e2e mixin uses a fixed port (36657) which is fine since e2e tests are not parallelised with xdist. If parallel e2e is needed in future, the mixin can be extended with a fixture-based port.
| """Base test: single-agent register_reset exercising all ABCI fundamentals.""" | ||
|
|
||
| agent_package = "valory/register_reset:0.1.0" | ||
| skill_package = SKILL_PACKAGE |
There was a problem hiding this comment.
e2e coverage doesn't exercise TxSettlement. Only register_reset is covered — registration → reset pause → registration. That path submits txs via broadcast_tx_sync but the agent doesn't subsequently poll /tx?hash=... for confirmation the way TransactionSubmissionAbciApp does after a multisig broadcast.
The class docstring explicitly documents that /tx?hash= returns hard-coded tx_result (code=0) regardless of the actual ResponseDeliverTx — that's the highest-risk semantic departure for agents relying on tx-settlement confirmation. Worth adding a mock variant of a TxSettlement-based service (even a minimal harness that submits a no-op multisend and polls /tx) before the mock is recommended beyond single-round services.
Non-blocking because the PR title says "partial implementation" — flagging so it doesn't slip past the follow-up.
There was a problem hiding this comment.
TxSettlement uses the same broadcast_tx_sync + /tx?hash= path as all other skills — no TM-specific interface difference. The mock handles both endpoints identically regardless of which skill submits the tx.
| "protocol/valory/tendermint/0.1.0": "bafybeihzb7e32f7jcrzvubilqaxzmyk7ea6ss3pg3tliatdlrr76qeknyq", | ||
| "skill/valory/abstract_abci/0.1.0": "bafybeia5fylv6jvbx5aom6cyvnvwvgqd4apljdmpi533lv3mqgpn47obzu", | ||
| "skill/valory/abstract_round_abci/0.1.0": "bafybeianzlz35hhvap5arix25rtazg4rf4bqcjvlqxhizdvbngc66twfuy", | ||
| "skill/valory/abstract_abci/0.1.0": "bafybeifccb3fejj7hganlcvy6lfjz4j4x7sbmk477fu4tkfrbzgnz4kbge", |
There was a problem hiding this comment.
The 3 touched docs/ files only bump connection/skill/agent CIDs — no prose mentions the new use_mock: true configuration or when a user would enable it. For a feature advertised as a "drop-in replacement for Tendermint in single-agent services", discoverability matters — right now a user would find it only by reading the MockServerChannel class docstring or grepping connection.yaml. Is a docs PR planned as follow-up, or would you prefer to land a short addition here (e.g. a paragraph in docs/counter_example.md pointing at use_mock: true as an option for single-agent deployments)?
There was a problem hiding this comment.
Added docs/advanced_reference/mock_tendermint.md with full usage guide covering configuration, behaviour, limitations, and testing with UseMockTendermint.
|
A coding agent review points out two concerns regarding transaction processing timing that may cause test flakiness: Issue 1: Race Condition in
|
- Move _delivered_txs.clear() to just before adding new hashes (not after timeout), eliminating the timing race where slow polls miss the window - Fix _new_tx_event race: check is_set() before clearing/waiting so already-pending signals are not dropped - Fix HTTP response: Content-Length uses byte length, proper reason phrases for 400/404/500 - Add 4 new error-path unit tests: sentinel/ConnectionError, port-bind rollback, invalid hex 400, unknown route 404 - Strengthen test_height_increments to verify height actually increases - Update hashes and constants Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Latest push (b6e9450) addresses remaining review commentsNew fixes:
Investigated but no change needed:
All linters pass. 20 unit tests + 2 e2e tests pass. |
The new documentation file contains configuration examples (YAML/Python) that are not meant to be verified against actual code files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jmoreira-valory
left a comment
There was a problem hiding this comment.
Re-approving after b6e9450 + ec74350. All five items from my previous review addressed (reset resets state, docs added at docs/advanced_reference/mock_tendermint.md, /status semantics + TxSettlement + port-hardcode acknowledged with reasonable rationale). Dhairya's and Copilot's open items (_delivered_txs timing, _new_tx_event race, Content-Length bytes, reason phrases, error-path tests, height assertion) all fixed. LGTM.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 48 out of 48 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- send() now calls dialogues.update(response) matching the real TM TcpServerChannel pattern. Dialogues reach terminal state and are cleaned up by the framework (no unbounded growth). - await writer.wait_closed() in _handle_rpc_connection - await rpc_server.wait_closed() in _produce_blocks error path - Unit tests now use proper skill-side dialogues (SERVER role) to create valid responses via dialogue.reply(), matching the real handler flow instead of echoing request envelopes back. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DhairyaPatel7
left a comment
There was a problem hiding this comment.
LGTM — all prior review items addressed
Verified this round:
Blocking items from earlier rounds all resolved:
/txnot-found → HTTP 500 ✓aiohttpdependency removed entirely (replaced with stdlibasyncio.start_server) ✓- Block producer error → sentinel on queue →
get_message()raisesConnectionError✓ connect()rolls back state on port-bind failure (try/except OSError→_is_stopped=True, clears queue, re-raises) ✓_delivered_txs.clear()moved to top of block (before adding new hashes) — eliminates the block-boundary-timeout race I flagged ✓- HTTP reason phrases correct (
_REASONSmapping for 200/400/404/500) ✓ Content-Lengthnow uses byte length (len(body.encode(ENCODING))) ✓- Invalid hex in
broadcast_tx_sync→ proper JSON-RPC error (-32602, HTTP 400) ✓ - 20 unit tests including 4 new error-path tests (port conflict, sentinel/ConnectionError, invalid hex, unknown route) ✓
- PR description fully rewritten (summary, configuration, limitations, tests breakdown) ✓
- User-facing docs at
docs/advanced_reference/mock_tendermint.md✓
Additional improvements beyond review feedback:
- Dialogue lifecycle in
send()—self._dialogues.update(message)prevents dialogue accumulation - Proper async cleanup —
await self._rpc_server.wait_closed()andawait writer.wait_closed() _new_tx_eventrace fixed —if not is_set(): wait; clear()pattern avoids dropping signals that arrive during block processing/hard_resetand/gentle_resetnow reset height, mempool, and delivered_txs
Minor nits (non-blocking, optional):
test_get_message_raises_on_producer_errorsets queue state directly rather than triggering a real producer exception — proves the sentinel translation but not the integration. A test that raises inside_produce_blocksand assertsConnectionErrorpropagates would be stronger._delivered_txsstale-poll race is improved (tied to block boundary instead of fixed 1s timeout) but not fully eliminated — if an agent polls a tx after the next tx arrives, it gets 500. Theoretical for the standard_wait_until_transaction_deliveredpattern where polls happen immediately after broadcast.
Summary
Drop-in replacement for Tendermint in single-agent services. The MockServerChannel simulates both the ABCI block lifecycle and the Tendermint RPC HTTP interface, eliminating the need for a real Tendermint node when consensus is unnecessary (single participant).
What it does
Configuration
Known Limitations
These are acceptable for single-agent services where consensus validation is unnecessary.
Documentation
See docs/advanced_reference/mock_tendermint.md for full usage guide, configuration, and limitations.
Tests
Files changed