
Rosetta: fix some uncaught async exceptions causing 502s #18720

Open
glyh wants to merge 2 commits into compatible from lyh/tighten-rosetta-error-handlings


@glyh
Member

@glyh glyh commented Apr 3, 2026

Problem

The Rosetta /network/status endpoint (and all other routes) was producing 502 errors under daemon stress. Two bugs combined to cause this:

Bug 1 — Yojson.Basic.from_string could throw an uncaught exception (graphql.ml)

When the Mina daemon returns an HTTP 200 with a non-JSON body (e.g. during restart or under load), Yojson.Basic.from_string raises Yojson.Json_error. This call lived inside an Async deferred callback, so the synchronous try...with in rosetta.ml never caught it. The exception escaped to on_handler_error, leaving the HTTP connection in a broken state (or killing the process if MINA_ROSETTA_TERMINATE_ON_SERVER_ERROR was set). A reverse proxy reports either outcome as a 502.
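
For illustration, here is a minimal sketch of turning that exception into a value at the parse site. `parse_body` and its error message are hypothetical names chosen for this example, not the code in graphql.ml; the sketch assumes only the yojson library.

```ocaml
(* Sketch only: [parse_body] is a hypothetical helper, not the function
   used in graphql.ml. Requires the yojson library. *)
let parse_body (body : string) : (Yojson.Basic.t, string) result =
  match Yojson.Basic.from_string body with
  | json -> Ok json
  | exception Yojson.Json_error msg ->
      (* A 200 response with a non-JSON body ends up here instead of raising. *)
      Error ("invalid JSON from daemon: " ^ msg)

let () =
  (* A well-formed body parses normally. *)
  (match parse_body {|{"data": {"ok": true}}|} with
   | Ok _ -> print_endline "parsed"
   | Error e -> print_endline e);
  (* An HTML error page is reported as an Error value. *)
  match parse_body "<html>Bad Gateway</html>" with
  | Ok _ -> print_endline "parsed"
  | Error _ -> print_endline "rejected non-JSON body"
```

Returning a result keeps the failure in the normal data flow, so it does not matter whether the caller runs synchronously or inside a deferred callback.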

Bug 2 — try...with in router does not cover async exceptions (rosetta.ml)

The top-level exception handler in router was a plain OCaml try...with. In Async, this only catches exceptions raised synchronously during deferred construction. Any exception raised inside a let%bind/let%map callback fires later in the event loop, outside the try...with scope, and propagates unhandled to on_handler_error.
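
The difference can be sketched in a few lines. This is an illustration under assumptions, not the code in rosetta.ml: `handler`, `naive`, and `guarded` are hypothetical names, and the example assumes the async library (with core for `Exn.to_string`).

```ocaml
open Async

(* Hypothetical handler: it returns a deferred immediately and only raises
   later, when the scheduler runs the bind callback. *)
let handler () : string Deferred.t =
  Deferred.unit >>= fun () -> failwith "raised inside a callback"

(* A plain try...with only guards the synchronous call that builds the
   deferred, so the later exception is not caught here. *)
let naive () : string Deferred.t =
  try handler () with _ -> return "recovered"

(* Monitor.try_with runs the whole deferred chain under a monitor and
   reports the late exception as an Error value. *)
let guarded () : (string, string) result Deferred.t =
  Monitor.try_with (fun () -> handler ())
  >>| function
  | Ok body -> Ok body
  | Error exn -> Error (Core.Exn.to_string exn)

let () =
  Thread_safe.block_on_async_exn (fun () ->
      (* Wrapping [naive] shows that its inner try...with missed the exception. *)
      Monitor.try_with naive
      >>= fun naive_result ->
      (match naive_result with
       | Ok _ -> Stdlib.print_endline "naive: recovered"
       | Error _ -> Stdlib.print_endline "naive: exception escaped try...with");
      guarded ()
      >>| function
      | Ok _ -> Stdlib.print_endline "guarded: ok"
      | Error _ -> Stdlib.print_endline "guarded: exception captured as Error")
```

In a router, the Error branch would presumably be mapped to a structured error response rather than a string; that mapping is left out of this sketch.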

glyh and others added 2 commits April 3, 2026 13:51
…uter

The previous try...with only caught synchronous exceptions during deferred
construction, missing exceptions raised inside async callbacks (e.g.
Yojson.Basic.from_string in graphql.ml when the daemon returns a 200 with
non-JSON body). Those escaped to on_handler_error and could crash the process.

Monitor.try_with catches exceptions across the full async deferred chain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dkijania
Member

dkijania commented Apr 3, 2026

It would be nice to have some tests if the current behavior is not covered.

