Benchmark MEL Native + WASM Replay Over an Extremely Large ETH Block by ganeshvanahalli · Pull Request #4671 · OffchainLabs/nitro

ganeshvanahalli · 2026-04-24T15:44:26Z

MEL Validation Input Load Testing

Context

MEL validation records every parent-chain receipt and transaction trie node as a Keccak256 preimage, bundles them into an InputJSON, and sends that JSON to the arbitrator replay binary for WASM validation.

On a busy L1 with heavy logs, this preimage set can grow very large. This PR adds stress tests to answer whether the arbitrator can load a worst-case preimage set without OOMs, JSON parse errors, or preimage-table failures.

No L2 message extraction is performed here. These tests exclusively exercise the preimage encode → decode → load path.

What’s tested

Two complementary tests were added in system_tests/mel_validation_input_stress_test.go, gated by the mel_validation_input_stress build tag:

| Test | Scenario | Stresses |
| --- | --- | --- |
| `TestMELValidationInputStress` | 75 medium txs per block × 8 blocks, each tx emits 30 `LOG4` events of 512 bytes | Trie-node-rich path: many receipts, deep receipt trie, many distinct preimages |
| `TestMELValidationInputStressMaxReceipt` | 1 max-size tx per block × 17 blocks, each tx emits one `LOG4` of ~1.35 MB | Giant-leaf path: largest possible individual preimages |

Each test:

- Deploys a pure-EVM log-emitting contract on L1 (no Solidity; raw bytecode avoids extra dependencies)
- Spams L1 with calls to fill blocks
- Records receipt and transaction trie preimages using the same melrecording package the real MEL validator uses (arbnode/mel/recording/)
- Builds an InputJSON and writes it to ~/.arbitrum/validation-inputs///block_inputs_1.json
- Reports sizes for every stage (preimage count/bytes, JSON size, marshal time, overhead ratio)

The JSON is consumable by crates/bench/src/bin.rs (benchbin), which spins up the WAVM machine with the preimages loaded and runs timed steps.
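The size accounting listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `sizeReport`, `measure`, and `makePayloads` are hypothetical names, SHA-256 stands in for Keccak256 (which is not in the Go standard library), and the preimage table is assumed to be a simple map from hash to bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// sizeReport holds the kind of per-stage numbers the tests print.
// The field names are illustrative, not taken from the PR.
type sizeReport struct {
	Preimages   int
	RawBytes    int
	JSONBytes   int
	MarshalTime time.Duration
}

// Overhead is the JSON size divided by the raw preimage size.
func (r sizeReport) Overhead() float64 {
	return float64(r.JSONBytes) / float64(r.RawBytes)
}

// measure stores each payload under the hex of its hash (SHA-256 here, as a
// stand-in for Keccak256), so byte-identical payloads collapse into one
// entry. It then marshals the map to JSON and records the sizes.
func measure(payloads [][]byte) (sizeReport, error) {
	preimages := make(map[string][]byte)
	for _, p := range payloads {
		sum := sha256.Sum256(p)
		preimages[hex.EncodeToString(sum[:])] = p
	}
	raw := 0
	for _, p := range preimages {
		raw += len(p)
	}
	start := time.Now()
	out, err := json.Marshal(preimages) // encoding/json base64-encodes []byte values
	if err != nil {
		return sizeReport{}, err
	}
	return sizeReport{
		Preimages:   len(preimages),
		RawBytes:    raw,
		JSONBytes:   len(out),
		MarshalTime: time.Since(start),
	}, nil
}

// makePayloads builds n payloads of the given size. Payload i is filled with
// byte(i % distinct), so only `distinct` of them have different contents.
func makePayloads(n, size, distinct int) [][]byte {
	payloads := make([][]byte, n)
	for i := range payloads {
		p := make([]byte, size)
		for j := range p {
			p[j] = byte(i % distinct)
		}
		payloads[i] = p
	}
	return payloads
}

func main() {
	r, err := measure(makePayloads(16, 64*1024, 4))
	if err != nil {
		panic(err)
	}
	fmt.Printf("preimages=%d raw=%dB json=%dB marshal=%s overhead=%.3f\n",
		r.Preimages, r.RawBytes, r.JSONBytes, r.MarshalTime, r.Overhead())
}
```

With byte slices marshaled by `encoding/json`, the overhead lands near 4/3 because of base64. The ratios reported in this PR (3.96/2.96 and 52.04/39.03) are close to that value, which would be consistent with a similar encoding, although the PR does not state how InputJSON encodes preimages.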

Results

Medium-receipt run (TestMELValidationInputStress)

- 8 L1 blocks, 4 of them at ~99% gas limit, 401 receipts, 12,003 logs
- 175 unique preimages, 2.96 MB raw / 3.96 MB JSON
- benchbin loaded in ~30 ms, ran 800+ iterations × step sizes up to 1M cleanly

Max-receipt run (TestMELValidationInputStressMaxReceipt)

- 17 L1 blocks, each with 1 tx at ~14.6M gas (~93% of block limit)
- 16 receipts, each ~1.30 MB (the theoretical single-tx ceiling on a 15M-gas L1 block)
- 49 total preimages, 39.03 MB raw / 52.04 MB JSON
- benchbin loaded the 52 MB JSON in ~400 ms, ran the same step-size sweep with no memory or parse issues

Both runs terminate with Machine too far in benchbin. This is expected: we feed no real L2 inbox messages or valid start state, so the replay binary wanders into unreachable territory. What matters for this test is that the preimage resolver is built successfully and serves lookups through the initial replay steps.
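The single-tx ceiling mentioned in the results can be cross-checked against the gas bound quoted in the findings (8·N + N²/524288 + 25K ≤ 15M). The sketch below solves that quadratic for N. It assumes "25K" means 25,000 gas of fixed overhead and "15M" means 15,000,000 gas; the function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// gasForLogData evaluates the left-hand side of the quoted bound:
// 8*N + N^2/524288 + overhead, with N in bytes.
func gasForLogData(n, overhead float64) float64 {
	return 8*n + n*n/524288 + overhead
}

// maxLogBytes returns the positive root of
// N^2/524288 + 8*N + (overhead - budget) = 0,
// i.e. the largest N whose gas cost still fits in the budget.
func maxLogBytes(budget, overhead float64) float64 {
	const a = 1.0 / 524288
	const b = 8.0
	c := overhead - budget
	return (-b + math.Sqrt(b*b-4*a*c)) / (2 * a)
}

func main() {
	const overhead = 25_000 // assumption: "25K" read as 25,000 gas
	for _, budget := range []float64{15_000_000, 14_600_000} {
		n := maxLogBytes(budget, overhead)
		fmt.Printf("budget %.0f gas: N ≈ %.0f bytes (%.3f MB, %.3f MiB)\n",
			budget, n, n/1e6, n/(1<<20))
	}
}
```

Under these assumptions the bound allows roughly 1.40 million bytes (about 1.34 MiB) at the full 15M budget and roughly 1.37 million bytes (about 1.31 MiB) at 14.6M gas. That suggests the MB figures in this description mix decimal and binary units, though that reading is an inference rather than something the PR states.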

Key Findings / Notes

- InputJSON null-field gotcha. Go's json.Marshal emits null for nil slices/maps. The Rust ValidationRequest deserializer (crates/validation/src/lib.rs) expects Vec and HashMap<..> and rejects null. The test helper writeValidationInputJSON explicitly initializes BatchInfo: [] and UserWasms: {}. Without this fix, benchbin fails at JSON parse with invalid type: null, expected a sequence.
- benchbin needed the native feature on prover. The workspace Cargo.toml declares prover = { default-features = false, ... }, which hides the Machine::step_n method. This PR adds features = ["native"] in crates/bench/Cargo.toml.
- Geth RPC fee cap. A 14.9M-gas tx at 100 Gwei costs 1.49 ETH, exceeding geth's 1 ETH default rpc.txfeecap. The max-receipt test lowers l1Info.GasPrice to 60 Gwei (fee = 0.89 ETH) just for its own txs and restores it on exit.
- Receipt deduplication effect. In the medium-receipt run only 175 preimages came from 401 receipts: receipts at the same tx index across blocks have byte-identical RLP when txs are identical. The max-receipt test avoids this by setting topic1 = BLOCK.NUMBER so every receipt differs.
- Theoretical ceilings observed.
  - Single tx max log data: ~1.35 MB (bound by 8·N + N²/524288 + 25K ≤ 15M). Observed 1.36 MB at 14.6M gas.
  - Per-block log data (many txs): ~1.15 MB observed. Quadratic memory-expansion cost limits total bytes below the naive 15M/8 = 1.87 MB.
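The null-field behavior described in the first finding is easy to reproduce in isolation. In this minimal sketch, `request` is a hypothetical stand-in for the InputJSON shape; only the two field names come from the PR, and the element types are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a hypothetical stand-in for InputJSON. Only the field names
// BatchInfo and UserWasms come from the PR; the element types are assumed.
type request struct {
	BatchInfo []int             `json:"BatchInfo"`
	UserWasms map[string]string `json:"UserWasms"`
}

// marshal returns the JSON encoding of r as a string.
func marshal(r request) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Nil slice and nil map are encoded as JSON null.
	fmt.Println(marshal(request{}))
	// Empty but non-nil values are encoded as [] and {}.
	fmt.Println(marshal(request{
		BatchInfo: []int{},
		UserWasms: map[string]string{},
	}))
}
```

The first line prints `{"BatchInfo":null,"UserWasms":null}` and the second prints `{"BatchInfo":[],"UserWasms":{}}`. A strict deserializer that expects a sequence and a map accepts only the second form.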

How to Run

# Build the Rust bench tool (one-time)
cargo build --release -p bench

# Medium-receipt stress
go test -tags mel_validation_input_stress \
  -run '^TestMELValidationInputStress$' \
  -v -timeout 30m ./system_tests/

# Max-receipt stress
go test -tags mel_validation_input_stress \
  -run TestMELValidationInputStressMaxReceipt \
  -v -timeout 10m ./system_tests/

# Feed the generated JSON to benchbin
./target/release/benchbin \
  --json-inputs ~/.arbitrum/validation-inputs/mel-stress-test-max/<timestamp>/block_inputs_1.json \
  --binary target/machines/latest/machine.v2.wavm.br
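One detail worth keeping in mind when selecting tests: `go test -run` treats its argument as a regular expression and matches it unanchored, so a bare `TestMELValidationInputStress` pattern also selects `TestMELValidationInputStressMaxReceipt`. The sketch below mimics that matching with the standard `regexp` package; `selects` is an illustrative helper, not part of the Go test runner.

```go
package main

import (
	"fmt"
	"regexp"
)

// selects reports whether an unanchored regexp match of pattern against
// testName succeeds, which is how a -run pattern picks top-level tests.
func selects(pattern, testName string) bool {
	return regexp.MustCompile(pattern).MatchString(testName)
}

func main() {
	tests := []string{
		"TestMELValidationInputStress",
		"TestMELValidationInputStressMaxReceipt",
	}
	for _, pattern := range []string{
		"TestMELValidationInputStress",   // bare pattern
		"^TestMELValidationInputStress$", // anchored pattern
	} {
		for _, name := range tests {
			fmt.Printf("%-34s selects %-40s: %v\n", pattern, name, selects(pattern, name))
		}
	}
}
```

Anchoring with `^` and `$` restricts the medium-receipt command to the one test it names.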

Resolves NIT-4719

codecov · 2026-04-24T16:01:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.22%. Comparing base (5fadcb4) to head (23f0736).

Additional details and impacted files

@@                           Coverage Diff                            @@
##           mel-validator-createvalidationentry    #4671       +/-   ##
========================================================================
- Coverage                                49.12%   34.22%   -14.91%     
========================================================================
  Files                                      513      513               
  Lines                                    61424    61424               
========================================================================
- Hits                                     30177    21023     -9154     
- Misses                                   26290    36744    +10454     
+ Partials                                  4957     3657     -1300

github-actions · 2026-04-24T16:35:38Z

❌ 21 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 5047 | 21 | 5026 | 0 |

View the top 3 failed tests by shortest run time

TestValidationPostMELReorgHandleInArbitratorMode

Stack Traces | 0.000s run time

=== RUN   TestValidationPostMELReorgHandleInArbitratorMode
    message_extraction_layer_validation_test.go:83: InitTestLog called concurrently - this test must not run in parallel
--- FAIL: TestValidationPostMELReorgHandleInArbitratorMode (0.00s)

TestRedisSeqCoordinatorMessageSync

Stack Traces | 0.000s run time

=== RUN   TestRedisSeqCoordinatorMessageSync
    seq_coordinator_test.go:309: InitTestLog called concurrently - this test must not run in parallel
--- FAIL: TestRedisSeqCoordinatorMessageSync (0.00s)

TestRedisSeqCoordinatorWrongKeyMessageSync

Stack Traces | 0.000s run time

=== RUN   TestRedisSeqCoordinatorWrongKeyMessageSync
    seq_coordinator_test.go:309: InitTestLog called concurrently - this test must not run in parallel
--- FAIL: TestRedisSeqCoordinatorWrongKeyMessageSync (0.00s)


Benchmark MEL Native + WASM Replay Over an Extremely Large ETH Block (2be0af3)

ganeshvanahalli requested a review from rauljordan April 24, 2026 15:46

ganeshvanahalli and others added 2 commits April 28, 2026 23:31

Merge branch 'mel-validator-createvalidationentry' into mel-validation-benchmark (f954e4a)

add stress test calculating arbitrator steps taken for mel validation (23f0736)

ganeshvanahalli wants to merge 3 commits into mel-validator-createvalidationentry from mel-validation-benchmark
