Benchmark MEL Native + WASM Replay Over an Extremely Large ETH Block#4671

Open
ganeshvanahalli wants to merge 3 commits into mel-validator-createvalidationentry from mel-validation-benchmark

@ganeshvanahalli ganeshvanahalli commented Apr 24, 2026

MEL Validation Input Load Testing

Context

MEL validation records every parent-chain receipt and transaction trie node as a Keccak256 preimage, bundles them into an InputJSON, and sends that JSON to the arbitrator replay binary for WASM validation.

On a busy L1 with heavy logs, this preimage set can grow very large. This PR adds stress tests to answer whether the arbitrator can load a worst-case preimage set without OOMs, JSON parse errors, or preimage-table failures.

No L2 message extraction is performed here. These tests exclusively exercise the preimage encode → decode → load path.
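
Because each recorded trie node is keyed by its hash, byte-identical nodes collapse into a single preimage entry. The following is a minimal sketch of that behaviour, not the melrecording implementation: the `preimageSet` type and its methods are hypothetical, and SHA-256 from the Go standard library stands in for Keccak256 so the example stays self-contained.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// preimageSet maps a hash to the bytes that produced it.
// NOTE: SHA-256 is a stand-in; the real recorder keys by Keccak256.
type preimageSet map[[32]byte][]byte

// record stores data under its hash and reports whether it was new.
func (p preimageSet) record(data []byte) bool {
	h := sha256.Sum256(data)
	if _, ok := p[h]; ok {
		return false // identical bytes were already recorded
	}
	p[h] = append([]byte(nil), data...) // copy so callers can reuse their buffer
	return true
}

// rawBytes sums the lengths of all stored preimages.
func (p preimageSet) rawBytes() int {
	n := 0
	for _, v := range p {
		n += len(v)
	}
	return n
}

func main() {
	set := preimageSet{}
	// Hypothetical encoded trie nodes: two identical, one distinct.
	nodes := [][]byte{[]byte("receipt-A"), []byte("receipt-A"), []byte("receipt-B")}
	for _, n := range nodes {
		set.record(n)
	}
	fmt.Printf("recorded %d nodes, %d unique preimages, %d raw bytes\n",
		len(nodes), len(set), set.rawBytes())
}
```

The preimage count reported by a run can therefore be much smaller than the number of receipts when many receipts encode to the same bytes.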

What’s tested

Two complementary tests were added in system_tests/mel_validation_input_stress_test.go, gated by the mel_validation_input_stress build tag:

| Test | Scenario | Stresses |
| --- | --- | --- |
| TestMELValidationInputStress | 75 medium txs per block × 8 blocks, each tx emits 30 LOG4 events of 512 bytes | Trie-node-rich path: many receipts, deep receipt trie, many distinct preimages |
| TestMELValidationInputStressMaxReceipt | 1 max-size tx per block × 17 blocks, each tx emits one LOG4 of ~1.35 MB | Giant-leaf path: largest possible individual preimages |

Each test:

  1. Deploys a pure-EVM log-emitting contract on L1 (no Solidity — raw bytecode to avoid deps)
  2. Spams L1 with calls to fill blocks
  3. Records receipt + transaction trie preimages using the same melrecording package the real MEL validator uses (arbnode/mel/recording/)
  4. Builds an InputJSON and writes it to ~/.arbitrum/validation-inputs///block_inputs_1.json
  5. Reports sizes for every stage (preimage count/bytes, JSON size, marshal time, overhead ratio)

The JSON can be consumed by crates/bench/src/bin.rs (benchbin), which spins up the WAVM machine with the preimages loaded and runs timed steps.
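
One detail of step 4 is worth illustrating: Go's encoding/json serializes nil slices and maps as null rather than [] or {}. The sketch below shows the difference using a hypothetical, simplified `request` struct. Only the field names BatchInfo and UserWasms come from this PR; the element types are placeholders, and the real InputJSON type has more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a hypothetical stand-in for the validation input.
// Element types are simplified placeholders, not the real ones.
type request struct {
	BatchInfo []string          `json:"BatchInfo"`
	UserWasms map[string]string `json:"UserWasms"`
}

// marshal returns the JSON encoding of r as a string.
func marshal(r request) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Zero value: both fields are nil and are emitted as null.
	fmt.Println(marshal(request{}))
	// Explicitly initialized: emitted as an empty array and an empty object.
	fmt.Println(marshal(request{
		BatchInfo: []string{},
		UserWasms: map[string]string{},
	}))
}
```

The first line printed is `{"BatchInfo":null,"UserWasms":null}` and the second is `{"BatchInfo":[],"UserWasms":{}}`. A consumer that requires a sequence and a map, and does not treat null as empty, would accept only the second form.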

Results

Medium-receipt run (TestMELValidationInputStress)

  • 8 L1 blocks, 4 of them at ~99% gas limit, 401 receipts, 12,003 logs
  • 175 unique preimages, 2.96 MB raw / 3.96 MB JSON
  • benchbin loaded in ~30 ms, ran 800+ iterations × step sizes up to 1M cleanly

Max-receipt run (TestMELValidationInputStressMaxReceipt)

  • 17 L1 blocks, each with 1 tx at ~14.6M gas (~93% of block limit)
  • 16 receipts, each ~1.30 MB (the theoretical single-tx ceiling on a 15M-gas L1 block)
  • 49 total preimages, 39.03 MB raw / 52.04 MB JSON
  • benchbin loaded the 52 MB JSON in ~400 ms, ran the same step-size sweep with no memory or parse issues

Both runs terminate with Machine too far in benchbin. This is expected: we feed no real L2 inbox messages or valid start state, so the replay binary wanders into unreachable territory. What matters for this test is that the preimage resolver is built successfully and serves lookups through the initial replay steps.
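
The raw-to-JSON growth in both runs (2.96 MB to 3.96 MB, and 39.03 MB to 52.04 MB) is close to 4/3, which is the expansion base64 produces. Go's encoding/json encodes []byte values as base64 strings. Whether InputJSON stores preimages exactly this way is an assumption here, so the sketch below only compares the measured base64 overhead with the reported ratios. `overheadRatio` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overheadRatio marshals one byte slice of rawLen bytes to JSON and returns
// encodedLen / rawLen. encoding/json renders []byte as a quoted base64 string.
func overheadRatio(rawLen int) float64 {
	raw := make([]byte, rawLen)
	out, err := json.Marshal(raw)
	if err != nil {
		panic(err)
	}
	return float64(len(out)) / float64(rawLen)
}

func main() {
	fmt.Printf("measured, 600 B preimage:     %.3f\n", overheadRatio(600))
	fmt.Printf("measured, 3 MB preimage:      %.3f\n", overheadRatio(3_000_000))
	// Ratios computed from the sizes reported in the two runs.
	fmt.Printf("reported, medium-receipt run: %.3f\n", 3.96/2.96)
	fmt.Printf("reported, max-receipt run:    %.3f\n", 52.04/39.03)
}
```

If the assumption holds, JSON size should scale roughly linearly with raw preimage bytes, with keys and punctuation adding a small extra cost that matters more when preimages are small.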

Key Findings / Notes

  1. InputJSON null-field gotcha. Go's json.Marshal emits null for nil slices/maps. The Rust ValidationRequest deserializer (crates/validation/src/lib.rs) expects Vec<..> and HashMap<..> and rejects null. The test helper writeValidationInputJSON therefore explicitly initializes BatchInfo: [] and UserWasms: {}. Without this, benchbin fails at JSON parse with invalid type: null, expected a sequence.
  2. benchbin needed the native feature on prover. The workspace Cargo.toml declares prover = { default-features = false, ... }, which hides the Machine::step_n method. Added features = ["native"] to crates/bench/Cargo.toml.
  3. Geth RPC fee cap. A 14.9M-gas tx at 100 GWei costs 1.49 ETH, exceeding geth's 1 ETH default rpc.txfeecap. The max-receipt test lowers l1Info.GasPrice to 60 GWei (fee = 0.89 ETH) just for its own txs and restores it on exit.
  4. Receipt deduplication effect. In the medium-receipt run only 175 preimages came from 401 receipts — receipts at the same tx index across blocks have byte-identical RLP when txs are identical. The max-receipt test avoids this by setting topic1 = BLOCK.NUMBER so every receipt differs.
  5. Theoretical ceilings observed.
    • Single tx max log data: ~1.35 MB (bound by 8·N + N²/524288 + 25K ≤ 15M). Observed 1.36 MB at 14.6M gas, under the cap.
    • Per-block log data (many txs): ~1.15 MB observed. Quadratic memory-expansion cost limits total bytes below the naive 15M/8 = 1.87 MB.
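
The arithmetic behind findings 3 and 5 can be checked with a short sketch. It solves the quoted bound 8·N + N²/524288 + 25K ≤ 15M for N as the positive root of a quadratic, and recomputes the two transaction fees. The function names are hypothetical, and the bound is taken as stated in this PR rather than derived from the full gas schedule.

```go
package main

import (
	"fmt"
	"math"
)

// maxLogBytes returns the largest N satisfying
//   8*N + N*N/524288 + overhead <= gasLimit,
// i.e. the positive root of N^2/524288 + 8N - (gasLimit - overhead) = 0.
func maxLogBytes(gasLimit, overhead float64) float64 {
	const k = 524288.0
	budget := gasLimit - overhead
	return k / 2 * (math.Sqrt(64+4*budget/k) - 8)
}

// feeETH converts gas * price (in gwei) to ETH; 1 ETH = 1e9 gwei.
func feeETH(gas, priceGwei float64) float64 {
	return gas * priceGwei / 1e9
}

func main() {
	n := maxLogBytes(15e6, 25e3)
	fmt.Printf("max log data: %.0f bytes (%.2f MiB)\n", n, n/(1<<20))
	fmt.Printf("naive 15M/8:  %.0f bytes\n", 15e6/8)
	fmt.Printf("fee at 100 gwei: %.2f ETH\n", feeETH(14.9e6, 100))
	fmt.Printf("fee at 60 gwei:  %.2f ETH\n", feeETH(14.9e6, 60))
}
```

The quadratic term is what pulls the ceiling well below the naive 15M/8 estimate. The fee lines reproduce the 1.49 ETH and 0.89 ETH figures from finding 3.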

How to Run

# Build the Rust bench tool (one-time)
cargo build --release -p bench

# Medium-receipt stress
go test -tags mel_validation_input_stress \
  -run TestMELValidationInputStress \
  -v -timeout 30m ./system_tests/

# Max-receipt stress
go test -tags mel_validation_input_stress \
  -run TestMELValidationInputStressMaxReceipt \
  -v -timeout 10m ./system_tests/

# Feed the generated JSON to benchbin
./target/release/benchbin \
  --json-inputs ~/.arbitrum/validation-inputs/mel-stress-test-max/<timestamp>/block_inputs_1.json \
  --binary target/machines/latest/machine.v2.wavm.br

Resolves NIT-4719

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.22%. Comparing base (5fadcb4) to head (23f0736).

Additional details and impacted files
@@                           Coverage Diff                            @@
##           mel-validator-createvalidationentry    #4671       +/-   ##
========================================================================
- Coverage                                49.12%   34.22%   -14.91%     
========================================================================
  Files                                      513      513               
  Lines                                    61424    61424               
========================================================================
- Hits                                     30177    21023     -9154     
- Misses                                   26290    36744    +10454     
+ Partials                                  4957     3657     -1300     


github-actions Bot commented Apr 24, 2026

❌ 21 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 5047 | 21 | 5026 | 0 |
View the top 3 failed tests by shortest run time
TestValidationPostMELReorgHandleInArbitratorMode
Stack Traces | 0.000s run time
=== RUN   TestValidationPostMELReorgHandleInArbitratorMode
    message_extraction_layer_validation_test.go:83: InitTestLog called concurrently - this test must not run in parallel
--- FAIL: TestValidationPostMELReorgHandleInArbitratorMode (0.00s)
TestRedisSeqCoordinatorMessageSync
Stack Traces | 0.000s run time
=== RUN   TestRedisSeqCoordinatorMessageSync
    seq_coordinator_test.go:309: InitTestLog called concurrently - this test must not run in parallel
--- FAIL: TestRedisSeqCoordinatorMessageSync (0.00s)
TestRedisSeqCoordinatorWrongKeyMessageSync
Stack Traces | 0.000s run time
=== RUN   TestRedisSeqCoordinatorWrongKeyMessageSync
    seq_coordinator_test.go:309: InitTestLog called concurrently - this test must not run in parallel
--- FAIL: TestRedisSeqCoordinatorWrongKeyMessageSync (0.00s)
