go-ethereum: add freezer safety margin to prevent data loss after unclean shutdown #4506
Draft
joshuacolvin0 wants to merge 1 commit into master
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #4506      +/-   ##
==========================================
- Coverage   32.66%   32.17%   -0.50%
==========================================
  Files         495      495
  Lines       58724    58756     +32
==========================================
- Hits        19185    18903    -282
- Misses      36165    36471    +306
- Partials     3374     3382      +8
❌ 13 Tests Failed
01645ff to 91a9166
…lean shutdown

After an unclean shutdown, repair() may truncate the freezer head to
restore cross-table consistency. Previously, blocks were deleted from
the key-value store immediately after freezing, so truncated blocks
could end up missing from both stores, making the node unable to
start (especially for L2 nodes that cannot re-sync pruned blocks from
peers).

Introduce a safety margin (freezerCleanupMargin = freezerBatchLimit)
that retains the most recently frozen blocks in the key-value store.
Since freezeRange reads via nofreezedb (which bypasses the ancient
store), retained blocks can be re-frozen after repair() truncation.

Key changes:
- Add cleanupMargin field on chainFreezer with persisted cleanup tail
  (freezerCleanupTailKey) so progress resumes across restarts
- Replace immediate post-freeze deletion with incremental cleanup over
  [cleanupStart, cleanupLimit) using Has()+Get() to distinguish missing
  keys from I/O errors, with backoff on failure
- Add startup validation in Open(): detect unrecoverable data gaps
  where the freezer has been truncated below the cleanup tail
- Handle upgrade path (skip-ahead when no tail but frozen >
  FullImmutabilityThreshold) and fresh installs (clean from block 1)
- Cap per-cycle cleanup to freezerBatchLimit to prevent stalling
- Bound dangling side chain chase to freezerBatchLimit iterations
- Add ReadFreezerCleanupTail/WriteFreezerCleanupTail accessors and a
  strict variant for startup/runtime error propagation
- Surface cleanup tail in ReadChainMetadata diagnostics
- Add comprehensive test suite (21 tests) covering margin behavior,
  crash recovery, side chain cleanup, boundary conditions, corruption
  detection, upgrade path, and regression guard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
91a9166 to 4e09ccd
pulls in OffchainLabs/go-ethereum#637
fixes NIT-4663
After an unclean shutdown, repair() may truncate the freezer head to
restore cross-table consistency. Previously, blocks were deleted from
the key-value store immediately after freezing, so truncated blocks
could end up missing from both stores — making the node unable to
start (especially for L2 nodes that cannot re-sync pruned blocks from
peers).
Introduce a safety margin (freezerCleanupMargin = freezerBatchLimit)
that retains the most recently frozen blocks in the key-value store.
Since freezeRange reads via nofreezedb (which bypasses the ancient
store), retained blocks can be re-frozen after repair() truncation.
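
To make the margin arithmetic concrete, here is a minimal sketch of how a cleanup bound and a recoverability check could be derived from it. The helper names (cleanupLimit, recoverable) and the small numbers are assumptions chosen for illustration; they are not taken from the PR's code.

```go
package main

import "fmt"

// cleanupLimit returns the exclusive upper bound of block numbers that may be
// deleted from the key-value store while keeping the newest `margin` frozen
// blocks as a safety copy. Hypothetical helper, not the PR's implementation.
func cleanupLimit(frozen, margin uint64) uint64 {
	if frozen <= margin {
		return 0 // everything frozen so far is still inside the margin
	}
	return frozen - margin
}

// recoverable reports whether a freezer truncated down to frozenAfterRepair
// items can be refilled from the key-value store. Blocks below cleanupTail
// have already been deleted there, so truncation below the tail leaves a gap.
func recoverable(frozenAfterRepair, cleanupTail uint64) bool {
	return frozenAfterRepair >= cleanupTail
}

func main() {
	// Illustrative numbers only: 10 blocks frozen, margin of 3.
	limit := cleanupLimit(10, 3)
	fmt.Println("blocks below", limit, "may be cleaned from the key-value store")

	// Truncation inside the margin can be repaired by re-freezing.
	fmt.Println("truncated to 8:", recoverable(8, limit))
	// Truncation below the cleanup tail cannot.
	fmt.Println("truncated to 5:", recoverable(5, limit))
}
```

With a margin of zero the sketch degenerates to the previous behavior, where any truncation by repair() falls below the tail.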
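Key changes:
- Add cleanupMargin field on chainFreezer with persisted cleanup tail
  (freezerCleanupTailKey) so progress resumes across restarts
- Replace immediate post-freeze deletion with incremental cleanup over
  [cleanupStart, cleanupLimit) using Has()+Get() to distinguish missing
  keys from I/O errors, with backoff on failure
- Add startup validation in Open(): detect unrecoverable data gaps
  where the freezer has been truncated below the cleanup tail
- Handle upgrade path (skip-ahead when no tail but frozen >
  FullImmutabilityThreshold) and fresh installs (clean from block 1)
- Cap per-cycle cleanup to freezerBatchLimit to prevent stalling
- Bound dangling side chain chase to freezerBatchLimit iterations
- Add ReadFreezerCleanupTail/WriteFreezerCleanupTail accessors and a
  strict variant for startup/runtime error propagation
- Surface cleanup tail in ReadChainMetadata diagnostics
- Add comprehensive test suite (21 tests) covering margin behavior,
  crash recovery, side chain cleanup, boundary conditions, corruption
  detection, upgrade path, and regression guard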
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
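
For reviewers, the capped cleanup window and the Has()+Get() check from the key changes could look roughly like the sketch below. Everything here (kvReader, memStore, errIO, readIfPresent, cleanupWindow) is a hypothetical stand-in written for this illustration and assumes a store whose Has and Get both return an error; the actual change lives in OffchainLabs/go-ethereum#637.

```go
package main

import (
	"errors"
	"fmt"
)

// kvReader is a hypothetical minimal read interface for this sketch.
type kvReader interface {
	Has(key []byte) (bool, error)
	Get(key []byte) ([]byte, error)
}

// errIO simulates a failing disk in the in-memory store below.
var errIO = errors.New("simulated I/O error")

// memStore is an in-memory stand-in used only for illustration.
type memStore struct {
	data map[string][]byte
	fail error // when non-nil, every call returns this error
}

func (m *memStore) Has(key []byte) (bool, error) {
	if m.fail != nil {
		return false, m.fail
	}
	_, ok := m.data[string(key)]
	return ok, nil
}

func (m *memStore) Get(key []byte) ([]byte, error) {
	if m.fail != nil {
		return nil, m.fail
	}
	v, ok := m.data[string(key)]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

// readIfPresent separates "key is absent" (found=false, err=nil) from
// "store failed" (err != nil), so a caller can skip absent keys but back
// off and retry on I/O errors.
func readIfPresent(db kvReader, key []byte) (val []byte, found bool, err error) {
	ok, err := db.Has(key)
	if err != nil {
		return nil, false, err
	}
	if !ok {
		return nil, false, nil
	}
	val, err = db.Get(key)
	if err != nil {
		return nil, false, err
	}
	return val, true, nil
}

// cleanupWindow returns the half-open range [start, limit) to clean in one
// cycle: it starts at the persisted tail, stops `margin` blocks short of the
// frozen head, and never spans more than `batch` blocks.
func cleanupWindow(tail, frozen, margin, batch uint64) (start, limit uint64) {
	if frozen > margin {
		limit = frozen - margin
	}
	if limit <= tail {
		return tail, tail // nothing to clean yet
	}
	if limit-tail > batch {
		limit = tail + batch
	}
	return tail, limit
}

func main() {
	db := &memStore{data: map[string][]byte{"block-7": []byte("body")}}

	_, found, err := readIfPresent(db, []byte("block-8"))
	fmt.Println("missing key:", found, err)

	db.fail = errIO
	_, found, err = readIfPresent(db, []byte("block-7"))
	fmt.Println("failing store:", found, err)

	start, limit := cleanupWindow(80, 100, 10, 25)
	fmt.Println("clean range:", start, limit)
}
```

Returning an empty range when the tail has already reached the margin boundary keeps the caller's loop simple: it can persist the new tail only when limit is greater than start.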