Conversation
I integrated this into 10.5.1 for testing: in the Node config, the following settings would result in the node creating a snapshot for the last block in every Shelley epoch:

```yaml
LedgerDB:
  Backend: V2InMemory # or V1LMDB
  # Number of slots in a Shelley epoch, so we will create snapshots
  # precisely for the last block in each Shelley epoch
  SnapshotInterval: 432000
  # Due to Byron, the first slots of Shelley epochs are offset by this amount
  SlotOffset: 172800
  # Disable the rate limit, to be sure that we definitely create snapshots
  # for all epochs
  RateLimit: 0
```

Currently, this treats
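As a rough illustration of the config semantics above (this is a sketch, not the node's actual code; `snapshotSlots` is a hypothetical helper), the slots targeted for snapshots are the offset plus every multiple of the interval:

```haskell
-- Sketch only: compute which slots the SnapshotInterval/SlotOffset config
-- above targets, up to a given tip slot. The node snapshots the last
-- immutable block before each of these slots.
snapshotSlots :: Word -> Word -> Word -> [Word]
snapshotSlots interval offset tipSlot =
  takeWhile (<= tipSlot) [offset + k * interval | k <- [0 ..]]

main :: IO ()
main =
  -- With the mainnet numbers above (interval 432000, offset 172800),
  -- the targeted slots line up with Shelley epoch boundaries.
  print (snapshotSlots 432000 172800 1300000)
```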
```haskell
      }
    forM_ snapshotSlots $ \slot -> do
      -- Prune the 'DbChangelog' such that the resulting anchor state has slot
      -- number @slot@.
```
Suggested change:

```diff
-      -- number @slot@.
+      -- number @slot@ or younger.
```
?
Each `s` in `snapshotSlots` is the slot of a ledger state in the `DbChangelog` (see the contract of `onDiskSnapshotSelector`), so the comment is true as written. (Otherwise, we wouldn't take a snapshot for the requested slot, and snapshots wouldn't be predictable.)
```haskell
-- of 'sfaInterval'.
data SnapshotFrequencyArgs = SnapshotFrequencyArgs
  { sfaInterval :: OverrideOrDefault SlotNo
    -- ^ Try to write snapshots every 'sfaInterval' many slots. Must be positive.
```
Isn't it necessarily positive by using a SlotNo?
It is necessarily non-negative, but not necessarily positive 😅

I changed this to `NonZero Word64` (as `SlotNo`s are usually used to signify absolute slot numbers, not distances between slots; we do the same in the HFC).
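For reference, a minimal sketch of what a `NonZero` wrapper with a smart constructor could look like (the actual type used in the codebase may well differ; this just illustrates why it rules out the zero interval that a bare `Word64` permits):

```haskell
import Data.Word (Word64)

-- Sketch: a wrapper whose smart constructor rejects zero, so a snapshot
-- interval of this type is positive by construction.
newtype NonZero a = NonZero {unNonZero :: a}
  deriving (Show, Eq)

nonZero :: (Eq a, Num a) => a -> Maybe (NonZero a)
nonZero 0 = Nothing
nonZero x = Just (NonZero x)

main :: IO ()
main = do
  print (nonZero (4320 :: Word64)) -- accepted: the default 2*k interval
  print (nonZero (0 :: Word64))    -- rejected: a zero interval is meaningless
```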
```haskell
let immutableStates =
      AS.dropNewest (fromIntegral (envMaxRollbacks env)) $ changelogStates chlog
    immutableSlots :: [SlotNo] =
      nubOrd . mapMaybe (withOriginToMaybe . getTipSlot) $
```
This `nubOrd` I imagine is there only because of EBBs, right?
Exactly, added a comment
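To illustrate the point: an EBB can report the same slot number as a nearby regular block, so the raw slot list may contain duplicates, which `nubOrd` removes (slot numbers below are invented for the example):

```haskell
import Data.Containers.ListUtils (nubOrd)

main :: IO ()
main =
  -- Duplicated slots (as can happen with EBBs) collapse to one entry,
  -- so we don't try to snapshot the same slot twice.
  print (nubOrd [21600, 21600, 21620, 43200, 43200])
```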
```haskell
atomically $ modifyTVar (ldbChangelog env) (prune pruneStrat)
-- Flush the LedgerDB such that we can take a snapshot for the new anchor
-- state due to the previous prune.
withWriteLock
  (ldbLock env)
  (flushLedgerDB (ldbChangelog env) (ldbBackingStore env))
```
I'm confused. I seem to remember the flow was the opposite: first flush, then prune, no?

EDIT: Ah, now I think pruning just affects the changelog states and not the diffs.
> Ah now I think pruning just affects the changelog states and not the diffs.

Exactly 👍
```haskell
(configCodec . getExtLedgerCfg . ledgerDbCfg $ ldbCfg env)
(LedgerDBSnapshotEvent >$< ldbTracer env)
(ldbHasFS env)
(anchorHandle $ snd $ prune pruneStrat lseq)
```
Here we prune the `lseq` that we provide to the function that takes the snapshot, but we do not modify the `ldbSeq`. However, from what I understood about V1 above, there we do prune the `dbChangelog` in the environment. Why this discrepancy?
Above we do:

```haskell
atomically $ modifyTVar (ldbChangelog env) (prune pruneStrat)
```
Good call pointing that out. Morally, I think the V2 semantics are preferable: taking a snapshot should not modify the stored states. However, with V1, this is not possible: we can only take a snapshot for the last flushed state, so we have no choice there.
Note that there already are some differences between V1 and V2 before this PR due to this: With V1, we create snapshots for the last flushed state, whereas with V2, we create snapshots for the immutable tip.
Given that V1 is going to go away "soon", I am inclined to not worry about this too much, but maybe that is too optimistic.
```haskell
    now <- getMonotonicTime
    pure $ now `diffTime` lastWrite
  RAWLock.withReadAccess (ldbOpenHandlesLock env) $ \() -> do
implTryTakeSnapshot snapManager env snapshotRequestTime getRandomDelay = do
```
Calculate `snapshotRequestTime` inside this function using `now`, as it was before.
Superseded by the rework of the snapshot policy for predictable snapshots, with dedicated new tests: "LedgerDB: implement predictable snapshotting".
It is no longer needed by the predictable snapshotting logic.
- `tryTakeSnapshot`: now accepts a `Time` argument. The argument specifies the time at which the snapshot should be taken.
- `LedgerDBEnv`: rename `ldbLastSnapshotWrite` to `ldbLastSnapshotRequestedAt`. Track the request time rather than the time a snapshot actually finished.
- `implTryTakeSnapshot`: add a `delay` argument: how long we should block before actually taking the snapshot after determining the slots to snapshot.
- Add `cdbSnapshotDelayRNG` and use this to determine how long we should wait before taking a snapshot.
- Add orphan `NoThunks` `StdGen` instance.
- Add `cdbsSnapshotDelayRNG` to `ChainDbSpecificArgs`.
- Add `onDiskSnapshotDelayRange` to `SnapshotPolicy` to allow configuration of the delay between a snapshot being requested and being taken.
- Add LedgerDB snapshot delay trace events. Use these events in the test suite to ensure we don't add blocks while snapshots are occurring (and therefore make an accurate number of snapshots).
- Add a test ensuring that blocks can be added while a snapshot is enqueued.
- `ledgerDbMaintenaceThread` -> `ledgerDbMaintenanceThread`
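A hedged sketch of what drawing a random delay from a configured range might look like. The PR presumably threads `System.Random`'s `StdGen` through the environment (hence the `NoThunks` instance); to keep this example dependency-free, a toy linear-congruential step stands in for the real generator, and all names below are illustrative, not the PR's actual API:

```haskell
import Data.Word (Word64)

-- Toy linear-congruential step standing in for a real StdGen.
nextSeed :: Word64 -> Word64
nextSeed s = 6364136223846793005 * s + 1442695040888963407

-- Draw a delay (in whole seconds) from the inclusive range, returning the
-- new seed so it can be stored back, mirroring how a tracked RNG would be
-- updated after deciding how long to wait before taking a snapshot.
delayInRange :: (Word64, Word64) -> Word64 -> (Word64, Word64)
delayInRange (lo, hi) seed =
  let s' = nextSeed seed
   in (lo + s' `mod` (hi - lo + 1), s')

main :: IO ()
main = print (fst (delayInRange (0, 59) 42))
</imports>
```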
This commit brings back the fix from #1814, which synchronises the process of taking a ledger state snapshot and copying of blocks into the immutable DB.
The value of this variable should be the time when the snapshot was requested, not finished taking.
d181bb5 to
6c2bc63
Compare
Closes #1424, #1573
Based on top of #1513, see there for the relation to this change.
Note that we need #1573 before this can be released in a node.
This PR is intended to be reviewed commit-by-commit.
This PR replaces the previous logic for when to create snapshots (it would be possible to preserve it, but I don't see a big motivation). Concretely, `SnapshotFrequencyArgs` contains:

- `sfaInterval :: SlotNo`: Create snapshots every `sfaInterval` many slots. Default: `2*k = 4320` slots, so 72 min on mainnet as before.
- `sfaOffset :: SlotNo`: Allows to determine the offset of where snapshots are taken, see below. Default: `0`.
- `sfaRateLimit :: DiffTime`: A minimum duration between snapshots (used to avoid excessive snapshots while syncing). Default: 10 minutes (the previous value was 6 minutes, which seemed a bit low, so I increased it somewhat. Maybe it should be increased even more now that we no longer have the `substantialAmountOfBlocksWereProcessed` check.)

Concretely, the node will try to create snapshots for the last immutable blocks before the slots `sfaOffset`, `sfaOffset + sfaInterval`, `sfaOffset + 2*sfaInterval`, …, but can skip creating some of these depending on the `sfaRateLimit` (which can be disabled by setting it to a non-positive value). Also see the Haddocks.

For example, setting `sfaInterval = 10*2160*20` (one mainnet Shelley epoch) and `sfaOffset = 172800` will cause the node to create snapshots for the last block in every Shelley epoch (because the first Shelley slot is `4492800`, and ``4492800 `mod` (10*2160*20) = 172800``). By tweaking `sfaOffset`, one can take snapshots e.g. right before the midway point in each epoch.

There is some code that could be shared between V1 and V2 (already even before this PR), but given the upcoming removal of V1, this seems acceptable for now.
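The modular arithmetic in the example can be checked directly:

```haskell
main :: IO ()
main = do
  -- One Shelley epoch on mainnet: 10 * 2160 * 20 slots.
  print (10 * 2160 * 20 :: Int)       -- 432000
  -- The first Shelley slot modulo the epoch length gives the offset
  -- needed to land snapshots on epoch boundaries.
  print (4492800 `mod` 432000 :: Int) -- 172800
```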
Also includes a test running a ChainDB in `IOSim` to test that everything is hooked up correctly (in particular regarding the background threads).

See #1513 (comment) for sync regression tests.