The performance impact of taking a snapshot is still non-trivial (although it has improved recently, cf. #868), so it is desirable to avoid scenarios where all nodes in the system take a snapshot at the same time.
In the current snapshotting scheme, this is achieved implicitly: the ad-hoc policy for when to snapshot depends on the node's startup time, so the times at which different nodes create snapshots are usually largely uncorrelated.
With predictable snapshotting (#1424), this will change, so a simple mitigation is to introduce a random delay before taking the snapshot (while still snapshotting for a particular slot). A random delay of e.g. 5-10 minutes sounds fine.
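To make the idea concrete, here is a minimal, language-agnostic sketch in Python (not the actual Haskell implementation). The names `sample_snapshot_delay` and `snapshot_write_time` are hypothetical, and the uniform distribution and the exact bounds are assumptions based on the "5-10 minutes" suggestion above.

```python
import random

# Assumed bounds, taken from the "5-10 minutes" suggestion; not a settled value.
MIN_DELAY_SECONDS = 5 * 60
MAX_DELAY_SECONDS = 10 * 60


def sample_snapshot_delay(rng: random.Random) -> float:
    """Return a uniformly random delay (in seconds) to wait before taking the snapshot."""
    return rng.uniform(MIN_DELAY_SECONDS, MAX_DELAY_SECONDS)


def snapshot_write_time(slot_time: float, rng: random.Random) -> float:
    """Time at which to take the snapshot for a slot that was reached at `slot_time`.

    The snapshot is still for that particular slot; only the moment at which
    the node does the work is shifted by the random delay.
    """
    return slot_time + sample_snapshot_delay(rng)
```

Since each node draws its own delay, the snapshot work is spread over a window of roughly five minutes instead of happening on all nodes at once.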
This mechanism is only needed when the node is caught up; while syncing, it would increase peak memory usage, as we would have to keep around ledger states from 10 minutes ago (and we might have adopted hundreds of thousands of blocks in the meantime).
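The caught-up condition could be sketched as follows, again in Python rather than the node's Haskell. The function `snapshot_delay` and its `caught_up` flag are hypothetical, and returning a zero delay while syncing is one possible reading of the above, not a settled design.

```python
import random


def snapshot_delay(
    caught_up: bool,
    rng: random.Random,
    min_delay: float = 5 * 60,   # assumed lower bound, in seconds
    max_delay: float = 10 * 60,  # assumed upper bound, in seconds
) -> float:
    """Delay (in seconds) to wait before taking a snapshot.

    While syncing, return 0 so that no old ledger state has to be kept in
    memory; when caught up, return a uniformly random delay.
    """
    if not caught_up:
        return 0.0
    return rng.uniform(min_delay, max_delay)
```

For example, `snapshot_delay(False, random.Random())` is always `0.0`, while `snapshot_delay(True, random.Random())` yields a value between 300 and 600 seconds.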
In the long term (with lsm-tree and LedgerHD), snapshotting should become very cheap, so this functionality can then be removed again.