Description
LDK's current persistence model requires the ChannelManager and each ChannelMonitor to be persisted independently, at different times, by different callers. This creates a fundamental consistency problem: on restart, the ChannelManager and ChannelMonitor states may not agree, which is the root cause of unnecessary force closes after crashes or unclean shutdowns. To mitigate this, a significant amount of reconciliation logic runs on startup to detect and resolve inconsistencies, adding complexity and still not covering all edge cases.
The lack of atomic persistence also necessitates channel freezing via ChannelMonitorUpdateStatus::InProgress: channels are paused while persistence catches up. This machinery is difficult to reason about and still has edge cases.
Proposed approach
Instead of persisting the ChannelManager and ChannelMonitors as independent operations, persist them together in a single atomic batch through a queuing KV store layer.
1. ChannelManager persists itself at the right moments
Rather than relying on the background processor to periodically call an external persist function, the ChannelManager holds a reference to the KV store and writes its own state at exactly two chokepoints: before returning events from process_pending_events, and before returning messages from get_and_clear_pending_msg_events. This guarantees that the persisted state always matches the events the caller is about to handle. If the caller crashes mid-handling, the events simply replay on restart.
The key insight is that state changes only need to be persisted before they become externally observable. As long as no events or messages have been handed to the caller, the system can safely restart from the last persisted state and re-derive the same changes. This is why persisting at just these two chokepoints is sufficient.
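As a minimal sketch of the chokepoint idea (the `KVStore` trait, `MemStore`, and `Manager` types below are illustrative stand-ins, not LDK's actual API): the manager writes its serialized state first, and only then hands the pending events to the caller.

```rust
use std::collections::HashMap;

// Hypothetical minimal KV store trait; LDK's real KVStore trait differs.
trait KVStore {
    fn write(&mut self, key: &str, value: Vec<u8>);
}

struct MemStore {
    map: HashMap<String, Vec<u8>>,
}

impl KVStore for MemStore {
    fn write(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
}

// Sketch of a manager that persists its state *before* exposing events,
// so a crash during event handling replays the same events on restart.
struct Manager<S: KVStore> {
    store: S,
    pending_events: Vec<String>,
    state: Vec<u8>,
}

impl<S: KVStore> Manager<S> {
    fn process_pending_events(&mut self) -> Vec<String> {
        // Chokepoint: persist current state first...
        self.store.write("manager", self.state.clone());
        // ...then hand the events to the caller.
        std::mem::take(&mut self.pending_events)
    }
}
```

Because persistence happens strictly before the events become externally observable, a crash at any point either replays the events (persisted, not yet handled) or re-derives them from the last snapshot (not yet persisted).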
The current persistence flag mechanism triggers full re-serialization even when nothing recoverable has changed. This approach eliminates those redundant writes entirely.
2. Per-channel keys for granular ChannelManager updates
The ChannelManager currently serializes all of its channel state into a single blob. Instead, each channel's data (as stored within the ChannelManager, not the ChannelMonitor) is written to its own KV store key. Combined with change detection (comparing serialized state against the bytes written at the last persist), updating one channel out of thousands writes only that channel's key plus the small manager metadata, not a re-serialization of the entire ChannelManager.
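The change-detection step could look like the following sketch, where serialized channel state is compared against what was last written and only dirty channels produce writes (the `ChannelPersister` name and key layout are illustrative assumptions):

```rust
use std::collections::HashMap;

// Sketch of change detection for per-channel keys. Each channel's serialized
// state is compared against the bytes written at the last persist; only
// channels whose bytes differ are rewritten.
struct ChannelPersister {
    last_written: HashMap<String, Vec<u8>>,
}

impl ChannelPersister {
    // Returns the (key, bytes) pairs that actually need a write this round.
    fn dirty_writes(
        &mut self,
        channels: &HashMap<String, Vec<u8>>,
    ) -> Vec<(String, Vec<u8>)> {
        let mut writes = Vec::new();
        for (chan_id, bytes) in channels {
            if self.last_written.get(chan_id) != Some(bytes) {
                // Hypothetical per-channel key namespace.
                writes.push((format!("manager/channels/{chan_id}"), bytes.clone()));
                self.last_written.insert(chan_id.clone(), bytes.clone());
            }
        }
        writes
    }
}
```

A node with thousands of channels where one channel changes thus produces one channel write plus the small metadata write, rather than a full manager re-serialization.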
3. Batched atomic commits
A QueuedKVStoreSync wrapper buffers all writes (monitor updates, manager updates) in memory. On commit(), it serializes all queued changes into a single value and writes it to the underlying KV store under a unique sequenced key. Because any KV store can write a single key atomically (e.g. FilesystemStore uses write-to-temp + rename), this guarantees that either all changes from a commit are persisted or none are. There is never a window where one is persisted without the other, eliminating the force close problem and the need for startup reconciliation.
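A rough sketch of the queue-and-commit mechanics, assuming an in-memory map stands in for the real backing store and a simple length-prefixed encoding for the delta blob (the encoding and key layout here are illustrative, not the proposal's wire format):

```rust
use std::collections::BTreeMap;

// Sketch of a queued KV store wrapper: writes are buffered in memory and
// flushed as a single atomic value under a sequenced delta key on commit().
struct QueuedKvStore {
    queue: BTreeMap<String, Vec<u8>>,
    inner: BTreeMap<String, Vec<u8>>, // stands in for the real backing store
    next_seq: u64,
}

impl QueuedKvStore {
    fn write(&mut self, key: &str, value: Vec<u8>) {
        self.queue.insert(key.to_string(), value);
    }

    fn commit(&mut self) {
        if self.queue.is_empty() {
            return;
        }
        // Serialize every queued (key, value) pair into one blob. A single
        // key write is atomic in any reasonable KV store, so either the
        // whole batch lands or none of it does.
        let mut blob = Vec::new();
        for (k, v) in &self.queue {
            blob.extend_from_slice(&(k.len() as u32).to_be_bytes());
            blob.extend_from_slice(k.as_bytes());
            blob.extend_from_slice(&(v.len() as u32).to_be_bytes());
            blob.extend_from_slice(v);
        }
        // Zero-padded sequence number keeps deltas lexicographically ordered.
        let delta_key = format!("delta/{:020}", self.next_seq);
        self.next_seq += 1;
        self.inner.insert(delta_key, blob);
        self.queue.clear();
    }
}
```

Monitor updates, manager channel keys, and manager metadata written between two chokepoints all end up in the same delta, which is what makes the commit atomic across components.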
Multiple monitor updates that occur between two chokepoints are naturally batched into a single write operation instead of hitting disk individually. This benefits both individual forwards (which touch two channels) and busy nodes where many unrelated channel updates accumulate between event processing cycles. This means fewer fsyncs, which is often the dominant cost in persistence-heavy workloads.
On startup, QueuedKVStoreSync reads the base snapshot plus any unconsolidated delta keys (ordered by sequence number) and replays them to reconstruct the current state. Reads during normal operation check the in-memory queue first and fall back to the inner store, so callers always see the latest buffered state.
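The startup replay can be sketched as a simple fold: start from the base snapshot and apply each decoded delta in sequence order, with later writes overwriting earlier ones (the data shapes here are assumptions for illustration):

```rust
use std::collections::BTreeMap;

// Sketch of startup replay: begin with the base snapshot, then apply every
// unconsolidated delta in ascending sequence order to reconstruct the
// current state.
fn replay(
    base: BTreeMap<String, Vec<u8>>,
    deltas: &BTreeMap<u64, Vec<(String, Vec<u8>)>>, // seq -> decoded writes
) -> BTreeMap<String, Vec<u8>> {
    let mut state = base;
    // BTreeMap iterates keys in ascending order, so later deltas win.
    for writes in deltas.values() {
        for (key, value) in writes {
            state.insert(key.clone(), value.clone());
        }
    }
    state
}
```

Since replay is a pure function of snapshot plus ordered deltas, the reconstructed state is identical regardless of how many deltas have accumulated.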
4. Async monitor persistence can be removed
The InProgress variant of ChannelMonitorUpdateStatus was originally added for performance on high-latency storage backends, where blocking on each individual monitor write would be too slow. With batched writes, all monitor updates are queued in memory (no I/O) and flushed as a single write on commit. Since there is only one write operation, the performance concern that motivated per-monitor async persistence no longer applies. The single commit can still happen asynchronously as long as we hold off on sending messages and handling events until it completes. This eliminates the channel freezing machinery and the associated edge cases around fund loss.
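The "hold off until the commit completes" rule can be reduced to a small gate, sketched here with a boolean standing in for awaiting the real storage future (the `EventGate` type is purely illustrative):

```rust
// Sketch of commit gating: the batched commit may run asynchronously, but
// events and outbound messages are withheld until it has completed durably.
struct EventGate {
    commit_done: bool,
    pending_events: Vec<String>,
}

impl EventGate {
    // Called when the asynchronous batched write has been confirmed durable.
    fn on_commit_complete(&mut self) {
        self.commit_done = true;
    }

    // Events are only handed out once the commit covering them is durable;
    // until then the caller observes nothing and the system can safely
    // restart from the last persisted state.
    fn take_events(&mut self) -> Vec<String> {
        if !self.commit_done {
            return Vec::new();
        }
        std::mem::take(&mut self.pending_events)
    }
}
```

This replaces per-monitor InProgress tracking with a single gate on the one batched write, which is what makes the freezing machinery removable.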
5. Background consolidation of deltas
Each commit writes a small delta containing only the changes since the last commit. Over time these deltas accumulate. A background thread can read all outstanding deltas, integrate them into a new full state snapshot, and then remove the consumed deltas. This takes over the role of the current consolidation mechanism in MonitorUpdatingPersister (which replays individual monitor updates into full monitor snapshots), and extends it to also consolidate partial ChannelManager saves where only changed channels were persisted. Consolidation is purely an optimization that does not affect correctness; the system works fine with any number of unconsolidated deltas, it just means startup takes longer as more keys need to be read and replayed. The consolidation thread has no interaction with the hot path and can run at whatever pace the storage backend allows.
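Consolidation is the inverse of startup replay: fold all outstanding deltas into the snapshot, then delete the consumed delta keys. A sketch under the same illustrative data shapes as above:

```rust
use std::collections::BTreeMap;

// Illustrative store layout: one full snapshot plus sequenced deltas.
struct Store {
    snapshot: BTreeMap<String, Vec<u8>>,
    deltas: BTreeMap<u64, Vec<(String, Vec<u8>)>>,
}

// Sketch of background consolidation: integrate every outstanding delta into
// the snapshot, then remove the consumed delta keys. Purely an optimization;
// correctness never depends on when (or whether) this runs.
fn consolidate(store: &mut Store) {
    // Snapshot the sequence numbers present at the start; deltas written
    // concurrently by the hot path are simply picked up on the next run.
    let consumed: Vec<u64> = store.deltas.keys().cloned().collect();
    for seq in &consumed {
        if let Some(writes) = store.deltas.remove(seq) {
            for (k, v) in writes {
                store.snapshot.insert(k, v);
            }
        }
    }
}
```

Because the hot path only ever appends new delta keys, the consolidation pass never contends with it and can run as slowly as the storage backend requires.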
6. Extensible to application state
Higher-level application state (e.g. a payment store in ldk-node) can piggyback on the same atomic commit, keeping application data consistent with LDK's internal state without additional coordination.
Trade-offs
Serializing all updates into a single system delta means that channels which could in theory operate completely independently (e.g. two unrelated forwards touching four different channels) are funneled through one write. At very large scale, this could become a bottleneck compared to a model where each channel or channel pair persists independently. In practice, we are likely far from the point where this matters, and there are other scaling paths such as running multiple nodes (though that is less capital efficient). The simplicity gained by having a single, consistent persistence model is worth a lot: it eliminates entire classes of bugs and makes the system much easier to reason about.
Proof of concept
An early proof of concept is available at https://github.com/joostjager/rust-lightning/pull/new/one-shot-persist. It is incomplete and not production-ready, but demonstrates the core ideas described above.