Add cluster-persist-config option to control nodes.conf save behavior#3372
Open
enjoy-binbin wants to merge 2 commits intovalkey-io:unstablefrom
Open
Add cluster-persist-config option to control nodes.conf save behavior#3372enjoy-binbin wants to merge 2 commits intovalkey-io:unstablefrom
enjoy-binbin wants to merge 2 commits intovalkey-io:unstablefrom
Conversation
This commit introduces a new configuration option `cluster-persist-config` that controls how the cluster handles nodes.conf file save failures. The option supports two modes: - `sync` (default): Synchronously save the config file. If the save fails, the process exits. This maintains backward compatibility with the old behavior (before 9.1). - `best-effort`: Synchronously save the config file. If the save fails, only log a warning and continue running. This allows the node to survive disk failures (e.g., disk full, read-only filesystem) without exitting, giving administrators time to address the issue. Note that this modifies the behavior of valkey-io#1032, whereas valkey-io#1032 was "best-effort", we have now introduced a configuration option that defaults to "sync." See valkey-io#1032 discussion for more details. Background: When a disk becomes read-only or full, any cluster metadata change would trigger a nodes.conf save attempt. With the old behavior, the node would immediately exit via clusterSaveConfigOrDie(), potentially causing multiple nodes on the same machine to crash simultaneously, leading to cluster unavailability. The new `best-effort` mode addresses this by allowing nodes to continue operating even when disk writes fail. This is particularly useful in cloud environments where disk failures are more common due to scale. Note: Startup-time config saves (in clusterInit and verifyClusterConfigWithData) still use clusterSaveConfigOrDie() since disk issues at startup should cause immediate failure. Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3372 +/- ##
============================================
- Coverage 74.55% 74.50% -0.05%
============================================
Files 130 130
Lines 72731 72734 +3
============================================
- Hits 54225 54192 -33
- Misses 18506 18542 +36
🚀 New features to boost your workflow:
|
Contributor
|
When we merge #2555 we can change 'best-effort' to be in bio? |
murphyjacob4
approved these changes
Mar 17, 2026
Contributor
murphyjacob4
left a comment
There was a problem hiding this comment.
Makes sense to me
We had quorum in Monday's meeting and got the verbal consensus. FYI @valkey-io/core-team in case anyone wasn't there and wants to chime in
hpatro
reviewed
Mar 17, 2026
Contributor
hpatro
left a comment
There was a problem hiding this comment.
Shall we name it as cluster-config-save-behavior ?
Member
Author
Yes, we can, it is best-effort.
Sound like a better (a more specific one) name. I am open to it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This commit introduces a new configuration option
cluster-persist-configthat controls how the cluster handles nodes.conf file save failures.
The option supports two modes:
sync(default): Synchronously save the config file. If the save fails,the process exits. This maintains backward compatibility with the old
behavior (before 9.1).
best-effort: Synchronously save the config file. If the save fails,only log a warning and continue running. This allows the node to survive
disk failures (e.g., disk full, read-only filesystem) without exitting,
giving administrators time to address the issue.
Note that this modifies the behavior of #1032, whereas #1032 was "best-effort",
we have now introduced a configuration option that defaults to "sync."
See #1032 discussion for more details.
Background:
When a disk becomes read-only or full, any cluster metadata change would
trigger a nodes.conf save attempt. With the old behavior, the node would
immediately exit via clusterSaveConfigOrDie(), potentially causing multiple
nodes on the same machine to crash simultaneously, leading to cluster
unavailability.
The new
best-effortmode addresses this by allowing nodes to continueoperating even when disk writes fail. This is particularly useful in cloud
environments where disk failures are more common due to scale.
Note: Startup-time config saves (in clusterInit and verifyClusterConfigWithData)
still use clusterSaveConfigOrDie() since disk issues at startup should cause
immediate failure.