Add cluster-persist-config option to control nodes.conf save behavior by enjoy-binbin · Pull Request #3372 · valkey-io/valkey

enjoy-binbin · 2026-03-17T13:55:43Z

This commit introduces a new configuration option cluster-persist-config
that controls how the cluster handles nodes.conf file save failures.

The option supports two modes:

sync (default): Synchronously save the config file. If the save fails,
the process exits. This maintains backward compatibility with the old
behavior (before 9.1).
best-effort: Synchronously save the config file. If the save fails,
only log a warning and continue running. This allows the node to survive
disk failures (e.g., disk full, read-only filesystem) without exitting,
giving administrators time to address the issue.

Note that this modifies the behavior of #1032, whereas #1032 was "best-effort",
we have now introduced a configuration option that defaults to "sync."
See #1032 discussion for more details.

Background:
When a disk becomes read-only or full, any cluster metadata change would
trigger a nodes.conf save attempt. With the old behavior, the node would
immediately exit via clusterSaveConfigOrDie(), potentially causing multiple
nodes on the same machine to crash simultaneously, leading to cluster
unavailability.

The new best-effort mode addresses this by allowing nodes to continue
operating even when disk writes fail. This is particularly useful in cloud
environments where disk failures are more common due to scale.

Note: Startup-time config saves (in clusterInit and verifyClusterConfigWithData)
still use clusterSaveConfigOrDie() since disk issues at startup should cause
immediate failure.

This commit introduces a new configuration option `cluster-persist-config` that controls how the cluster handles nodes.conf file save failures. The option supports two modes: - `sync` (default): Synchronously save the config file. If the save fails, the process exits. This maintains backward compatibility with the old behavior (before 9.1). - `best-effort`: Synchronously save the config file. If the save fails, only log a warning and continue running. This allows the node to survive disk failures (e.g., disk full, read-only filesystem) without exitting, giving administrators time to address the issue. Note that this modifies the behavior of valkey-io#1032, whereas valkey-io#1032 was "best-effort", we have now introduced a configuration option that defaults to "sync." See valkey-io#1032 discussion for more details. Background: When a disk becomes read-only or full, any cluster metadata change would trigger a nodes.conf save attempt. With the old behavior, the node would immediately exit via clusterSaveConfigOrDie(), potentially causing multiple nodes on the same machine to crash simultaneously, leading to cluster unavailability. The new `best-effort` mode addresses this by allowing nodes to continue operating even when disk writes fail. This is particularly useful in cloud environments where disk failures are more common due to scale. Note: Startup-time config saves (in clusterInit and verifyClusterConfigWithData) still use clusterSaveConfigOrDie() since disk issues at startup should cause immediate failure. Signed-off-by: Binbin <[email protected]>

Signed-off-by: Binbin <[email protected]>

codecov · 2026-03-17T16:39:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.50%. Comparing base (6c329df) to head (b7f97d5).
⚠️ Report is 3 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3372      +/-   ##
============================================
- Coverage     74.55%   74.50%   -0.05%     
============================================
  Files           130      130              
  Lines         72731    72734       +3     
============================================
- Hits          54225    54192      -33     
- Misses        18506    18542      +36

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`88.16% <100.00%> (-0.16%)`	⬇️
src/config.c	`77.70% <ø> (ø)`
src/server.h	`100.00% <ø> (ø)`

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

zuiderkwast · 2026-03-17T20:12:13Z

When we merge #2555 we can change 'best-effort' to be in bio?

murphyjacob4

Makes sense to me

We had quorum in Monday's meeting and got the verbal consensus. FYI @valkey-io/core-team in case anyone wasn't there and wants to chime in

hpatro

Shall we name it as cluster-config-save-behavior ?

enjoy-binbin · 2026-03-18T07:39:54Z

When we merge #2555 we can change 'best-effort' to be in bio?

Yes, we can, it is best-effort.

Shall we name it as cluster-config-save-behavior ?

Sound like a better (a more specific one) name. I am open to it.

github-actions bot assigned enjoy-binbin Mar 17, 2026

enjoy-binbin added the major-decision-approved Major decision approved by TSC team label Mar 17, 2026

enjoy-binbin requested review from PingXie, madolson and zuiderkwast March 17, 2026 13:56

enjoy-binbin added this to Valkey 9.1 Mar 17, 2026

fix and add test

b7f97d5

Signed-off-by: Binbin <[email protected]>

enjoy-binbin mentioned this pull request Mar 17, 2026

Fail to save the cluster config file will not exit the process #1032

Merged

enjoy-binbin moved this to Optional for next release candidate in Valkey 9.1 Mar 17, 2026

murphyjacob4 approved these changes Mar 17, 2026

View reviewed changes

hpatro reviewed Mar 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cluster-persist-config option to control nodes.conf save behavior#3372

Add cluster-persist-config option to control nodes.conf save behavior#3372
enjoy-binbin wants to merge 2 commits intovalkey-io:unstablefrom
enjoy-binbin:cluster_config_save

enjoy-binbin commented Mar 17, 2026

Uh oh!

codecov bot commented Mar 17, 2026

Uh oh!

zuiderkwast commented Mar 17, 2026

Uh oh!

murphyjacob4 left a comment

Uh oh!

hpatro left a comment

Uh oh!

enjoy-binbin commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

enjoy-binbin commented Mar 17, 2026

Uh oh!

codecov bot commented Mar 17, 2026

Codecov Report

Uh oh!

zuiderkwast commented Mar 17, 2026

Uh oh!

murphyjacob4 left a comment

Choose a reason for hiding this comment

Uh oh!

hpatro left a comment

Choose a reason for hiding this comment

Uh oh!

enjoy-binbin commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants