
Adding multi-cluster support via array of cluster configs based on topology #6116

Open

dridge93 wants to merge 1 commit into ceph:devel from dridge93:multi-cluster-support

Conversation

@dridge93

Adding multi-cluster support via array of cluster configs based on topology

Describe what this PR does

Adds multi-cluster support to Ceph CSI RBD (#5177)

Related issues

Fixes: #5177
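
Example configuration (sketch)

A single StorageClass points at a cluster-topology ConfigMap, and each entry in that ConfigMap maps a clusterID to the zones it serves. The clusterTopologyConfigMap parameter and the clusterTopology/clusterID/zones fields are the ones the driver consumes; the object names, namespace, data key, and zone values below are purely illustrative.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-multi-cluster                  # illustrative name
provisioner: rbd.csi.ceph.com
parameters:
  clusterTopologyConfigMap: ceph-csi-rbd-cluster-topology
  # other RBD parameters omitted
volumeBindingMode: WaitForFirstConsumer    # topology-aware setups commonly delay binding
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-rbd-cluster-topology
  namespace: ceph-csi                      # illustrative namespace
data:
  cluster-topology.json: |-                # data key name is illustrative
    {
      "clusterTopology": [
        { "clusterID": "cluster-az01", "zones": ["az01"] },
        { "clusterID": "cluster-az02", "zones": ["az02"] }
      ]
    }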


Available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@dridge93 force-pushed the multi-cluster-support branch 2 times, most recently from ef3f0c6 to 035176f on February 25, 2026 20:55
@dridge93
Author

Has anyone had a chance to review this PR?

@github-actions

github-actions Bot commented Apr 9, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Apr 9, 2026
@mergify
Contributor

mergify Bot commented Apr 9, 2026

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

@github-actions github-actions Bot removed the stale label Apr 10, 2026
Adding multi-cluster support via array of cluster configs based on topology

Signed-off-by: Devin Ridge <dridge@globalnoc.iu.edu>
@dridge93 force-pushed the multi-cluster-support branch from 035176f to 933bbff on April 24, 2026 14:44
@akaitux

akaitux commented Apr 28, 2026

@dridge93

I found what looks like a functional bug in the original multi-AZ / multi-cluster topology patch
(933bbff4a9a4747dc593ff7df577dfed95a4884c).

The new helper already returns topology requirements for the clusterTopologyConfigMap path:

func GetClusterTopologiesFromRequest(
    req *csi.CreateVolumeRequest,
) (*[]ClusterTopology, *csi.TopologyRequirement, error)

and inside it:

accessibilityRequirements := req.GetAccessibilityRequirements()
...
return &clusterTopologies, accessibilityRequirements, nil

But in parseVolCreateRequest() that return value is discarded:

rbdVol.TopologyPools, rbdVol.TopologyRequirement, err = util.GetTopologyFromRequest(req)
...
rbdVol.ClusterTopologies, _, err = util.GetClusterTopologiesFromRequest(req)

So when the StorageClass uses only:

parameters:
  clusterTopologyConfigMap: ceph-csi-rbd-cluster-topology

and does not use topologyConstrainedPools, then:

  - rbdVol.ClusterTopologies is populated
  - rbdVol.TopologyRequirement stays nil

Later both CreateVolume() and updateTopologyConstraints() do:

selectedCluster, _, err = util.FindClusterAndTopology(
    rbdVol.ClusterTopologies,
    rbdVol.TopologyRequirement,
)

or:

cluster, topology, err := util.FindClusterAndTopology(
    rbdVol.ClusterTopologies,
    rbdVol.TopologyRequirement,
)

but FindClusterAndTopology() starts with:

if clusterTopologies == nil || accessibilityRequirements == nil {
    return ClusterTopology{}, nil, nil
}

So the cluster selection silently returns empty, and CreateVolume fails with:

no matching cluster found for provided topology requirements

even though the incoming request already contains valid topology like:

"accessibility_requirements": {
  "preferred": [
    {"segments":{"topology.rbd.csi.ceph.com/zone":"az01"}},
    {"segments":{"topology.rbd.csi.ceph.com/zone":"az02"}}
  ]
}

and the live clusterTopologyConfigMap contains matching zones:

{
  "clusterTopology": [
    {
      "clusterID": "...",
      "zones": ["az01"],
      ...
    },
    {
      "clusterID": "...",
      "zones": ["az02"],
      ...
    }
  ]
}

The minimal fix seems to be to preserve the topology requirement returned by GetClusterTopologiesFromRequest() when
rbdVol.TopologyRequirement is still nil.

diff --git a/internal/rbd/controllerserver.go b/internal/rbd/controllerserver.go
index 6ef562373..dd4a3a148 100644
--- a/internal/rbd/controllerserver.go
+++ b/internal/rbd/controllerserver.go
@@ -237,10 +237,14 @@ func (cs *ControllerServer) parseVolCreateRequest(
 	}

 	// store cluster topology information from the request if present
-	rbdVol.ClusterTopologies, _, err = util.GetClusterTopologiesFromRequest(req)
+	var clusterTopologyRequirement *csi.TopologyRequirement
+	rbdVol.ClusterTopologies, clusterTopologyRequirement, err = util.GetClusterTopologiesFromRequest(req)
 	if err != nil {
 		return nil, status.Error(codes.InvalidArgument, err.Error())
 	}
+	if rbdVol.TopologyRequirement == nil {
+		rbdVol.TopologyRequirement = clusterTopologyRequirement
+	}

 	// parse QOS parameters from mutable parameters
 	err = rbdVol.SetQOS(ctx, req.GetMutableParameters())

@akaitux

akaitux commented Apr 28, 2026

@dridge93
Second, there is another gap in the original multi-cluster topology flow: the selected cluster secret is only used on the controller/
provisioning path, but not on the node-stage path.

In CreateVolume(), when req.GetSecrets() is empty, the implementation already falls back to the cluster-specific secret
selected from clusterTopologyConfigMap:

selectedCluster, _, err = util.FindClusterAndTopology(...)
...
rbdVol.ClusterSecretName = selectedCluster.SecretName
...
if len(secrets) == 0 && selectedCluster.SecretName != "" {
    secrets, err = k8s.GetSecret(selectedCluster.SecretName, namespace)
}

and then clusterSecretName is persisted into VolumeContext:

  if rbdVol.ClusterSecretName != "" {
      vol.VolumeContext["clusterSecretName"] = rbdVol.ClusterSecretName
  }

So the controller side already knows which secret belongs to the selected cluster.
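
For illustration only: if provisioning selected the az01 cluster, the node-stage request's volume context then carries something along these lines (the secret name is a placeholder for whatever that cluster's entry references):

"volume_context": {
  "clusterSecretName": "ceph-csi-secret-az01",
  ...
}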

However, on the node path, NodeStageVolume() still requires standard CSI node-stage secrets to be present in the request:

  if req.GetSecrets() == nil || len(req.GetSecrets()) == 0 {
      return status.Error(codes.InvalidArgument, "stage secrets cannot be nil or empty")
  }

and then immediately does:

  cr, err := util.NewUserCredentialsWithMigration(req.GetSecrets())

This means that a StorageClass using only:

  parameters:
    clusterTopologyConfigMap: ceph-csi-rbd-cluster-topology

still fails at node stage unless csi.storage.k8s.io/node-stage-secret-* is also set statically.

That is problematic for the multi-cluster mode, because:

  • cluster selection is dynamic (az01 / az02 / az03)
  • but node-stage-secret-* in the StorageClass is static
  • so a single StorageClass cannot correctly represent per-cluster credentials unless the same credentials are valid
    everywhere (sketched below)
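
To make that concrete: the per-cluster credentials already have a natural home in the topology ConfigMap, which is what selectedCluster.SecretName implies on the controller side. A sketch, assuming the JSON key is secretName and using placeholder secret names:

{
  "clusterTopology": [
    { "clusterID": "cluster-az01", "zones": ["az01"], "secretName": "ceph-csi-secret-az01" },
    { "clusterID": "cluster-az02", "zones": ["az02"], "secretName": "ceph-csi-secret-az02" }
  ]
}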

The minimal fix is:

  1. relax ValidateNodeStageVolumeRequest() so empty req.Secrets is allowed when req.VolumeContext["clusterSecretName"] is
    present
  2. in NodeStageVolume(), if req.GetSecrets() is empty, load the secret from clusterSecretName in the driver namespace and use
    it to build credentials

Example fix:

diff --git a/internal/util/validate.go b/internal/util/validate.go
@@
- if req.GetSecrets() == nil || len(req.GetSecrets()) == 0 {
-     return status.Error(codes.InvalidArgument, "stage secrets cannot be nil or empty")
- }
+ if req.GetSecrets() == nil || len(req.GetSecrets()) == 0 {
+     if req.GetVolumeContext()["clusterSecretName"] == "" {
+         return status.Error(codes.InvalidArgument, "stage secrets cannot be nil or empty")
+     }
+ }

diff --git a/internal/rbd/nodeserver.go b/internal/rbd/nodeserver.go
@@
- cr, err := util.NewUserCredentialsWithMigration(req.GetSecrets())
+ secrets := req.GetSecrets()
+ if len(secrets) == 0 {
+     // fall back to the per-cluster secret recorded in the volume context at provision time
+     clusterSecretName := req.GetVolumeContext()["clusterSecretName"]
+     if clusterSecretName != "" {
+         namespace, nsErr := util.GetPodNamespace()
+         if nsErr != nil {
+             return nil, status.Error(codes.InvalidArgument, nsErr.Error())
+         }
+         clusterSecret, secErr := k8s.GetSecret(clusterSecretName, namespace)
+         if secErr != nil {
+             return nil, status.Error(codes.InvalidArgument, secErr.Error())
+         }
+         secrets = clusterSecret
+     }
+ }
+
+ cr, err := util.NewUserCredentialsWithMigration(secrets)

Without this, the multi-cluster topology flow is only partially implemented:

  • controller/provisioning can use per-cluster secrets
  • node-stage still requires a static StorageClass secret



Development

Successfully merging this pull request may close these issues.

Enhancement: Multi-Ceph Cluster Support for Topology-Aware Volume Provisioning with ConfigMap-Based Management
