OBC Controller and Reconcile #520

Open
shirady wants to merge 27 commits into red-hat-storage:main from shirady:obc-controller-implementation

Conversation


@shirady shirady commented Feb 25, 2026

Describe the problem

Part of RHSTOR-6230
Continuation of PR red-hat-storage/ocs-operator#3692, by calling Notify when an OBC is created or deleted.

Explain the changes

  1. Register the OBC and OB types.
    Note: to avoid a direct import of lib-bucket-provisioner, I had to add them manually.
    In case we decide to import it directly, it can be done in the same style as the others.
  2. OBCs can be created in all namespaces, so add the changes needed for that.
  3. Create the OBC controller, which runs on the client cluster (a cluster that has neither NooBaa nor the OBC controller from lib-bucket-provisioner installed). The controller is registered with a predicate so the reconcile loop is triggered only on specific changes: creation, deletion, and updates that set the deletion timestamp.
  4. On creation, add a finalizer and call Notify.
    Notes:
  • On failure, set the status to Failed.
  • Decided at this point not to change it to Pending.
  • To identify the storage client (as there can be several storage clients in the client cluster), we use the ownerReferences on the storage class.
  • To be able to list OBCs by storage client, a label with that information is added to the OBC (the plan is to use it to prevent deletion of a storage client that still has OBCs).
  5. On deletion, call Notify and remove the finalizer.
  6. Add the permissions the controller needs for the above actions.
  7. RBAC changes after running make manifests and make bundle.

Testing Instructions:

Manual Test:

  1. Set up two clusters, provider and client, after onboarding. The client cluster runs this PR's code changes, and the provider cluster runs the code changes from Implement Notify - OBC Created and Deleted ocs-operator#3692 (the Notify implementation).
  2. On the client cluster, install the OBC CRD manually from the noobaa-operator repo: kubectl apply -f ./noobaa-operator/deploy/obc/objectbucket.io_objectbucketclaims_crd.yaml
  3. Set the permissions manually:
kubectl create clusterrole ocs-client-operator-obc-role \
  --verb=get,list,watch,patch,update \
  --resource=objectbucketclaims.objectbucket.io

# Add the status subresource rule
kubectl patch clusterrole ocs-client-operator-obc-role --type='json' -p='[
  {"op": "add", "path": "/rules/-", "value": {
    "apiGroups": ["objectbucket.io"],
    "resources": ["objectbucketclaims/status"],
    "verbs": ["get", "patch", "update"]
  }}
]'

kubectl create clusterrolebinding ocs-client-operator-obc-rolebinding \
  --clusterrole=ocs-client-operator-obc-role \
  --serviceaccount=openshift-storage-client:ocs-client-operator-controller-manager
  4. Add a storage class with ownerReferences pointing to the storage client (you can copy them from another storage class on the cluster) manually, with a temporary finalizer (otherwise it is deleted).
    Example of a storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openshift-storage.noobaa.io
  finalizers:
  - ocs-client-operator.ocs.openshift.io/temp
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: StorageClient
    name: storage-client-2202
    uid: 841f7ac8-17cb-4d34-8ee2-7c6ebaf82103
parameters:
  bucketclass: noobaa-default-bucket-class
provisioner: openshift-storage.noobaa.io/obc
reclaimPolicy: Delete
volumeBindingMode: Immediate
  5. Creation:
  • Client cluster: create an OBC on the client cluster - it should be created without a status change.
    Example of OBC:
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: shira-obc-2302
  namespace: openshift-storage-client
spec:
  storageClassName: openshift-storage.noobaa.io
  generateBucketName: shira-obc-2302
  additionalConfig: {}

Note: an OBC can be created in any namespace. To test in another namespace, create the namespace (kubectl create ns <namespacename>) and change the namespace value in the YAML.

  • Provider cluster: check that the OBC on the provider cluster is created and in the Bound phase.
  6. Deletion:
    Since the copy mechanism is not implemented yet, we will test only that the OBC is deleted (without the additional resources):
  • Manual steps:
    a. Install the OB CRD (kubectl apply -f noobaa-operator/deploy/obc/objectbucket.io_objectbuckets_crd.yaml)
    b. Change the ClusterRole (use an AI-generated file from the kubebuilder markers).
  • Client cluster: delete the OBC on the client cluster - it should be deleted.
  • Provider cluster: check that the OBC was deleted (with all related resources).

@shirady shirady marked this pull request as draft February 25, 2026 13:58

openshift-ci bot commented Feb 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shirady
Once this PR has been reviewed and has the lgtm label, please assign nb-ohad for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


leelavg commented Feb 26, 2026

install the OBC CRD manually from noobaa-operator repo

  • Unless we finalize how we are bringing in these CRDs, we can't have this PR in its current format. Can there be a dynamic caching mechanism for this controller, so that when the PR is merged and client-op is deployed standalone, even without the OBC CRD, client-op will not error out and will serve the existing cases?

manually with a temp finalizer (else it is deleted).

  • Why? Does it get deleted because the ownerRef is wrong, or because the server doesn't send this resource? I suppose it's the latter and it may need a fix; good find.

@leelavg leelavg left a comment

Didn't review the controller yet as the PR is in draft.

Comment on lines +235 to +237
&nbv1.ObjectBucketClaim{}: {
Namespaces: map[string]cache.Config{corev1.NamespaceAll: {}},
},
Contributor

even the returned OB/Secret/Configmap should be watched in other namespaces, isn't it?

Contributor Author

The secret and config map should be watched in other namespaces as well, yes - I added the change.
OB is cluster-scoped, so I didn't change it.

@shirady shirady force-pushed the obc-controller-implementation branch from a8fa7f2 to d7c518a Compare February 26, 2026 11:59

shirady commented Feb 26, 2026

install the OBC CRD manually from noobaa-operator repo

  • Unless we finalize how we are bringing in these CRDs, we can't have this PR in its current format. Can there be a dynamic caching mechanism for this controller, so that when the PR is merged and client-op is deployed standalone, even without the OBC CRD, client-op will not error out and will serve the existing cases?

From what I understand from @nb-ohad, for testing we can install the CRD that we need.
If you think we need to add something specific for the general case, we can add it.
The quote was taken from the manual testing that was done.

manually with a temp finalizer (else it is deleted).

  • Why? Does it get deleted because the ownerRef is wrong, or because the server doesn't send this resource? I suppose it's the latter and it may need a fix; good find.

Yes, you are correct. We would need to make changes. As we didn't want to be blocked at this point, it was suggested that we add the finalizer for testing purposes.

@shirady shirady force-pushed the obc-controller-implementation branch from 159612a to 622dfff Compare March 4, 2026 08:25
@shirady shirady force-pushed the obc-controller-implementation branch 3 times, most recently from e1d0de3 to 2f407e5 Compare March 4, 2026 09:05
@shirady shirady marked this pull request as ready for review March 4, 2026 10:37

shirady commented Mar 4, 2026

@leelavg @rewantsoni, Please take a look

@shirady shirady force-pushed the obc-controller-implementation branch from 2f407e5 to d45478a Compare March 4, 2026 16:06
@shirady shirady force-pushed the obc-controller-implementation branch from d45478a to 5d62d88 Compare March 5, 2026 07:31
@shirady shirady changed the title RHSTOR-6230 | OBC Controller and Reconcile OBC Controller and Reconcile Mar 8, 2026
Member

@rewantsoni rewantsoni left a comment


The PR looks good except for one comment regarding adding the storageClient label to the OBC.

Comment on lines +28 to +29
ObcFinalizer = nbv1.ObjectBucketFinalizer
ObjectBucketClaimStatusPhaseFailed = "Failed"
Member

Instead of creating a copy of the Phase and Finalizer, can we not use them from the pkg directly?

Contributor Author

I will use it directly, thanks.
For the status, I currently don't have it, but I raised a PR to add it -
noobaa/noobaa-operator#1823

Comment on lines +151 to +153
if r.obc.GetDeletionTimestamp().IsZero() && r.obc.Status.Phase == "" {
r.obc.Status.Phase = ObjectBucketClaimStatusPhaseFailed
}
Member

Suggested change
if r.obc.GetDeletionTimestamp().IsZero() && r.obc.Status.Phase == "" {
r.obc.Status.Phase = ObjectBucketClaimStatusPhaseFailed
}
if r.obc.Status.Phase == "" {
r.obc.Status.Phase = ObjectBucketClaimStatusPhaseFailed
}

We have a check for deletionTimestamp above

Contributor Author

Sure, thanks

Comment on lines +159 to +161
if r.obc.Status.Phase == ObjectBucketClaimStatusPhaseFailed {
r.obc.Status.Phase = ""
}
Member

Can this cause a deadlock? I am thinking of a scenario: if for some reason the OBC on the provider reached a failed state, we update the status of the OBC from the storage_client controller to a failed state, and then the OBC controller updates the Phase to "".

Contributor Author

From what I checked in the code, there is no place where the status phase is set to Failed.
I added the status change since it was raised as a comment on the design, to reflect it.
To be on the safe side I will not change the phase, and the way to debug it would be to check the logs and see the error.

shirady added 5 commits March 12, 2026 15:45
and remove redundant check of deletion timestamp

Signed-off-by: shirady <[email protected]>
Signed-off-by: shirady <[email protected]>
Signed-off-by: shirady <[email protected]>

shirady commented Mar 15, 2026

@nb-ohad , please take a look

Comment on lines +11 to +20
func NewProviderClientForStorageClient(
ctx context.Context,
storageProviderEndpoint string,
) (*providerClient.OCSProviderClient, error) {
ocsProviderClient, err := providerClient.NewProviderClient(ctx, storageProviderEndpoint, OcsClientTimeout)
if err != nil {
return nil, fmt.Errorf("failed to create provider client with endpoint %v: %w", storageProviderEndpoint, err)
}
return ocsProviderClient, nil
}
Contributor

This is a meaningless util; it does nothing but call another function. And it does not even allow us to configure the error message in accordance with the given context of the call site, which means the call site now needs to wrap the underlying error with another error. I don't get why this is being introduced.

This function also limits all call sites to sharing the same timeout value, which seems wrong. The next developer who needs a different timeout value will now add the timeout as an argument (and I wouldn't blame them), but then you cannot even use the timeout as a reason for the existence of this function.

In addition, a reader now needs a context switch when reading the call-site code, with no apparent value.

It seems like this function was only introduced to "skip" typing the timeout constant in multiple places (which is kind of the wrong reason for refactoring). We should not create code that limits the usability of our libraries. If we wanted a fixed timeout, or a fixed error message, we could have just hard-coded it into the NewProviderClient function itself; we chose not to do so, and to give the caller the flexibility. This function seems to go directly against that design decision.

@shirady Could you please make a case for this function, or just revert the change.

CC: @leelavg

Comment on lines +45 to +47
predicate.NewPredicateFuncs(func(obj client.Object) bool {
return !obj.GetDeletionTimestamp().IsZero()
}),
Contributor

This predicate means that we will not handle "deleting". In that case, who should be handling the deleting phase? Shouldn't this controller be owning that code path?

r.obc.Name = req.Name
r.obc.Namespace = req.Namespace

r.log.Info("Starting reconcile iteration for OBC", "req", req)
Contributor

I would suggest adding the NamespacedName from the request (the only thing that is actually on the request) as a WithValues on the logger, to make sure that every log message has the context of the OBC it refers to. This also means there is no need to add the req value in this log statement.

Comment on lines +89 to +94
if r.obc.Status.Phase != initialPhase {
statusErr = r.Client.Status().Update(r.ctx, &r.obc)
if statusErr != nil {
r.log.Error(statusErr, "Failed to update OBC status.")
}
}
Contributor

This is wrong! By design of the controller pattern, each reconcile starts with a conceptually empty status, because each reconcile runs all of the steps from step 0 every time. This also means it is a bug/mistake to take the output of a previous reconcile and use it as input for the current reconcile.

In addition, the fact that the phase field on the status didn't change does not mean that other fields didn't change. This means the phase field cannot be a good indication for skipping the update operation. Even if one makes a case that there are no other fields, I would argue that the next developer might add additional fields, and then we have a subtle bug here that will be hard to detect.

If you really want to run the update command only when the status changes, you need to condition on a deep-equal operation, but I think this is overkill, as this controller should only run when there is a change on the OBC, so I don't think there is any harm in running the update operation on each reconcile.

Suggested change
if r.obc.Status.Phase != initialPhase {
statusErr = r.Client.Status().Update(r.ctx, &r.obc)
if statusErr != nil {
r.log.Error(statusErr, "Failed to update OBC status.")
}
}
if statusErr := r.Client.Status().Update(r.ctx, &r.obc); statusErr != nil {
r.log.Error(statusErr, "Failed to update OBC status.")
}

CC: @leelavg

Comment on lines +123 to +125
if !r.obc.GetDeletionTimestamp().IsZero() {
return r.deletionPhase(ocsProviderClient, storageClient)
}
Contributor

This goes against the predicate that was set in setupWithManager. This is one example of why that predicate needs to be removed.

Comment on lines +123 to +125
if !r.obc.GetDeletionTimestamp().IsZero() {
return r.deletionPhase(ocsProviderClient, storageClient)
}
Contributor

If we have a branch based on GetDeletionTimestamp, I would structure the code to be more readable and apparent; this means an if ... else ... structure. And if you have already decided that the deletion phase gets its own function (instead of being implemented here), then it makes more sense to give the same treatment to the second code flow. I would also prefer that we handle the common case first (creation/update) and the uncommon case later (as a user reads code from top to bottom).

With all of these changes the code should look something like:

if r.obc.GetDeletionTimestamp().IsZero() {
    r.handleObcCreationOrUpdate(...)
} else {
    r.handleObcDeletion(...)
}

}

if _, err := ocsProviderClient.NotifyObcCreated(r.ctx, storageClient.Status.ConsumerID, &r.obc); err != nil {
r.log.Error(err, "failed to notify provider of OBC created/updated", "namespaced/name", client.ObjectKeyFromObject(&r.obc))
Contributor

The OBC namespace name / object key should already be on the logger.

Suggested change
r.log.Error(err, "failed to notify provider of OBC created/updated", "namespaced/name", client.ObjectKeyFromObject(&r.obc))
r.log.Error(err, "failed to notify provider of OBC created/updated")

@shirady Please apply that to all of the log messages

) (ctrl.Result, error) {
r.log.Info("OBC deleted", "namespaced/name", client.ObjectKeyFromObject(&r.obc))

obcNamespacedName := types.NamespacedName{Namespace: r.obc.Namespace, Name: r.obc.Name}
Contributor

The NamespacedName already exists on the req; I am not sure why we need to get it from the obc.
And even if we do, we can get it using client.ObjectKeyFromObject(&r.obc)

// getStorageClientFromStorageClass returns the StorageClient that owns the given StorageClass (via ownerReference).
func (r *obcReconcile) getStorageClientFromStorageClass(storageClassName string) (*v1alpha1.StorageClient, error) {
sc := &storagev1.StorageClass{}
if err := r.Get(r.ctx, types.NamespacedName{Name: storageClassName}, sc); err != nil {
Contributor

Please prefer ObjectKeyFromObject to manually creating a NamespacedName / ObjectKey.

}

// getStorageClientFromStorageClass returns the StorageClient that owns the given StorageClass (via ownerReference).
func (r *obcReconcile) getStorageClientFromStorageClass(storageClassName string) (*v1alpha1.StorageClient, error) {
Contributor

Why aren't we just creating a client index for SC by client? If we do that, this entire function becomes redundant and we only need a simple list operation (with a limit of 1) to get the client that is associated with a given SC.
