Operator stops reconciling due to no incoming events in reflector watcher stream #3428

@viggys

Description

Seeking help from k8s experts.

I’m facing an issue with my custom operator: after a few hours (or days), custom resources stop being reconciled for any CREATE/UPDATE/DELETE events. While debugging, I observed that the operator stops receiving events in the controller’s reflector watcher. There are no watcher errors around the time the issue starts. There are general watch errors, such as “too old resource version” or “unexpected EOF”, but these are recoverable. To recover, I’m forced to restart the operator pod, after which things work as expected until the same issue recurs a few hours or days later.

Is there any way to recover from a “silent” dead watch in this situation? I do not have access to the reflector, as it’s managed internally by controller-runtime. If nothing can be done in the operator, what else can I check to investigate this issue?

Kubernetes version: 1.34.1
Controller-Runtime version: v0.22.4

Note:

  • I have tried using Cache SyncPeriod, but it only helps with reconciling existing resources in the operator’s cache. It does not help with reconciling newly created resources.

Cluster information:
Kubernetes version: 1.34.1
Cloud being used: OCI
Installation method: Oracle Kubernetes Engine
Host OS: Oracle Linux 8.10

Labels: lifecycle/stale
