fix: re-fetch StatefulSet inside RetryOnConflict to resolve stale resourceVersion conflicts #5578

Merged
cyclinder merged 3 commits into main from copilot/fix-nightly-k8s-matrix-ci
Apr 29, 2026

Copilot AI commented Apr 29, 2026

The nightly K8s matrix CI was failing with "timed out waiting for the condition" in the A00018 annotation e2e test, which updates a StatefulSet's IPPool annotation after a pod restart cycle.

Root Cause

stsObj was fetched once, outside the retry closure. After RestartAndValidateStatefulSetPodIP returns, the StatefulSet controller is still updating status fields (readyReplicas, etc.), and each of those writes increments the object's resourceVersion. Because every retry attempt re-submitted the same stale copy, and therefore the same stale resourceVersion, once an attempt hit a 409 Conflict every later attempt was guaranteed to conflict as well, until all 4 steps of retry.DefaultBackoff were exhausted.

// Before: stale stsObj captured outside the closure — always conflicts after first attempt
stsObj, err := frame.GetStatefulSet(stsYaml.Name, nsName)
stsObj.Spec.Template.Annotations = map[string]string{...}
err = retry.RetryOnConflictWithContext(ctx, retry.DefaultBackoff, func(ctx context.Context) error {
    return frame.UpdateResource(stsObj) // same stale resourceVersion on every retry
})

Fix

Move the GetStatefulSet call and annotation mutation inside the retry closure so each attempt works with the latest resourceVersion:

// After: re-fetch on every attempt — conflict-safe
err = retry.RetryOnConflictWithContext(ctx, retry.DefaultBackoff, func(ctx context.Context) error {
    stsObj, err := frame.GetStatefulSet(stsYaml.Name, nsName)
    if err != nil {
        return err
    }
    stsObj.Spec.Template.Annotations = map[string]string{...}
    return frame.UpdateResource(stsObj)
})

Copilot AI linked an issue Apr 29, 2026 that may be closed by this pull request
cyclinder marked this pull request as ready for review April 29, 2026 02:06
Copilot AI changed the title from "[WIP] Fix nightly K8s Matrix CI failure" to "fix: re-fetch StatefulSet inside RetryOnConflict to resolve stale resourceVersion conflicts" Apr 29, 2026

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.70%. Comparing base (0870f42) to head (163ffd5).
⚠️ Report is 1 commit behind head on main.

@@           Coverage Diff           @@
##             main    #5578   +/-   ##
=======================================
  Coverage   67.70%   67.70%           
=======================================
  Files          61       61           
  Lines        6360     6360           
=======================================
  Hits         4306     4306           
  Misses       1770     1770           
  Partials      284      284           
Flag        Coverage Δ
unittests   67.70% <ø> (ø)

Flags with carried forward coverage won't be shown.

cyclinder merged commit 0295e0b into main Apr 29, 2026
47 of 49 checks passed
cyclinder deleted the copilot/fix-nightly-k8s-matrix-ci branch April 29, 2026 07:32

Development

Successfully merging this pull request may close these issues.

Nightly K8s Matrix CI 2026-04-28: Failed

2 participants