Fix metadata-webhook cleanup race condition #3973
cardil wants to merge 1 commit into openshift-knative:release-1.37
Add namespaceSelector to the MutatingWebhookConfiguration to limit the webhook's scope to namespaces with the samples.knative.dev/release label. This prevents the webhook from blocking resource deletions in other namespaces when the serving-tests namespace is torn down. The issue occurred during upgrade test cleanup, where the Route resource for deployment-upgrade-failure could not be deleted because the webhook service was unavailable after namespace cleanup started.
Assisted-by: 🤖 Claude Opus/Sonnet 4.5
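The effect of the namespaceSelector can be sketched in Go. This is a simplified, hypothetical model rather than the Kubernetes API: the selector type, the matches helper and the other-namespace name are illustrative only. It assumes the selector just requires the samples.knative.dev/release key to exist; a matchLabels selector on samples.knative.dev/release: devel would give the same result for serving-tests. A webhook with no namespaceSelector is modelled as a nil selector that matches every namespace.

```go
package main

import "fmt"

// releaseLabel is the label key named in the PR description.
const releaseLabel = "samples.knative.dev/release"

// selector is a hypothetical, simplified stand-in for a Kubernetes
// LabelSelector. It only supports "key exists" requirements.
type selector struct {
	existsKeys []string
}

// matches reports whether a namespace's labels satisfy the selector.
// A nil selector matches every namespace, like a webhook configured
// without any namespaceSelector.
func (s *selector) matches(nsLabels map[string]string) bool {
	if s == nil {
		return true
	}
	for _, k := range s.existsKeys {
		if _, ok := nsLabels[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	namespaces := map[string]map[string]string{
		"serving-tests":   {releaseLabel: "devel"},
		"other-namespace": {}, // hypothetical unlabeled namespace
	}

	var before *selector // no namespaceSelector: every namespace is in scope
	after := &selector{existsKeys: []string{releaseLabel}}

	// Iterate in a fixed order so the output is deterministic.
	for _, name := range []string{"serving-tests", "other-namespace"} {
		labels := namespaces[name]
		fmt.Printf("%s: before=%v after=%v\n", name, before.matches(labels), after.matches(labels))
	}
}
```

Running it prints serving-tests as matched both before and after the change, while other-namespace is matched only before it. In this model, requests in unlabeled namespaces are no longer sent to the webhook, so its availability stops mattering there.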
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: cardil
/cherrypick main
@cardil: once the present PR merges, I will cherry-pick it on top of
/retest
@cardil: The following tests failed.
Problem
The deployment-upgrade-failure test occasionally leaves Route resources stuck with finalizers because the metadata-webhook service is deleted before the webhook can process the Route. When the webhook service is unavailable, the MutatingWebhookConfiguration still intercepts Route deletion requests, causing a timeout and leaving the Route stuck.
Root Cause
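Race condition between the cleanup step that deletes the metadata-webhook service and the teardown of the Route created by the deployment-upgrade-failure test.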
Solution
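Added namespaceSelector to the webhook configuration to limit its scope to the serving-tests namespace, which already has the samples.knative.dev/release: devel label. This ensures the webhook no longer intercepts requests in other namespaces, so resources there can still be deleted after the webhook service is gone.
Evidence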
- MESH=true (see hack/lib/serverless.bash:195-198)
- The serving-tests namespace already has the required label in 100-namespace.yaml
Related
Upstream issue in knative/serving: the cleanup order in deployment_failure.go deletes the webhook before tearing down the Service (to be reported separately).
Assisted-by: 🤖 Claude Opus/Sonnet 4.5