OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic#31000
OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic#31000gcs278 wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@gcs278: This pull request references NE-2292 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdded a pre-Setup skip hook on Gateway API upgrade tests that performs platform/OLM capability and eligibility checks before Setup. Skip collects precheck errors (stored on the test), performs Ginkgo skip when appropriate, and Setup asserts no precheck error and uses a simplified capability resolver. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
73d03cd to
853bb58
Compare
|
@gcs278: This pull request references Jira Issue OCPBUGS-82586, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/router/gatewayapi_upgrade.go`:
- Around line 53-60: Replace the helper that panics inside Skip by adding a new
function gatewayAPIWithoutOLMEnabled(oc *exutil.CLI) (bool, error) (in
gatewayapicontroller.go) which performs the same FeatureGates check as
isNoOLMFeatureGateEnabled but returns (bool, error) instead of using o.Expect;
then change the Skip() call site to call gatewayAPIWithoutOLMEnabled(oc), check
the returned error and treat any error as a condition to skip the test
(log/return true) rather than allowing a panic, and ensure no o.Expect-based
helpers are invoked from within Skip().
In `@test/extended/router/gatewayapicontroller.go`:
- Around line 554-587: shouldSkipGatewayAPITests must not invoke
isIPv6OrDualStack because that helper uses o.Expect and can panic; change the
code so it calls a non-panicking variant that returns (bool, error) (either
modify isIPv6OrDualStack to return (bool, error) or add isIPv6OrDualStackSafe),
check the error and on error return (true, fmt.Sprintf("Failed to determine IP
stack: %v", err)), and only skip for IPv6/dual-stack when the boolean is true;
reference the functions shouldSkipGatewayAPITests and isIPv6OrDualStack (or the
new isIPv6OrDualStackSafe) and ensure no o.Expect is used in the path called
from GatewayAPIUpgradeTest.Skip.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c5b789a9-7819-41a8-b936-721a3717190d
📒 Files selected for processing (2)
test/extended/router/gatewayapi_upgrade.gotest/extended/router/gatewayapicontroller.go
853bb58 to
d765284
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
test/extended/router/gatewayapi_upgrade.go (1)
54-58:⚠️ Potential issue | 🔴 CriticalAvoid
o.Expect-based helper calls insideSkip()Line 54 calls
isNoOLMFeatureGateEnabled(oc), which useso.Expectinternally and can panic. In this pre-Setup()skip path, that can still produce unrecoverable panics and defeats theupgrades.Skippablerefactor.Proposed fix
- // Skip on clusters missing OLM/Marketplace capabilities if not using NO-OLM mode - if !isNoOLMFeatureGateEnabled(oc) { + // Skip on clusters missing OLM/Marketplace capabilities if not using NO-OLM mode + noOLMEnabled, err := gatewayAPIWithoutOLMEnabled(oc) + if err != nil { + g.By(fmt.Sprintf("skipping test: failed to read FeatureGates: %v", err)) + return true + } + if !noOLMEnabled { enabled, err := exutil.AllCapabilitiesEnabled(oc, olmCapabilities...) if err != nil || !enabled { g.By("skipping test: OLM/Marketplace capabilities are not enabled and GatewayAPIWithoutOLM is not enabled") return true } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/gatewayapi_upgrade.go` around lines 54 - 58, The skip path calls isNoOLMFeatureGateEnabled(oc) which uses o.Expect and can panic before Setup; replace that call with a non-panicking check (either introduce a new helper like isNoOLMFeatureGateEnabledNoExpect(oc) that returns (bool, error) or inline the feature-gate query without o.Expect), handle any error by treating the gate as disabled, and then use that boolean in the existing condition alongside exutil.AllCapabilitiesEnabled; ensure all o.Expect usage remains in Setup or later so Skip() cannot cause a panic.test/extended/router/gatewayapicontroller.go (1)
583-584:⚠️ Potential issue | 🔴 Critical
shouldSkipGatewayAPITestsstill has a transitive panic pathThe function is documented as panic-safe, but Line 583 calls
isIPv6OrDualStack(oc), and that helper useso.Expect(Line 622). This can still panic instead of returning a skip reason.Proposed fix
- if isIPv6OrDualStack(oc) { + ipv6OrDualStack, err := isIPv6OrDualStack(oc) + if err != nil { + return true, fmt.Sprintf("Failed to determine IP stack: %v", err) + } + if ipv6OrDualStack { return true, "Skipping Gateway API tests on IPv6/dual-stack cluster" }-func isIPv6OrDualStack(oc *exutil.CLI) bool { +func isIPv6OrDualStack(oc *exutil.CLI) (bool, error) { networkConfig, err := oc.AdminOperatorClient().OperatorV1().Networks().Get(context.Background(), "cluster", metav1.GetOptions{}) - o.Expect(err).NotTo(o.HaveOccurred(), "Failed to get network config") + if err != nil { + return false, err + } for _, cidr := range networkConfig.Spec.ServiceNetwork { if utilnet.IsIPv6CIDRString(cidr) { - return true + return true, nil } } - return false + return false, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/router/gatewayapicontroller.go` around lines 583 - 584, shouldSkipGatewayAPITests currently calls isIPv6OrDualStack(oc) which uses o.Expect and can panic; change the helper to a panic-safe signature (e.g., isIPv6OrDualStackSafe/ isIPv6OrDualStack returning (bool, error)) that does not call o.Expect but returns an error on failure, then update shouldSkipGatewayAPITests to call the new safe helper, handle the error path by returning a non-skip or a clear skip reason without panicking, and remove any direct uses of o.Expect inside isIPv6OrDualStack so tests remain panic-safe; refer to the functions shouldSkipGatewayAPITests and isIPv6OrDualStack and the use of o.Expect when locating code to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/extended/router/gatewayapi_upgrade.go`:
- Around line 54-58: The skip path calls isNoOLMFeatureGateEnabled(oc) which
uses o.Expect and can panic before Setup; replace that call with a non-panicking
check (either introduce a new helper like isNoOLMFeatureGateEnabledNoExpect(oc)
that returns (bool, error) or inline the feature-gate query without o.Expect),
handle any error by treating the gate as disabled, and then use that boolean in
the existing condition alongside exutil.AllCapabilitiesEnabled; ensure all
o.Expect usage remains in Setup or later so Skip() cannot cause a panic.
In `@test/extended/router/gatewayapicontroller.go`:
- Around line 583-584: shouldSkipGatewayAPITests currently calls
isIPv6OrDualStack(oc) which uses o.Expect and can panic; change the helper to a
panic-safe signature (e.g., isIPv6OrDualStackSafe/ isIPv6OrDualStack returning
(bool, error)) that does not call o.Expect but returns an error on failure, then
update shouldSkipGatewayAPITests to call the new safe helper, handle the error
path by returning a non-skip or a clear skip reason without panicking, and
remove any direct uses of o.Expect inside isIPv6OrDualStack so tests remain
panic-safe; refer to the functions shouldSkipGatewayAPITests and
isIPv6OrDualStack and the use of o.Expect when locating code to modify.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 85458f16-0cb3-47cb-9f62-6d36fc11beb4
📒 Files selected for processing (2)
test/extended/router/gatewayapi_upgrade.gotest/extended/router/gatewayapicontroller.go
f580b21 to
6a7a94e
Compare
|
@gcs278: This pull request references Jira Issue OCPBUGS-82586, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Most importantly, I should have kicked off "periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.21-e2e-metal-ipi-upgrade-ovn-ipv6" in #30897 and I missed that: /payload-job periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-paused periodic-ci-openshift-release-main-ci-4.22-upgrade-from-stable-4.21-e2e-vsphere-ovn-upgrade periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.21-e2e-metal-ipi-ovn-upgrade periodic-ci-openshift-release-main-nightly-4.22-upgrade-from-stable-4.21-e2e-metal-ipi-upgrade-ovn-ipv6 |
|
@gcs278: trigger 4 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5cc518b0-36b4-11f1-833a-a696c774475a-0 |
|
@CodeRabbit does everything look good? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 7475
✅ Core fix is sound: The ✅ ✅ ✅ A few minor observations (no blockers):
None of these are correctness issues — the PR achieves its goal of preventing the goroutine panic on IPv6/dual-stack, OKD, and unsupported platforms. Looks good to merge. 🐇 |
6a7a94e to
d44a553
Compare
|
/payload-abort |
|
I hate to kick off so many jobs, but I don't want to miss any jobs that could be failing: |
|
@gcs278: trigger 66 job(s) of type informing for the nightly release of OCP 4.22
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/284f6b70-36b5-11f1-80e0-7384393647eb-0 |
|
@gcs278: This pull request references Jira Issue OCPBUGS-83267, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
cluster bootstrap |
|
/test e2e-aws-ovn-serial-1of2 |
|
Unrelated flakes - I believe branching happened I suspect these are 5.0-specific flakes/failures |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gcs278, rhamini3 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Should it be targeting 5.0 now? |
|
@gcs278: This pull request references Jira Issue OCPBUGS-83267, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@CodeRabbit analyze the failures in the failed e2e jobs, and report the commonality between them. Ensure none of the failures are gateway api related. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 5126 🏁 Script executed: Length of output: 3317 🏁 Script executed: Length of output: 5659 🏁 Script executed: Length of output: 512 🏁 Script executed: Length of output: 320034
Commonalities across all failed jobs
Gateway API impact check
What likely happened
If you want, I can deep-scan each job’s e2e.log tail and junit files specifically for:
Say the word and I’ll pull those exact snippets per job. |
|
Does it still want me to target 4.22?? |
|
/jira refresh |
|
@gcs278: This pull request references Jira Issue OCPBUGS-83267, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
I'm going to set the bug as 5.0.0 because that's where it will have to be merged first: |
|
@gcs278: This pull request references Jira Issue OCPBUGS-83267, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@rhamini3: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@gcs278: This pull request references Jira Issue OCPBUGS-83267, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
Summary
g.Skip()fromSetup(), which runs inside a goroutine managed by the disruption framework. Sinceg.Skip()panics and Ginkgo can only recover panics inside leaf nodes, this caused unrecoverable panics on IPv6/dual-stack, OKD, and unsupported platform clusters.upgrades.Skippableinterface with aSkip()method that the disruption framework calls beforeSetup(), avoiding the goroutine panic.checkPlatformSupportAndGetCapabilitiesintoshouldSkipGatewayAPITests(safe outside Ginkgo nodes) andgetPlatformCapabilities(returns LB/DNS support).Test plan
[ ] Verify Gateway API upgrade test is skipped cleanly on OKD clustersI don't see any OKD jobs[ ] Verify Gateway API upgrade test is skipped cleanly on unsupported platformsLooks like we support all platforms in the current jobs I can search thoughgatewayapicontroller.goBeforeEach) still skip correctly viag.Skip()🤖 Generated with Claude Code