
Conversation

@mbobrovskyi (Contributor) commented Nov 27, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Use WithLooseOnApply() in evictWorkloadAfterFailedTASReplacement.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

In a race condition, when we try to evict a workload, we hit an error here and just continue without evicting. This is incorrect because it results in a different event reason and message.
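For context, here is a minimal sketch of the updated call site. The Evict call and its options are quoted from the diff discussed in the review thread below; the surrounding control flow is an assumption:

```go
// Sketch of evictWorkloadAfterFailedTASReplacement after the change: the two
// new options make the apply tolerate a stale object and retry the status
// patch on conflict, so a concurrent update no longer aborts the eviction.
if err := workload.Evict(ctx, s.client, s.recorder, wl,
	kueue.WorkloadEvictedDueToNodeFailures, msg, "", s.clock,
	workload.EvictWithLooseOnApply(), workload.EvictWithRetryOnConflictForPatch(),
); err != nil {
	// Previously, a conflict here surfaced as a SecondPassFailed Warning
	// event instead of the expected EvictedDueToNodeFailures Normal event.
	return err
}
```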

Here’s an example of what we get in the unit tests if we revert the changes:

[]v1beta2.Workload{
          	{
          		TypeMeta: {},
          		ObjectMeta: v1.ObjectMeta{
          			... // 3 identical fields
          			SelfLink:          "",
          			UID:               "",
        - 			ResourceVersion:   "3",
        + 			ResourceVersion:   "2",
          			Generation:        0,
          			CreationTimestamp: {},
          			... // 1 ignored and 6 identical fields
          		},
          		Spec: {PodSets: {{Name: "one", Template: {Spec: {Containers: {{Name: "c", Resources: {Requests: {s"cpu": {i: {...}, s: "1", Format: "DecimalSI"}}}}}, RestartPolicy: "Never"}}, Count: 1, TopologyRequest: &{Preferred: &"cloud.provider.com/rack"}}}, QueueName: "tas-main"},
          		Status: v1beta2.WorkloadStatus{
          			Conditions: []v1.Condition(Inverse(cmpopts.SortSlices, []v1.Condition{
          				{Type: "Admitted", Status: "True", Reason: "ByTest", Message: "Admitted by ClusterQueue cq", ...},
        - 				{
        - 					Type:               "Evicted",
        - 					Status:             "True",
        - 					LastTransitionTime: s"2025-11-27 13:26:33 +0530 IST",
        - 					Reason:             "NodeFailures",
        - 					Message:            "Workload was evicted as there was no replacement for a failed node: x0",
        - 				},
          				{Type: "QuotaReserved", Status: "True", Reason: "AdmittedByTest", Message: "Admitted by ClusterQueue cq", ...},
          			})),
          			Admission:    &{ClusterQueue: "cq", PodSetAssignments: {{Name: "one", Flavors: {s"cpu": "rf"}, ResourceUsage: {s"cpu": {i: {...}, Format: "DecimalSI"}}, Count: &1, ...}}},
          			RequeueState: nil,
          			... // 2 identical fields
          			ResourceRequests:                    nil,
          			AccumulatedPastExecutionTimeSeconds: nil,
        - 			SchedulingStats: &v1beta2.SchedulingStats{
        - 				Evictions: []v1beta2.WorkloadSchedulingStatsEviction{{Reason: "NodeFailures", Count: 1}},
        - 			},
        + 			SchedulingStats:       nil,
          			NominatedClusterNames: nil,
          			ClusterName:           nil,
          			... // 1 ignored field
          		},
          	},
          }
    scheduler_tas_test.go:5713: Unexpected events (-want/+got):
          []testing.EventRecord{
          	{
          		Key:       {Namespace: "default", Name: "wl"},
        - 		EventType: "Normal",
        + 		EventType: "Warning",
        - 		Reason:    "EvictedDueToNodeFailures",
        + 		Reason:    "SecondPassFailed",
          		... // 1 ignored field
          	},
          }

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot added the release-note-none and kind/cleanup labels Nov 27, 2025
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mbobrovskyi
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: yes label Nov 27, 2025
netlify bot commented Nov 27, 2025

Deploy Preview for kubernetes-sigs-kueue ready!

🔨 Latest commit: 07cfbbf
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/692d56db4b98470008550801
😎 Deploy Preview: https://deploy-preview-7933--kubernetes-sigs-kueue.netlify.app

@k8s-ci-robot added the size/L label Nov 27, 2025
@mimowo (Contributor) commented Nov 27, 2025

/hold

What is the reason for this change?

EDIT: IIUC this is a performance optimization to avoid retrying the scheduling cycle on conflicts, or is there another reason?

In any case, if we want to change how eviction works, I think we should also follow the same pattern for preemptions.

@k8s-ci-robot added the do-not-merge/hold label Nov 27, 2025
Comment on lines 582 to 583
if err := workload.Evict(ctx, s.client, s.recorder, wl, kueue.WorkloadEvictedDueToNodeFailures, msg, "", s.clock,
workload.EvictWithLooseOnApply(), workload.EvictWithRetryOnConflictForPatch()); err != nil {
@mimowo (Contributor):

Scheduler issues evictions (like preemptions) using strict mode: https://github.com/kubernetes-sigs/kueue/blob/4301e1c8f7f78debb3be609b7467cc310b2c10f7/pkg/scheduler/preemption/preemption.go#L172C37-L173

I'm not saying we cannot change it, but if we do change it, then we should change all evictions (including preemptions).

@mimowo (Contributor) commented Nov 27, 2025:

And since it has been like that for a long time, if we want to change it, we should guard it behind a feature gate. I think such a change can be considered for 0.16, because 0.15 is too soon to evaluate it.
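If we do guard it, a sketch of what that could look like (the gate name is hypothetical and the option-slice type is an assumption; the Evict call and options come from the diff under review):

```go
// LooseSchedulerEviction is a hypothetical feature gate name.
var evictOpts []workload.EvictOption
if features.Enabled(features.LooseSchedulerEviction) {
	evictOpts = append(evictOpts,
		workload.EvictWithLooseOnApply(),
		workload.EvictWithRetryOnConflictForPatch(),
	)
}
err := workload.Evict(ctx, c, recorder, wl,
	kueue.WorkloadEvictedDueToNodeFailures, msg, "", clock, evictOpts...)
```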

@mbobrovskyi (Contributor Author):

Scheduler issues evictions (like preemptions) using strict mode: https://github.com/kubernetes-sigs/kueue/blob/4301e1c8f7f78debb3be609b7467cc310b2c10f7/pkg/scheduler/preemption/preemption.go#L172C37-L173

Probably yes, I haven’t checked that yet.

@mimowo (Contributor):

Ok, in that case let's park the change until after 0.15 and re-evaluate then. I haven't seen any reports from users complaining about that scenario so far. I need more time to think about how this should work.

@mbobrovskyi (Contributor Author):

I haven't seen from users any reports complaining about that scenario so far.

It’s a very rare case to hit, but I think we should handle it properly.

@mimowo (Contributor):

Ok, let's revisit post 0.15. I still think the strategy for handling this case of eviction should also be consistent with how we handle preemption, which is just another form of eviction. So I want to extend the scope of the PR to cover all evictions triggered by the scheduler, assuming we go this way, but I need to think more after 0.15.0.

@mbobrovskyi (Contributor Author):

Also added workload.EvictWithLooseOnApply() and workload.EvictWithRetryOnConflictForPatch() for preemption.

@mbobrovskyi (Contributor Author) commented Nov 27, 2025

EDIT: IIUC this is a performance optimization to avoid retrying the scheduling cycle on conflicts, or is there another reason?

Not only that. If we hit an error here, we just continue without evicting, which is incorrect because it results in a different event reason and message.

You can try reverting my changes and running the test.

(The resulting test diff is identical to the one shown in the PR description above.)
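On the mechanics: the retry option corresponds to the standard client-go conflict-retry pattern. A self-contained sketch of that pattern (how Kueue wires it internally is not shown, and patchStatus is a stand-in):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/retry"
)

// patchStatus stands in for re-reading the Workload and re-applying the
// eviction status patch; returning a 409 Conflict error triggers a retry.
func patchStatus() error { return nil }

func main() {
	// RetryOnConflict re-runs the function with backoff while it keeps
	// returning a conflict error, instead of failing on the first attempt.
	err := retry.RetryOnConflict(retry.DefaultRetry, patchStatus)
	fmt.Println("evicted:", err == nil)
}
```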

@mimowo (Contributor) commented Nov 27, 2025

Ok, but if this is a bugfix then it requires different tagging and a release note. Also, I'm not clear why preemptions are any different.

Alternatively, I think we can just improve the message for SecondPassFailed to concatenate the underlying reason, as a quick fix without changing the mechanics.
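A sketch of that quick fix (the recorder field and message wording are assumptions; Eventf is the standard client-go EventRecorder method):

```go
// Keep strict mode, but fold the underlying eviction error into the
// SecondPassFailed event so the original reason is not lost.
s.recorder.Eventf(wl, corev1.EventTypeWarning, "SecondPassFailed",
	"Failed to evict workload after failed TAS replacement: %v", err)
```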

@mimowo (Contributor) commented Nov 27, 2025

btw, https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/7933/pull-kueue-test-integration-baseline-main/1993948535971647488 looks like unrelated new flake, wdyt?
@mbobrovskyi (Contributor Author):
btw, https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_kueue/7933/pull-kueue-test-integration-baseline-main/1993948535971647488 looks like unrelated new flake, wdyt?

I think this is just a network issue, no?

@mimowo (Contributor) commented Nov 27, 2025

I think this is just a network issue, no?

I think this is a new test and it has a flake/bug: we call StartManager, but I don't see StopManager.

EDIT: opened: #7935
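For reference, the pairing that would avoid the leak, sketched with Ginkgo's cleanup hook (the framework helper signatures are assumptions):

```go
ginkgo.BeforeAll(func() {
	fwk.StartManager(ctx, cfg, managerSetup)
	// Guarantee the manager is torn down even if the suite fails early.
	ginkgo.DeferCleanup(func() { fwk.StopManager(ctx) })
})
```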

@mbobrovskyi (Contributor Author):
/hold

For the v0.15.0 release.

@mbobrovskyi force-pushed the cleanup/use-with-loose-on-evictWorkloadAfterFailedTASReplacement branch from 2b4593a to 07cfbbf on December 1, 2025 08:50
@mbobrovskyi (Contributor Author):
/unhold

@k8s-ci-robot removed the do-not-merge/hold label Dec 1, 2025
@mbobrovskyi changed the title from "Use WithLooseOnApply() in evictWorkloadAfterFailedTASReplacement." to "Use WithLooseOnApply() in scheduler eviction." Dec 1, 2025
@mbobrovskyi requested a review from mimowo on December 1, 2025 10:05
@mimowo (Contributor) commented Dec 1, 2025

/hold

This requires clarifying the status: feature or bug. I think it has user-facing consequences, so it also needs a release note.

@k8s-ci-robot added the do-not-merge/hold label Dec 1, 2025
@mimowo (Contributor) commented Dec 1, 2025

I will try to return to this PR at the end of the week or next week. It seems like a performance improvement that we could perhaps call a bugfix and cherry-pick, but I need more time to think about it. Feel free to ping me EOW.

@PBundyra (Contributor) commented Dec 1, 2025

@mbobrovskyi Can you make the description more exhaustive? What is the context/motivation for it, and what consequences does it have?

@mbobrovskyi (Contributor Author):
@mbobrovskyi Can you make the description more exhaustive? What is the context/motivation for it, and what consequences does it have?

Updated. Also explained here #7933 (comment).
