Implement O(1) PredicateEvaluator for inter pod affinity #9523
x13n wants to merge 2 commits into kubernetes:master
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: x13n

This issue is currently awaiting triage. If SIG Autoscaling contributors determine this is a relevant issue, they will accept it.
This change introduces a high-performance ClusterSnapshot implementation that replaces traditional O(PodsOnNode) selector matching with incremental indexing, Copy-on-Write (CoW) simulation, and phased evaluation.

Key architectural pillars:
- Incremental Indexing: Leverages the 'fort' pipeline library and StreamingSnapshotStore to update indices reactively as pods and nodes change.
- CoW Simulation: Uses PatchSet-backed BTreeMap structures and slice operations to efficiently share state across simulation forks with O(1) cost.
- Phased Evaluation: Splits computation into a serial 'PreparePod' phase and a parallel 'FastCheckAffinity' phase using bi-directional label indexing.

Other changes:
- Support for complex namespace logic and AffinityTerm mapping.
- Native integration with StreamingSnapshotStore via event propagation.
- Disable legacy scheduler plugin when the fast path is enabled.
- Introduced a new BenchmarkRunOnceAffinitySurge with heavy pod anti-affinity use to measure the improvement.

Performance Gain (BenchmarkRunOnceAffinitySurge - 5000 nodes, 50,000 pods):
- Before (Default): 164.4s/op
- After (Fast O(1)): 9.36s/op
- Speedup: 17.56x
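To illustrate the incremental indexing idea, here is a minimal sketch, not the code in this PR. It assumes anti-affinity terms are single key=value label pairs scoped to one node (hostname topology); the names affinityIndex, labelPair, addPod, removePod and violatesAntiAffinity are hypothetical. The index keeps a per-node count for each label pair, so a check becomes a map lookup instead of a scan over every pod on the node.

```go
package main

import "fmt"

// labelPair is one key=value label. Hypothetical simplification:
// real label selectors support set-based operators as well.
type labelPair struct{ key, value string }

// affinityIndex counts, per node, how many pods carry each label pair.
type affinityIndex struct {
	counts map[string]map[labelPair]int
}

func newAffinityIndex() *affinityIndex {
	return &affinityIndex{counts: map[string]map[labelPair]int{}}
}

// addPod updates the index in time proportional to the pod's label
// count, independent of how many pods are already on the node.
func (ix *affinityIndex) addPod(node string, labels map[string]string) {
	m := ix.counts[node]
	if m == nil {
		m = map[labelPair]int{}
		ix.counts[node] = m
	}
	for k, v := range labels {
		m[labelPair{k, v}]++
	}
}

// removePod reverses addPod and drops counters that reach zero.
func (ix *affinityIndex) removePod(node string, labels map[string]string) {
	m := ix.counts[node]
	for k, v := range labels {
		p := labelPair{k, v}
		if m[p] <= 1 {
			delete(m, p)
		} else {
			m[p]--
		}
	}
}

// violatesAntiAffinity reports whether any pod on the node matches the
// term. It is a single map lookup rather than a loop over pods.
func (ix *affinityIndex) violatesAntiAffinity(node string, term labelPair) bool {
	return ix.counts[node][term] > 0
}

func main() {
	ix := newAffinityIndex()
	db := map[string]string{"app": "db"}
	ix.addPod("node-1", db)
	fmt.Println(ix.violatesAntiAffinity("node-1", labelPair{"app", "db"}))
	ix.removePod("node-1", db)
	fmt.Println(ix.violatesAntiAffinity("node-1", labelPair{"app", "db"}))
}
```

The trade-off in this sketch is extra bookkeeping on every pod add and remove in exchange for constant-time checks; whether the PR's indices follow the same layout is not stated in the description.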
c95f36a to a289b06
Reallocate presence/forbidden slices once per fork, not once per pod.
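One way to read the commit above is as a copy-on-first-write scheme. The sketch below is an assumption about the general technique, not the PR's implementation; forkState, fork, setForbidden and the copies counter are hypothetical names. A fork shares its parent's slice until it first writes, then copies it once, so scheduling many pods into the same fork triggers a single allocation.

```go
package main

import "fmt"

// forkState is a hypothetical per-fork view of per-node "forbidden" flags.
type forkState struct {
	forbidden []bool // may be shared with another fork until first write
	owned     bool   // true once this fork holds a private copy
	copies    int    // number of reallocations, kept for illustration
}

func newForkState(nodes int) *forkState {
	return &forkState{forbidden: make([]bool, nodes), owned: true}
}

// fork is O(1): the child shares the slice, and both sides are marked
// as not owning it so that whichever writes next makes its own copy.
func (s *forkState) fork() *forkState {
	s.owned = false
	return &forkState{forbidden: s.forbidden}
}

// setForbidden copies the slice at most once per fork, no matter how
// many pods are placed afterwards.
func (s *forkState) setForbidden(node int, v bool) {
	if !s.owned {
		cp := make([]bool, len(s.forbidden))
		copy(cp, s.forbidden)
		s.forbidden = cp
		s.owned = true
		s.copies++
	}
	s.forbidden[node] = v
}

func main() {
	base := newForkState(4)
	child := base.fork()
	for node := 0; node < 3; node++ { // three pods placed in the fork
		child.setForbidden(node, true)
	}
	fmt.Println("copies in child:", child.copies)
	fmt.Println("base unchanged:", !base.forbidden[0])
}
```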
What type of PR is this?
/kind feature
What this PR does / why we need it:
Pod anti-affinity is the most expensive part of the scheduler logic used by Cluster Autoscaler.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
A large part of this change is AI generated and needs careful review.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: