Implement O(1) PredicateEvaluator for inter pod affinity #9523

Draft
x13n wants to merge 2 commits into kubernetes:master from x13n:streaming-snapshot

x13n (Member) commented Apr 20, 2026

This change introduces a high-performance ClusterSnapshot implementation that replaces traditional O(PodsOnNode) selector matching with incremental indexing, Copy-on-Write (CoW) simulation, and phased evaluation.

Key architectural pillars:

  • Incremental Indexing: Leverages the 'fort' pipeline library and StreamingSnapshotStore to update indices reactively as pods and nodes change.
  • CoW Simulation: Uses PatchSet-backed BTreeMap structures and slice operations to efficiently share state across simulation forks with O(1) cost.
  • Phased Evaluation: Splits computation into a serial 'PreparePod' phase and a parallel 'FastCheckAffinity' phase using bi-directional label indexing.
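A rough sketch of the idea behind the indexing and phased-evaluation pillars, not the code in this PR. It assumes a much simpler model than the real one: selectors are single key=value labels, anti-affinity is required and scoped to one node, and namespaces are ignored. The `pod`, `index`, `apply`, `fits`, and `feasible` names are hypothetical. Counters are kept in both directions, so a check costs a few map lookups per label instead of a scan over the pods on the node.

```go
package main

import "sync"

// label is a single key=value pair; real selectors are richer (In, NotIn, Exists).
type label struct{ key, value string }

// pod is a hypothetical, stripped-down pod: its own labels plus the labels it
// refuses to share a node with (required anti-affinity, per-node topology).
type pod struct {
	name         string
	labels       []label
	antiAffinity []label
}

// index keeps per-node counters in both directions.
type index struct {
	presence  map[label]map[string]int // label -> node -> pods carrying it
	forbidden map[label]map[string]int // label -> node -> pods rejecting it
}

func newIndex() *index {
	return &index{
		presence:  map[label]map[string]int{},
		forbidden: map[label]map[string]int{},
	}
}

func bump(m map[label]map[string]int, l label, node string, delta int) {
	if m[l] == nil {
		m[l] = map[string]int{}
	}
	m[l][node] += delta
	if m[l][node] == 0 {
		delete(m[l], node)
	}
}

// apply adds (delta = +1) or removes (delta = -1) a pod on a node.
// Only the entries for that pod's own labels and terms are touched.
func (ix *index) apply(p pod, node string, delta int) {
	for _, l := range p.labels {
		bump(ix.presence, l, node, delta)
	}
	for _, l := range p.antiAffinity {
		bump(ix.forbidden, l, node, delta)
	}
}

// fits checks both directions: the incoming pod's terms against pods already
// on the node, and the existing pods' terms against the incoming pod's labels.
func (ix *index) fits(p pod, node string) bool {
	for _, l := range p.antiAffinity {
		if ix.presence[l][node] > 0 {
			return false
		}
	}
	for _, l := range p.labels {
		if ix.forbidden[l][node] > 0 {
			return false
		}
	}
	return true
}

// feasible runs the read-only check for every node in parallel. Each
// goroutine writes only its own slot, so no locking is needed.
func (ix *index) feasible(p pod, nodes []string) []bool {
	out := make([]bool, len(nodes))
	var wg sync.WaitGroup
	for i, n := range nodes {
		wg.Add(1)
		go func(i int, n string) {
			defer wg.Done()
			out[i] = ix.fits(p, n)
		}(i, n)
	}
	wg.Wait()
	return out
}
```

In this sketch the serial step is whatever builds the `pod` value and updates the index through `apply`; the parallel step only reads the maps, which is safe in Go as long as no writer runs at the same time.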

Other changes:

  • Support for complex namespace logic and AffinityTerm mapping.
  • Native integration with StreamingSnapshotStore via event propagation.
  • Disable legacy scheduler plugin when the fast path is enabled.
  • Add a new BenchmarkRunOnceAffinitySurge benchmark with heavy pod anti-affinity use to measure the improvement.

Performance Gain (BenchmarkRunOnceAffinitySurge - 5,000 nodes, 50,000 pods):

  • Before (Default): 164.4s/op
  • After (Fast O(1)): 9.36s/op
  • Speedup: 17.56x
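For reference, the speedup figure is the ratio of the two timings, 164.4 / 9.36 ≈ 17.56, which corresponds to roughly 94% less time per benchmark run. A tiny helper showing the arithmetic (the function names are only illustrative):

```go
package main

// speedup is how many times faster the new path is: before / after.
func speedup(beforeSec, afterSec float64) float64 {
	return beforeSec / afterSec
}

// timeSaved is the fraction of the original run time that was removed.
func timeSaved(beforeSec, afterSec float64) float64 {
	return 1 - afterSec/beforeSec
}
```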

What type of PR is this?

/kind feature

What this PR does / why we need it:

Pod anti-affinity is the most expensive part of the scheduler logic used by Cluster Autoscaler.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

A large part of this change is AI-generated and needs careful review.

Does this PR introduce a user-facing change?

A new, experimental --fast-predicates-enabled flag can be used to enable an alternative implementation of pod anti-affinity checks.
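A minimal sketch of how a boolean flag with this name could be parsed. Only the flag name comes from this PR. The default of false, the help text, and the `parseFastPredicates` helper are assumptions, and the sketch uses the standard library `flag` package for brevity rather than whatever flag wiring Cluster Autoscaler actually uses.

```go
package main

import "flag"

// parseFastPredicates parses args with a private FlagSet so the sketch does
// not touch global flag state. The default (false) is an assumption.
func parseFastPredicates(args []string) (bool, error) {
	fs := flag.NewFlagSet("cluster-autoscaler", flag.ContinueOnError)
	enabled := fs.Bool(
		"fast-predicates-enabled",
		false,
		"Experimental: use the alternative pod anti-affinity implementation (illustrative help text).",
	)
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}
```

Calling `parseFastPredicates([]string{"--fast-predicates-enabled"})` yields true, and an empty argument list yields the assumed default.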

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


k8s-ci-robot (Contributor) commented

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot added labels on Apr 20, 2026:
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • kind/feature: Categorizes issue or PR as related to a new feature.
k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested a review from elmiko on April 20, 2026 18:00
k8s-ci-robot added labels on Apr 20, 2026:
  • area/cluster-autoscaler
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If SIG Autoscaling contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added labels on Apr 20, 2026:
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
x13n force-pushed the streaming-snapshot branch from c95f36a to a289b06 on April 21, 2026 09:33
Reallocate presence/forbidden slices once per fork, not once per pod.
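One way to read this commit message, sketched under assumptions: a fork shares its parent's counter slice and copies it on its first write only, so adding many pods inside one fork costs a single allocation. The `counters` type, its fields, and the `copies` field used for observation are hypothetical, and this plain slice copy is a simplification that stands in for the PatchSet-backed structures mentioned in the description.

```go
package main

// counters is a hypothetical copy-on-write view over per-node counters.
type counters struct {
	data   []int32
	owned  bool // true while this view is the sole user of its backing array
	copies int  // number of reallocations, kept only to make the cost visible
}

// newCounters creates a root view that owns its backing array.
func newCounters(nodes int) *counters {
	return &counters{data: make([]int32, nodes), owned: true}
}

// fork returns a view that shares the backing array. It is O(1): nothing is
// copied here. The parent gives up ownership too, so a later write on either
// side cannot leak into the other.
func (c *counters) fork() *counters {
	c.owned = false
	return &counters{data: c.data}
}

// add changes one counter. The slice is reallocated on the view's first write
// after a fork, and every later write in the same fork mutates in place.
func (c *counters) add(node int, delta int32) {
	if !c.owned {
		c.data = append([]int32(nil), c.data...)
		c.owned = true
		c.copies++
	}
	c.data[node] += delta
}
```

With this shape, three `add` calls on a fresh fork trigger one copy instead of three, while the parent's values stay untouched.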