GREP-375 add scheduler backend framework #372
kangclzjc wants to merge 46 commits into ai-dynamo:main
Conversation
> For detailed lifecycle flow, see [PodGang Lifecycle Changes](#podgang-lifecycle-changes).
>
> ### Backend Interface Definition
The interface currently omits the relationship between ClusterTopology and secondary resources. How do you envision the navigational link from the main topology to other specific Topology CRDs?
Yes, this is a good point. Per my understanding, each scheduler backend should define that mapping once during backend initialization, and then we have several hooks: `PreparePod` to modify the topology labels in the pod spec, and a `SyncPodGang` hook to translate the Topology into the scheduler-specific Topology CRDs.
Today we don't have a controller for ClusterTopology; we would add it as part of multi-cluster topology support, so it might need an extension point of its own.
Yes, I added a future note in the GREP.
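For reference, here is a minimal sketch of how these hooks could hang together in Go. The type names and the `PodGang` placeholder are assumptions for illustration, not the interface defined in the GREP:

```go
// Sketch only: illustrative assumptions, not the final interface in the GREP.
package schedulerbackend

import (
	corev1 "k8s.io/api/core/v1"
)

// PodGang is a placeholder standing in for Grove's PodGang API type.
type PodGang struct{}

// SchedulerBackend abstracts a concrete scheduler (e.g. kai-scheduler or the
// default kube-scheduler) behind a small set of hooks.
type SchedulerBackend interface {
	// Name returns the backend name referenced by the OperatorConfiguration.
	Name() string

	// PreparePod mutates a pod before creation, e.g. setting the scheduler
	// name and topology labels in the pod spec.
	PreparePod(pod *corev1.Pod)

	// SyncPodGang creates or updates the scheduler-specific resources
	// (e.g. PodGroup or Workload) that mirror the given PodGang, including
	// any translation of topology constraints into scheduler-specific CRDs.
	SyncPodGang(podGang *PodGang) error
}
```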
> #### New Flow (With Framework):
> 1. **Create PodGang early** with PodGroups having empty PodReferences and `Initialized=False`
> 2. **Create Pods** (with scheduling gates to block scheduling)
Could we do this without using the scheduling gate? At large scale it would be expensive to modify every pod's spec to remove the scheduling gate.
I agree with you. If we could refine this scheduling gate approach, that would be a good enhancement. Maybe we should raise this question and discuss it in another GREP?
A different question: what would happen if we did not use the scheduling gate at all (beyond what we do today)?
Pods would be schedulable as soon as they're created, so gang scheduling couldn't be guaranteed.
You're right. At large scale it would be expensive to modify every pod's spec to remove the scheduling gate. I will create another issue to track this.
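To make the trade-off concrete, here is a minimal sketch of adding and later lifting a scheduling gate. The gate name is an assumption for illustration, not necessarily the one Grove will use; in practice the removal is a per-pod API update, which is what makes it expensive at scale:

```go
// Sketch only: the gate name is an illustrative assumption.
package main

import (
	corev1 "k8s.io/api/core/v1"
)

const podGangGate = "grove.io/podgang-not-initialized"

// addSchedulingGate keeps the pod out of scheduling consideration until the
// gate is removed (the pod stays in the SchedulingGated state).
func addSchedulingGate(pod *corev1.Pod) {
	pod.Spec.SchedulingGates = append(pod.Spec.SchedulingGates,
		corev1.PodSchedulingGate{Name: podGangGate})
}

// removeSchedulingGate lifts the gate once the PodGang is Initialized; each
// pod needs its own update/patch, which is the per-pod cost discussed above.
func removeSchedulingGate(pod *corev1.Pod) {
	kept := pod.Spec.SchedulingGates[:0]
	for _, g := range pod.Spec.SchedulingGates {
		if g.Name != podGangGate {
			kept = append(kept, g)
		}
	}
	pod.Spec.SchedulingGates = kept
}
```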
sanjaychatterjee left a comment
LGTM. Made a couple of minor suggestions to update the GREP. Thanks!
> 2. Wait for all pods to have back-references to PodGang
> 3. Create PodGang with complete PodReferences
>
> #### New Flow (With Framework):
Can you please clarify at which point in the flow the scheduler backend will create the scheduler-specific CRs for the workload, e.g. PodGroup for KAI, or Workload for kube-scheduler?
Yes, added. After the PodGang is created, the backend will create a scheduler-specific CR, as below.
1. **Create PodGang early** with PodGroups having empty PodReferences and `Initialized=False`.
2. **Backend creates scheduler-specific CRs**: The Backend Controller reconciles the new PodGang and calls `SyncPodGang()` on the resolved backend. The backend creates or updates its scheduler-specific resources (e.g. PodGroup for kai-scheduler, Workload for kube-scheduler when supported). These CRs must exist before pods are allowed to be scheduled so the scheduler can enforce gang/topology semantics.

> Abstraction layer bridging Grove and specific schedulers:
> - **Backend Manager**: Singleton that initializes and provides access to active backend
> - **KAI Backend**: Implementation for KAI scheduler (creates PodGroup CRs in future)
> - **Kube Backend**: Minimal implementation for default kube-scheduler (no custom CRs)
Would you not be creating the Workload object if GangScheduling is enabled?
If GangScheduling is enabled, that means the kube-scheduler supports gang scheduling (the Workload API), so Grove will create the Workload object to leverage the kube-scheduler's gang scheduling feature.
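For illustration, a rough sketch of the translation step described above. The `PodGang` and `PodGroup` structs below are placeholders, not the actual Grove or kai-scheduler APIs; the real backend would use the scheduler's client types:

```go
// Sketch only: placeholder types, not the real Grove/kai-scheduler APIs.
package kaibackend

import "fmt"

// PodGang is a placeholder for Grove's PodGang API type.
type PodGang struct {
	Name    string
	MinPods int32
}

// PodGroup is a placeholder for the scheduler-specific gang resource
// (e.g. kai-scheduler's PodGroup, or a Workload for kube-scheduler).
type PodGroup struct {
	Name      string
	MinMember int32
}

// SyncPodGang derives the scheduler-specific CR from the PodGang so that the
// scheduler can enforce gang semantics before the scheduling gates are lifted.
func SyncPodGang(pg *PodGang) (*PodGroup, error) {
	if pg == nil {
		return nil, fmt.Errorf("nil PodGang")
	}
	return &PodGroup{
		Name:      pg.Name,
		MinMember: pg.MinPods,
	}, nil
}
```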
> #### Layer 4: Scheduler Layer
> Kubernetes schedulers that actually place pods:
> - **KAI Scheduler**: Gang scheduling with topology awareness
We should just mention that the backend schedulers in the scheduling layer are responsible for providing supporting features like gang scheduling, topology-aware packing, gang preemption, etc. What you mention here covers only some of the features for KAI and none for the kube-scheduler.
> For detailed lifecycle flow, see [PodGang Lifecycle Changes](#podgang-lifecycle-changes).
>
> ### Backend Interface Definition
Rename this to Scheduler Backend Interface
> PreparePod(pod *corev1.Pod)
>
> // ValidatePodCliqueSet validates a PodCliqueSet for this scheduler backend.
> // Called by the PodCliqueSet validation webhook (create and update). Backends can perform
// ValidatePodCliqueSet provides an ability to the scheduler backends to run additional
// validations on the PodCliqueSet resource. For example - if a scheduler does not yet support
// topology aware placements and if the PodCliqueSet has defined required topology pack constraints
// then it can choose to reject the PodCliqueSet by returning an error.
> }
> ```
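As a concrete illustration of the semantics suggested in the comment above, here is a hedged sketch of a backend-side `ValidatePodCliqueSet`. The `PodCliqueSet` placeholder and its field are assumptions, not the real Grove API:

```go
// Sketch only: placeholder type and field, not Grove's actual PodCliqueSet API.
package schedulerbackend

import "fmt"

// PodCliqueSet is a placeholder for Grove's PodCliqueSet API type.
type PodCliqueSet struct {
	// HasRequiredTopologyPackConstraints is an assumed stand-in for whatever
	// field expresses required topology pack constraints in the real API.
	HasRequiredTopologyPackConstraints bool
}

// ValidatePodCliqueSet rejects a PodCliqueSet that asks for features this
// backend does not support yet, mirroring the example in the comment above.
func ValidatePodCliqueSet(pcs *PodCliqueSet) error {
	if pcs.HasRequiredTopologyPackConstraints {
		return fmt.Errorf("scheduler backend does not support required topology pack constraints yet")
	}
	return nil
}
```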
> **Future note:** Cluster topology (e.g. multi-cluster topology support) may require its own extension point or additional methods on this interface; the interface is expected to evolve as those needs are clarified.
This point lacks context and is therefore quite unclear.
If it is not a goal of this GREP then it should be added as a non-goal
> ### Backend Manager
>
> The manager initializes scheduler backends: the kube-scheduler backend is always created and active; additional backends are created from OperatorConfiguration profiles. It provides access by name and a default:
This is not entirely correct. It will initialize the enabled scheduler backends. This component does not assume a default as that is the job of the OperatorConfiguration. The defaulting happens there and not here.
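Along the lines of that clarification, a minimal sketch of a manager that initializes only the enabled backends and leaves defaulting to the OperatorConfiguration. Names and signatures are assumptions, not the implementation in the GREP:

```go
// Sketch only: illustrative assumptions, not the GREP's Backend Manager.
package schedulerbackend

import "fmt"

// SchedulerBackend is the (assumed) backend interface; see the earlier sketch.
type SchedulerBackend interface {
	Name() string
}

// Manager holds the scheduler backends enabled via the OperatorConfiguration.
// It does not choose a default; defaulting is done by the OperatorConfiguration.
type Manager struct {
	backends map[string]SchedulerBackend
}

// NewManager initializes only the backends that were enabled.
func NewManager(enabled []SchedulerBackend) *Manager {
	m := &Manager{backends: make(map[string]SchedulerBackend)}
	for _, b := range enabled {
		m.backends[b.Name()] = b
	}
	return m
}

// Get returns the backend registered under the given name.
func (m *Manager) Get(name string) (SchedulerBackend, error) {
	b, ok := m.backends[name]
	if !ok {
		return nil, fmt.Errorf("scheduler backend %q is not enabled", name)
	}
	return b, nil
}
```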
> 4. **Update PodGang** with PodReferences once all pods are created, and set `Initialized=True`.
> 5. **Scheduling gates removed** to allow pods to be scheduled. The scheduler uses the backend-created CRs (PodGroup/Workload) when placing pods.
>
> #### New PodGang Status Condition
Create a sub-heading under Revised PodGang Creation Flow just after it:

**Revised PodGang Creation Flow**

To understand the new PodGang creation flow, we first introduce the enhancements made to the PodGangStatus.

**PodGang API enhancements**

A new metav1.Condition has been introduced for PodGang.

```go
const (
	// PodGangConditionTypeInitialized indicates that the PodGang has been populated
	// with pod references and pods can lift scheduling gates.
	PodGangConditionTypeInitialized PodGangConditionType = "Initialized"
)
```

A PodGang is considered as Initialized when:
- All constituent `Pod`s are created.
- Pods back-reference their `PodGang` via a `grove.io/podgang` label.
- `PodGang.Spec.PodGroups` have `PodReferences` fully populated.

NOTE: Field `PodReferences` in `PodGang.Spec.PodGroups` is subject to change. If it does then this GREP will need to be updated accordingly.

**Creation Flow**

< here you define the creation flow >
> | Status | Reason | Description |
> | ------ | ------ | ----------- |
> | `True` | `AllPodsCreated` | All pods have been created and references populated |
> | `False` | `PodsNotCreated` | Waiting for all pods to be created and for all pod references to be filled in the PodGang |
Need to revisit this reason, since the description is overloaded with two different reasons.
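For illustration, a sketch of how a controller could set this condition using the standard apimachinery helper. The status struct is a placeholder, and the reasons simply mirror the quoted table (they would change if the reason is revisited as suggested):

```go
// Sketch only: placeholder status struct; reasons mirror the quoted table.
package podgang

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PodGangStatus is a placeholder carrying a standard conditions slice.
type PodGangStatus struct {
	Conditions []metav1.Condition
}

// setInitializedCondition records whether all pods are created and referenced.
func setInitializedCondition(status *PodGangStatus, allPodsCreated bool) {
	cond := metav1.Condition{
		Type:    "Initialized",
		Status:  metav1.ConditionFalse,
		Reason:  "PodsNotCreated",
		Message: "waiting for all pods to be created and referenced in the PodGang",
	}
	if allPodsCreated {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "AllPodsCreated"
		cond.Message = "all pods have been created and references populated"
	}
	meta.SetStatusCondition(&status.Conditions, cond)
}
```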
> Unit tests will be implemented for all framework related components:
>
> **Backend Interface and Registry** (`operator/internal/schedulerBackend/`)
This list of tests will go stale in no time. Do you have a better suggestion?
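As a flavor of the intended coverage (rather than a case-by-case list that goes stale), here is a small unit-test sketch against the illustrative Manager from the earlier sketch, not the actual registry under `operator/internal/schedulerBackend/`:

```go
// Sketch only: exercises the illustrative Manager above, not the real registry.
package schedulerbackend

import "testing"

// fakeBackend is a minimal stub satisfying the assumed SchedulerBackend interface.
type fakeBackend struct{ name string }

func (f fakeBackend) Name() string { return f.name }

func TestManagerGet(t *testing.T) {
	m := NewManager([]SchedulerBackend{fakeBackend{name: "kai"}})

	if _, err := m.Get("kai"); err != nil {
		t.Fatalf("expected kai backend to be enabled: %v", err)
	}
	if _, err := m.Get("not-enabled"); err == nil {
		t.Fatalf("expected an error for a backend that was not enabled")
	}
}
```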
> #### E2E Tests
>
> All existing e2e tests should pass against all supported schedulers.
What you miss here are the changes needed to the e2e tests, which today always assume a specific scheduler backend (KAI). Currently there is no way to configure that.
> #### Alpha
> - Core backend interface defined and implemented
> - Backend registry functional
> - Basic operator configuration support
Should you add the KAI implementation to Alpha?
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Add scheduler backend framework to support multiple scheduler backends
Which issue(s) this PR fixes:
Fixes #275
Fixes #375
Special notes for your reviewer:
Does this PR introduce an API change?
Additional documentation e.g., enhancement proposals, usage docs, etc.: