Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions pkg/scheduler/cache/event_handlers.go
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,15 @@ func (sc *SchedulerCache) SyncNode(nodeName string) error {
}

if !sc.nodeCanAddCache(node) {
// If node exists in cache but no longer matches node selector, remove it
if _, exists := sc.Nodes[nodeName]; exists {
if deleteErr := sc.RemoveNode(nodeName); deleteErr != nil {
klog.Errorf("Failed to remove node <%s> from cache that no longer matches node selector: %s",
nodeName, deleteErr.Error())
return deleteErr
}
klog.V(3).Infof("Node <%s> no longer matches node selector, removed from cache.", nodeName)
}
Comment on lines +624 to +632
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential race condition here. The check at line 625 reads sc.Nodes[nodeName] without holding the lock, but RemoveNode at line 626 acquires the lock and modifies sc.Nodes. Since there can be multiple concurrent node workers (see cache.go:822), another goroutine could remove the node between the check and the RemoveNode call, causing RemoveNode to return an error (see event_handlers.go:540-541). The exists check should be moved inside a lock or removed entirely since RemoveNode already handles the case where the node doesn't exist (it returns an error that could be ignored if it's already been removed).

Suggested change
// If node exists in cache but no longer matches node selector, remove it
if _, exists := sc.Nodes[nodeName]; exists {
if deleteErr := sc.RemoveNode(nodeName); deleteErr != nil {
klog.Errorf("Failed to remove node <%s> from cache that no longer matches node selector: %s",
nodeName, deleteErr.Error())
return deleteErr
}
klog.V(3).Infof("Node <%s> no longer matches node selector, removed from cache.", nodeName)
}
// Node no longer matches node selector, ensure it is removed from cache
if deleteErr := sc.RemoveNode(nodeName); deleteErr != nil {
klog.Errorf("Failed to remove node <%s> from cache that no longer matches node selector: %s",
nodeName, deleteErr.Error())
return deleteErr
}
klog.V(3).Infof("Node <%s> no longer matches node selector, removed from cache.", nodeName)

Copilot uses AI. Check for mistakes.
Comment on lines +624 to +632
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new behavior added in this PR (removing a node from cache when it no longer matches the node selector) is not covered by any test case. The existing TestSchedulerCache_SyncNode test has cases for "Node added to cache" and "Node not added to cache", but no test case for "Node removed from cache when selector no longer matches". This scenario should be tested with a node that is first added to the cache (when it matches the selector) and then synced again with a different node state (without the matching label).

Copilot uses AI. Check for mistakes.
Comment on lines +624 to +632
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The added code checks if a node exists in the cache but no longer matches the node selector, and if so, removes it. However, there's a potential race condition here. If sc.RemoveNode takes a significant amount of time, another goroutine might add the node back to the cache before the removal is complete. This could lead to the node being briefly removed and then re-added, potentially causing scheduling inconsistencies.

To mitigate this, consider using a mutex or a channel to synchronize access to the sc.Nodes map during the removal process. This will ensure that only one goroutine can modify the cache at a time, preventing race conditions.

return nil
}
nodeCopy := node.DeepCopy()
Expand Down
Loading