Which component are you using?:
We're using Kueue with DWS (Dynamic Workload Scheduler) on GKE (classic) to provision GPU nodes.
What k8s version are you using (kubectl version)?:
kubectl version Output:

```console
$ kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.34.4-gke.1193000
WARNING: version difference between client (1.32) and server (1.34) exceeds the supported minor version skew of +/-1
```
What environment is this in?:
GKE
What did you expect to happen?:
Have a look at the following ProvisioningRequest that was created by Kueue. It has a `locationConstraint` set to `europe-west4-a` (a deliberate choice). I'd expect the autoscaler to respect this constraint and provision the node in that zone.
What happened instead?:
As you can see in the status, the `SelectedZone` was `europe-west4-b`. The problem is that the zone is also pinned in Kueue's ResourceFlavor, which results in a `nodeSelector` (`topology.kubernetes.io/zone: europe-west4-a`) being added to the pod. Since the node is provisioned in `europe-west4-b` but the pod requires `europe-west4-a`, the pod never gets scheduled and the capacity booking for the ProvisioningRequest expires.
Provisioning Request manifest:
```yaml
apiVersion: autoscaling.x-k8s.io/v1
kind: ProvisioningRequest
metadata:
  creationTimestamp: "2026-04-17T13:55:48Z"
  generation: 1
  labels:
    kueue.x-k8s.io/managed: "true"
  name: pod-fjjfwxiq-1f00b-europe-west4-a-1
  namespace: production
  ownerReferences:
  - apiVersion: kueue.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Workload
    name: pod-fjjfwxiq-1f00b
    uid: 3b684cd6-ccc2-45ea-9b35-905ab16cb326
  resourceVersion: "1776437424362047021"
  uid: 3d49e75f-7c8f-47c5-b33d-d7333c2ea464
spec:
  parameters:
    locationConstraint: europe-west4-a
  podSets:
  - count: 1
    podTemplateRef:
      name: ppt-pod-fjjfwxiq-1f00b-europe-west4-a-1-main
  provisioningClassName: queued-provisioning.gke.io
status:
  conditions:
  - lastTransitionTime: "2026-04-17T13:56:07Z"
    message: Provisioning Request was successfully queued.
    observedGeneration: 1
    reason: SuccessfullyQueued
    status: "True"
    type: Accepted
  - lastTransitionTime: "2026-04-17T14:40:24Z"
    message: Provisioning Request was successfully provisioned.
    observedGeneration: 1
    reason: Provisioned
    status: "True"
    type: Provisioned
  - lastTransitionTime: "2026-04-17T14:50:24Z"
    message: Capacity booking for the Provisioning Request has expired and the nodes
      are now candidates for scale down when underutilized.
    observedGeneration: 1
    reason: BookingExpired
    status: "True"
    type: BookingExpired
  provisioningClassDetails:
    AcceleratorType: nvidia-tesla-a100
    NodeGroupName: gke-saas-gke-cluster-nap-a2-highgpu-1-0a2fd8a7-grp
    NodePoolAutoProvisioned: "true"
    NodePoolName: nap-a2-highgpu-1g-gpu1-rlp1cspi
    PodTemplateName: ppt-pod-fjjfwxiq-1f00b-europe-west4-a-1-main
    ProvisioningMode: resize_request
    ResizeRequestName: gke-production-pod-fjjfwxiq-1f00b-e-a3cc38d07087bd9a
    SelectedZone: europe-west4-b
```
Node selectors on Pod:
```yaml
nodeSelector:
  autoscaling.gke.io/provisioning-request: gke-production-pod-fjjfwxiq-1f00b-e-a3cc38d07087bd9a
  cloud.google.com/gke-accelerator: nvidia-tesla-a100
  topology.kubernetes.io/zone: europe-west4-a
```
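To make the mismatch concrete: based on the `SelectedZone` in the status above, the labels on the node that DWS actually provisioned would look roughly like this (a reconstruction, not taken from the cluster; the node name is hypothetical):

```yaml
# Reconstructed labels on the actually provisioned node (node name hypothetical)
apiVersion: v1
kind: Node
metadata:
  name: gke-saas-gke-cluster-nap-a2-highgpu-1g-xxxx
  labels:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100
    topology.kubernetes.io/zone: europe-west4-b  # pod's nodeSelector requires europe-west4-a
```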
Kueue resource flavor:
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  annotations:
    argocd.argoproj.io/tracking-id: saas-gke-cluster-blue-kueue:kueue.x-k8s.io/ResourceFlavor:kueue-system/nvidia-tesla-a100-flavor
  creationTimestamp: "2024-10-11T10:04:08Z"
  finalizers:
  - kueue.x-k8s.io/resource-in-use
  generation: 2
  name: nvidia-tesla-a100-flavor
  resourceVersion: "1775716960047423022"
  uid: 3ac1dab6-66a3-4f49-96d1-9761ce87b71d
spec:
  nodeLabels:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100
    topology.kubernetes.io/zone: europe-west4-a
```
Kueue ProvisioningRequestConfig:
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: europe-west4-a
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
  parameters:
    locationConstraint: "europe-west4-a"
```
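For completeness, this config is referenced from a Kueue AdmissionCheck along these lines (a sketch of the standard DWS wiring; the check name here is illustrative, not our actual manifest):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-europe-west4-a  # illustrative name
spec:
  # Kueue's built-in ProvisioningRequest controller
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: europe-west4-a  # the config shown above
```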
How to reproduce it (as minimally and precisely as possible):
It's not easy to reproduce, as most of the time we actually do get a node in the requested zone. To reproduce it more reliably, it might be better to set the constraint to `europe-west4-b` and then wait for a node to be provisioned in `-a` instead; a sketch of that flipped configuration is below.
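A minimal sketch of the flipped setup, assuming the same cluster as above (resource names are illustrative); the bug should reproduce whenever DWS selects a zone other than the one in `locationConstraint`:

```yaml
# Flipped ProvisioningRequestConfig: constrain to europe-west4-b and wait for
# DWS to select a different zone (e.g. europe-west4-a).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: europe-west4-b
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
  parameters:
    locationConstraint: "europe-west4-b"
---
# Matching ResourceFlavor pinned to the same zone, so the pod's nodeSelector
# will conflict whenever the selected zone differs from the constraint.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-tesla-a100-flavor-b  # illustrative name
spec:
  nodeLabels:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100
    topology.kubernetes.io/zone: europe-west4-b
```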
Anything else we need to know?: