
Bug: kubeslice-dns and kubeslice-operator still pending while installing kubeslice in worker clusters on GKE Standard tier due to node affinity error #103

@sanjay7178

📜 Description

I set up KubeSlice via manual YAML instead of a topology file and registered the controller and worker clusters on my GKE clusters. Before creating a slice, I noticed that the kubeslice-dns and kubeslice-operator pods keep showing status Pending in the kubeslice-system namespace on both worker clusters.

👟 Reproduction steps

  • Create 3 GKE Standard tier clusters in the same or different regions
  • Prepare a topology file describing the controller and worker clusters, with their kubeconfig
  • Finally, run the installation using kubeslice-cli
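For reference, the topology file in step 2 can be sketched roughly like this. The field names follow the kubeslice-cli documentation as I remember it, and the cluster/context names are placeholders, so treat this as illustrative only:

```yaml
configuration:
  cluster_configuration:
    kube_config_path: /path/to/kubeconfig
    controller:
      name: controller
      context_name: gke_project_region_ks-controller
    workers:
      - name: worker-1
        context_name: gke_project_region_ks-worker-1
      - name: worker-2
        context_name: gke_project_region_ks-worker-2
```

The installation is then run against this file, e.g. `kubeslice-cli install --config topology.yaml`.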

πŸ‘ Expected behavior

KubeSlice should install successfully on the GKE clusters, with all pods in the kubeslice-system namespace reaching Running.

👎 Actual Behavior

The installation does not complete: the kubeslice-dns, kubeslice-operator, and some nsmgr/forwarder-kernel pods stay Pending on the worker clusters.

🐚 Relevant log output

These are the logs from one of the worker clusters during the installation of KubeSlice via helm upgrade:

(base) ➜  yaml-cofing kubectl get pods --all-namespaces                                      
NAMESPACE                      NAME                                                    READY   STATUS      RESTARTS   AGE
gke-managed-cim                kube-state-metrics-0                                    2/2     Running     0          42m
gmp-system                     collector-7c77w                                         2/2     Running     0          41m
gmp-system                     collector-jlmlm                                         2/2     Running     0          41m
gmp-system                     collector-lhg2q                                         2/2     Running     0          41m
gmp-system                     gmp-operator-5899b68d4b-kf9lc                           1/1     Running     0          41m
kube-system                    event-exporter-gke-6d7c4dcf79-77zkz                     2/2     Running     0          42m
kube-system                    fluentbit-gke-9hbcd                                     3/3     Running     0          41m
kube-system                    fluentbit-gke-snnm2                                     3/3     Running     0          41m
kube-system                    fluentbit-gke-w6gtm                                     3/3     Running     0          41m
kube-system                    gke-metrics-agent-sngx6                                 2/2     Running     0          41m
kube-system                    gke-metrics-agent-sxplv                                 2/2     Running     0          41m
kube-system                    gke-metrics-agent-ts9s4                                 2/2     Running     0          41m
kube-system                    konnectivity-agent-8596fd4d6f-gbk89                     2/2     Running     0          40m
kube-system                    konnectivity-agent-8596fd4d6f-l4kbz                     2/2     Running     0          41m
kube-system                    konnectivity-agent-8596fd4d6f-thjck                     2/2     Running     0          40m
kube-system                    konnectivity-agent-autoscaler-57cb65694f-6v2t8          1/1     Running     0          41m
kube-system                    kube-dns-67c79cd964-jh57p                               4/4     Running     0          40m
kube-system                    kube-dns-67c79cd964-pplkw                               4/4     Running     0          42m
kube-system                    kube-dns-autoscaler-69778b8cfb-rwlft                    1/1     Running     0          41m
kube-system                    kube-proxy-gke-ks-worker-1-default-pool-24d47686-zdxl   1/1     Running     0          41m
kube-system                    kube-proxy-gke-ks-worker-1-default-pool-7c236418-7bx6   1/1     Running     0          40m
kube-system                    kube-proxy-gke-ks-worker-1-default-pool-c269c09f-c4d0   1/1     Running     0          40m
kube-system                    l7-default-backend-dcfd7d6bb-xjcll                      1/1     Running     0          41m
kube-system                    metrics-server-v1.33.0-5d6c8599c6-sb66g                 1/1     Running     0          41m
kube-system                    pdcsi-node-fhsfn                                        2/2     Running     0          41m
kube-system                    pdcsi-node-qwltv                                        2/2     Running     0          41m
kube-system                    pdcsi-node-stq57                                        2/2     Running     0          41m
kubeslice-nsm-webhook-system   nsm-admission-webhook-k8s-58b6d9bf6b-6265q              1/1     Running     0          25m
kubeslice-nsm-webhook-system   nsm-admission-webhook-k8s-58b6d9bf6b-vhfkg              1/1     Running     0          25m
kubeslice-system               forwarder-kernel-759g8                                  0/1     Pending     0          25m
kubeslice-system               forwarder-kernel-frvck                                  1/1     Running     0          25m
kubeslice-system               forwarder-kernel-lfg9f                                  1/1     Running     0          25m
kubeslice-system               kubeslice-dns-79b7d7fbf4-25h6z                          0/1     Pending     0          25m
kubeslice-system               kubeslice-install-crds-4t47r                            0/1     Completed   0          26m
kubeslice-system               kubeslice-operator-5d9c956cd6-8ww2b                     0/2     Pending     0          25m
kubeslice-system               nsm-install-crds-gfhfh                                  0/1     Completed   0          25m
kubeslice-system               nsmgr-4cb9x                                             2/2     Running     0          25m
kubeslice-system               nsmgr-4j52t                                             2/2     Running     0          25m
kubeslice-system               nsmgr-f9x5b                                             0/2     Pending     0          25m
kubeslice-system               registry-k8s-979455d6d-8rkwc                            1/1     Running     0          25m
kubeslice-system               spire-install-clusterid-cr-9vgmn                        0/1     Completed   0          25m
kubeslice-system               spire-install-crds-2kjkd                                0/1     Completed   0          25m
spire                          spiffe-csi-driver-bhxgt                                 2/2     Running     0          25m
spire                          spiffe-csi-driver-jz2jm                                 2/2     Running     0          25m
spire                          spiffe-csi-driver-pz6gm                                 2/2     Running     0          25m
spire                          spire-agent-66mtz                                       1/1     Running     0          25m
spire                          spire-agent-9txkb                                       1/1     Running     0          25m
spire                          spire-agent-rsn9g                                       1/1     Running     0          25m
spire                          spire-server-0                                          2/2     Running     0          25m
(base) ➜  yaml-cofing kubectl describe pods/kubeslice-dns-79b7d7fbf4-25h6z  -n  kubeslice-system 
Name:             kubeslice-dns-79b7d7fbf4-25h6z
Namespace:        kubeslice-system
Priority:         0
Service Account:  kubeslice-dns
Node:             <none>
Labels:           app=kubeslice-dns
                  kubeslice.io/pod-type=dns
                  pod-template-hash=79b7d7fbf4
Annotations:      cloud.google.com/cluster_autoscaler_unhelpable_since: 2025-08-07T13:16:08+0000
                  cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/kubeslice-dns-79b7d7fbf4
Containers:
  dns:
    Image:       docker.io/aveshasystems/dns:0.1.4
    Ports:       1053/UDP, 1053/TCP
    Host Ports:  0/UDP, 0/TCP
    Limits:
      cpu:     50m
      memory:  128Mi
    Requests:
      cpu:        10m
      memory:     64Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k54t8 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kube-api-access-k54t8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kubeslice.io/node-type=gateway:NoSchedule
                             kubeslice.io/node-type=gateway:NoExecute
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   30m                 default-scheduler   0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Warning  FailedScheduling   20m (x2 over 25m)   default-scheduler   0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   NotTriggerScaleUp  11s (x93 over 30m)  cluster-autoscaler  Pod didn't trigger scale-up:
(base) ➜  yaml-cofing kubectl describe pods/kubeslice-operator-5d9c956cd6-8ww2b   -n  kubeslice-system
Name:             kubeslice-operator-5d9c956cd6-8ww2b
Namespace:        kubeslice-system
Priority:         0
Service Account:  kubeslice-controller-manager
Node:             <none>
Labels:           control-plane=controller-manager
                  pod-template-hash=5d9c956cd6
                  spoke-cluster=gke_graphic-transit-458312-f7_us-east4_ks-worker-1
Annotations:      cloud.google.com/cluster_autoscaler_unhelpable_since: 2025-08-07T13:16:20+0000
                  cloud.google.com/cluster_autoscaler_unhelpable_until: Inf
                  kubectl.kubernetes.io/default-container: manager
                  prometheus.io/port: 8080
                  prometheus.io/scrape: true
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/kubeslice-operator-5d9c956cd6
Containers:
  kube-rbac-proxy:
    Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7xn8h (ro)
  manager:
    Image:      docker.io/aveshasystems/worker-operator:1.4.0
    Port:       <none>
    Host Port:  <none>
    Command:
      /manager
    Args:
      --health-probe-bind-address=:8081
      --metrics-bind-address=:8080
      --leader-elect
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:      10m
      memory:   64Mi
    Liveness:   http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:  http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      LOG_LEVEL:                            INFO
      HUB_HOST_ENDPOINT:                    <set to the key 'endpoint' in secret 'kubeslice-hub'>   Optional: false
      HUB_PROJECT_NAMESPACE:                <set to the key 'namespace' in secret 'kubeslice-hub'>  Optional: false
      CLUSTER_NAME:                         gke_graphic-transit-458312-f7_us-east4_ks-worker-1
      AVESHA_VL3_ROUTER_IMAGE:              docker.io/aveshasystems/cmd-nse-vl3:1.0.6
      AVESHA_VL3_ROUTER_PULLPOLICY:         IfNotPresent
      AVESHA_VL3_SIDECAR_IMAGE:             docker.io/aveshasystems/kubeslice-router-sidecar:1.4.6
      AVESHA_VL3_SIDECAR_IMAGE_PULLPOLICY:  IfNotPresent
      CLUSTER_ENDPOINT:                     https://34.86.91.118
      AVESHA_GW_SIDECAR_IMAGE:              docker.io/aveshasystems/gw-sidecar:1.0.3
      AVESHA_GW_SIDECAR_IMAGE_PULLPOLICY:   IfNotPresent
      AVESHA_OPENVPN_SERVER_IMAGE:          docker.io/aveshasystems/openvpn-server.alpine:1.0.4
      AVESHA_OPENVPN_SERVER_PULLPOLICY:     IfNotPresent
      AVESHA_OPENVPN_CLIENT_IMAGE:          docker.io/aveshasystems/openvpn-client.alpine:1.0.4
      AVESHA_OPENVPN_CLIENT_PULLPOLICY:     IfNotPresent
      AVESHA_SLICE_GW_EDGE_IMAGE:           aveshasystems/slicegw-edge:1.0.6
      WORKER_INSTALLER_IMAGE:               docker.io/aveshasystems/worker-installer:1.5.0
    Mounts:
      /etc/webhook/certs from webhook-certs (ro)
      /var/run/secrets/kubernetes.io/hub-serviceaccount from hub-secret (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7xn8h (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kubeslice-worker-event-schema-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kubeslice-worker-event-schema-conf
    Optional:  false
  webhook-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubeslice-admission-webhook-certs
    Optional:    false
  hub-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubeslice-hub
    Optional:    false
  kube-api-access-7xn8h:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   31m                 default-scheduler   0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Warning  FailedScheduling   20m (x2 over 25m)   default-scheduler   0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   NotTriggerScaleUp  22s (x91 over 30m)  cluster-autoscaler  Pod didn't trigger scale-up:
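The FailedScheduling events on both pods say the same thing: none of the three nodes carries the label that the pods' node affinity selects on (judging by the tolerations above and the workaround below, this appears to be `kubeslice.io/node-type=gateway`). The required-match check the scheduler performs can be sketched in plain Python; the node names and labels here are hypothetical:

```python
def matches_required_labels(node_labels: dict, required: dict) -> bool:
    """True only if the node carries every required label with the exact value."""
    return all(node_labels.get(key) == value for key, value in required.items())

# Label the KubeSlice pods appear to require (inferred from the workaround below).
required = {"kubeslice.io/node-type": "gateway"}

# A fresh GKE node carries no such label, so the pod stays Pending ...
fresh_node = {"kubernetes.io/hostname": "gke-ks-worker-1-default-pool-x"}
print(matches_required_labels(fresh_node, required))    # False

# ... and once the node is labeled gateway, the same check passes.
labeled_node = {**fresh_node, "kubeslice.io/node-type": "gateway"}
print(matches_required_labels(labeled_node, required))  # True
```

This matches the event text exactly: "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector", and preemption cannot help because evicting pods does not change node labels.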

Version

None

🖥️ What operating system are you seeing the problem on?

No response

✅ Proposed Solution

As a temporary fix, I labeled all nodes as gateway nodes on every worker cluster: `kubectl label nodes --all kubeslice.io/node-type=gateway --overwrite`

👀 Have you spent some time to check if this issue has been raised before?

  • I checked and didn't find any similar issue

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Development: No branches or pull requests