MetalLB Version
operator: v0.13.11
metallb: v0.13.10
OS: Talos 1.3.7
Kubernetes: 1.24.9
CNI: Cilium 1.12.4
After upgrading from operator v0.13.4/metallb v0.13.5 to operator v0.13.11/metallb v0.13.10, the pods of daemonset.apps/speaker went down and kept restarting every few minutes, ending up in CrashLoopBackOff.
[eric@macross ~]$ kubectl get all
NAME                                                      READY   STATUS             RESTARTS         AGE
pod/controller-db6f6ff7d-zjfcr                            1/1     Running            0                70s
pod/metallb-operator-controller-manager-6fd4d656f-tx2hj   1/1     Running            0                15m
pod/metallb-operator-webhook-server-588bbdf874-g2jsd      1/1     Running            0                2m53s
pod/speaker-2tvk6                                         0/1     CrashLoopBackOff   33 (3m3s ago)    3h36m
pod/speaker-5v2sp                                         0/1     CrashLoopBackOff   33 (2m18s ago)   3h36m
pod/speaker-p7spx                                         0/1     CrashLoopBackOff   33 (3m59s ago)   20h
pod/speaker-wrs8n                                         0/1     CrashLoopBackOff   33 (3m59s ago)   3h37m
pod/speaker-xfj7v                                         0/1     CrashLoopBackOff   33 (3m32s ago)   3h36m
Looking at the logs of one of the pods, errors on listing and watching configmaps appear, and then the speaker pod goes down.
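The logs below were taken from one of the crashing speakers, e.g. (pod name from the listing above):
kubectl -n metallb-system logs pod/speaker-2tvk6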
W0825 11:41:31.682290 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:31.682339 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:41:33.520445 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:33.520473 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:41:39.101431 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:39.101463 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:41:46.581417 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:46.581469 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:42:03.218915 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:42:03.219009 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
[...]
W0825 11:42:37.744778 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:42:37.744806 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"Could not wait for Cache to sync","controller":"node","controllerGroup":"","controllerKind":"Node","error":"failed to wait for node caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:211\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"Could not wait for Cache to sync","controller":"service","controllerGroup":"","controllerKind":"Service","error":"failed to wait for service caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:211\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"Could not wait for Cache to sync","controller":"bgppeer","controllerGroup":"metallb.io","controllerKind":"BGPPeer","error":"failed to wait for bgppeer caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:211\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"error received after stop sequence was engaged","error":"failed to wait for service caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/internal.go:555"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"error received after stop sequence was engaged","error":"failed to wait for bgppeer caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/internal.go:555"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for caches"}
{"level":"error","ts":"2023-08-25T11:43:30Z","logger":"controller-runtime.source","msg":"failed to get informer from cache","error":"Timeout: failed waiting for *v1.ConfigMap Informer to sync","stacktrace":"sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:148\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.0/pkg/util/wait/wait.go:235\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.0/pkg/util/wait/wait.go:582\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.0/pkg/util/wait/wait.go:547\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:136"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"caller":"main.go:201","error":"failed to wait for node caches to sync: timed out waiting for cache to be synced","level":"error","msg":"failed to run k8s client","op":"startup","ts":"2023-08-25T11:43:30Z"}
Initial installation and upgrade were both done using the manifest.
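Concretely, the upgrade step was along the lines of the following, with the file name as in the diff at the end of this report:
kubectl apply -f metallb-operator-0.13.10.yaml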
As a workaround, we added to the clusterrole metallb-system:speaker the authorization to get/list/watch the configmaps resource.
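One way to apply that change is a JSON patch that appends configmaps to the first rule, assuming the core-API rule is the first entry in rules, as in the YAML below:
kubectl patch clusterrole metallb-system:speaker --type=json \
  -p='[{"op":"add","path":"/rules/0/resources/-","value":"configmaps"}]'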
[eric@macross ~]$ kubectl get clusterrole metallb-system:speaker -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"app":"metallb"},"name":"metallb-system:speaker"},"rules":[{"apiGroups":[""],"resources":["services","endpoints","nodes","namespaces"],"verbs":["get","list","watch"]},{"apiGroups":["discovery.k8s.io"],"resources":["endpointslices"],"verbs":["get","list","watch"]},{"apiGroups":[""],"resources":["events"],"verbs":["create","patch"]},{"apiGroups":["policy"],"resourceNames":["speaker"],"resources":["podsecuritypolicies"],"verbs":["use"]}]}
  creationTimestamp: "2022-09-13T07:16:45Z"
  labels:
    app: metallb
  name: metallb-system:speaker
  resourceVersion: "132426474"
  uid: 12d48a2c-8274-49f7-8e51-aed128a7b112
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - nodes
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
- apiGroups:
  - policy
  resourceNames:
  - speaker
  resources:
  - podsecuritypolicies
  verbs:
  - use
After this modification and a full restart of the speaker pods, everything is now working perfectly.
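The restart itself can be done with a rollout of the DaemonSet, for example:
kubectl -n metallb-system rollout restart daemonset/speaker
kubectl -n metallb-system rollout status daemonset/speaker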
[eric@macross ~]$ kubectl get po -o wide -w
NAME                                                  READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
controller-db6f6ff7d-zjfcr                            1/1     Running   0          24m   10.19.3.207    kw905-vso-pr   <none>           <none>
metallb-operator-controller-manager-6fd4d656f-tx2hj   1/1     Running   0          39m   10.19.3.131    kw905-vso-pr   <none>           <none>
metallb-operator-webhook-server-588bbdf874-g2jsd      1/1     Running   0          26m   10.19.3.208    kw905-vso-pr   <none>           <none>
speaker-5vqsf                                         1/1     Running   0          15m   10.4.205.104   kw902-vso-pr   <none>           <none>
speaker-8jjhv                                         1/1     Running   0          14m   10.4.205.103   kw901-vso-pr   <none>           <none>
speaker-jlz9b                                         1/1     Running   0          15m   10.4.205.107   kw905-vso-pr   <none>           <none>
speaker-jtcxx                                         1/1     Running   0          15m   10.4.205.106   kw904-vso-pr   <none>           <none>
speaker-nlwxq                                         1/1     Running   0          15m   10.4.205.105   kw903-vso-pr   <none>           <none>
[eric@macross ~]$ kubectl logs speaker-jtcxx
[...]
{"level":"info","ts":"2023-08-25T11:47:09Z","msg":"Starting workers","controller":"service","controllerGroup":"","controllerKind":"Service","worker count":1}
{"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2023-08-25T11:47:09Z"}
{"level":"info","ts":"2023-08-25T11:47:09Z","msg":"Starting workers","controller":"node","controllerGroup":"","controllerKind":"Node","worker count":1}
{"level":"info","ts":"2023-08-25T11:47:09Z","msg":"Starting workers","controller":"bgppeer","controllerGroup":"metallb.io","controllerKind":"BGPPeer","worker count":1}
{"caller":"node_controller.go:46","controller":"NodeReconciler","level":"info","start reconcile":"/km901-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"config_controller.go:59","controller":"ConfigReconciler","level":"info","start reconcile":"/kw905-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"node_controller.go:69","controller":"NodeReconciler","end reconcile":"/km901-vso-pr","level":"info","ts":"2023-08-25T11:47:09Z"}
[...]
{"caller":"config_controller.go:59","controller":"ConfigReconciler","level":"info","start reconcile":"/km902-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.4.205.105","node event":"NodeJoin","node name":"kw903-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"main.go:374","event":"serviceAnnounced","ips":["10.4.207.211"],"level":"info","msg":"service has IP, announcing","pool":"vip-pool","protocol":"layer2","ts":"2023-08-25T11:47:09Z"}
{"caller":"service_controller_reload.go:104","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2023-08-25T11:47:09Z"}
[...]
{"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.4.205.103","node event":"NodeJoin","node name":"kw901-vso-pr","ts":"2023-08-25T11:47:40Z"}
{"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2023-08-25T11:47:40Z"}
{"caller":"main.go:418","event":"serviceWithdrawn","ip":["10.4.207.209"],"ips":["10.4.207.209"],"level":"info","msg":"withdrawing service announcement","pool":"vip-pool","protocol":"layer2","reason":"notOwner","ts":"2023-08-25T11:47:40Z"}
{"caller":"main.go:374","event":"serviceAnnounced","ips":["10.4.207.211"],"level":"info","msg":"service has IP, announcing","pool":"vip-pool","protocol":"layer2","ts":"2023-08-25T11:47:40Z"}
{"caller":"service_controller_reload.go:104","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2023-08-25T11:47:40Z"}
[eric@macross ~]$ curl -Is http://argocd.tooling-nms-preprod.valentine.sfr.com/ | head -n 1
HTTP/1.1 200 OK
Below is the diff between the original manifest and the one we used for the upgrade.
[eric@macross metallb]$ diff metallb-operator.yaml metallb-operator-0.13.10.yaml
3587c3587
< value: quay.io/metallb/speaker:v0.13.9
---
> value: quay.io/metallb/speaker:v0.13.10
3589c3589
< value: quay.io/metallb/controller:v0.13.9
---
> value: quay.io/metallb/controller:v0.13.10
3664c3664
< image: quay.io/metallb/controller:v0.13.9
---
> image: quay.io/metallb/controller:v0.13.10
4212a4213
> - configmaps