Description
What happened?
With the latest Kubespray release (v2.30.0), the first control node is not uncordoned after the uncordoning steps when running the kubernetes_sigs.kubespray.upgrade_cluster playbook. I can confirm that with the same playbook on the previous release (v2.29.1), the uncordoning works.
We run the upgrade playbook with the following flags, which gave us good visibility into what was happening in the cluster:
ansible-playbook -i "./inventory/dev-1.yaml" ./playbooks/upgrade_cluster.yaml -e "upgrade_node_confirm=true" -e "upgrade_node_post_upgrade_confirm=true"
With these flags, confirmation is requested before a node is uncordoned and before the next node's upgrade starts. After confirming the uncordoning of control-node-01, I monitored the cluster but nothing happened: the cordon was never removed.
When I continued the upgrade, uncordoning DID work for the other nodes, so in our case the problem occurs only for the first control node.
Here's the playbook output for the uncordoning steps on the first node. The steps are skipped, even though the node was cordoned in the same play and is marked SchedulingDisabled:
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon] *****************************************************************************************************************************************************************************************************************************************************************
[kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon]
Ready to uncordon control-node-01?:
ok: [control-node-01]
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Wait before uncordoning node] **********************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Run post upgrade hooks before uncordon] ************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Uncordon node] *************************************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
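For anyone trying to reproduce this, here is a minimal sketch of how the node state can be checked outside the playbook, plus the manual workaround. It assumes kubectl is configured against the affected cluster. The helper name cordoned_nodes is hypothetical and not part of Kubespray; it only filters the standard `kubectl get nodes --no-headers` output for nodes whose STATUS column contains SchedulingDisabled.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kubespray): reads the output of
# `kubectl get nodes --no-headers` on stdin and prints the names of
# nodes whose STATUS column (field 2) contains SchedulingDisabled.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl get nodes --no-headers | cordoned_nodes
#
# Manual workaround once the upgrade of the node has been verified:
#   kubectl uncordon control-node-01
```

This only works around the symptom by hand; it does not explain why the uncordon tasks are skipped for the first control node.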
What did you expect to happen?
control-node-01 should be uncordoned after I approve the uncordoning at the playbook prompt.
How can we reproduce it (as minimally and precisely as possible)?
Upgrade a cluster using the latest release with the flags provided above (I'm not sure whether the flags are required to reproduce). When the first control node should be uncordoned, the uncordon tasks are skipped and the node stays cordoned.
OS
Ubuntu 24
Version of Ansible
ansible [core 2.17.7]
Version of Python
3.12.0
Version of Kubespray (commit)
v2.30.0
Network plugin used
custom_cni
Full inventory with variables
Too sensitive to share
Command used to invoke ansible
ansible-playbook -i "./inventory/dev-1.yaml" ./playbooks/upgrade_cluster.yaml -e "upgrade_node_confirm=true" -e "upgrade_node_post_upgrade_confirm=true"
Output of ansible run
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon] *****************************************************************************************************************************************************************************************************************************************************************
[kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon]
Ready to uncordon control-node-01?:
ok: [control-node-01]
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Wait before uncordoning node] **********************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Run post upgrade hooks before uncordon] ************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Uncordon node] *************************************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
Anything else we need to know
No response