First (control) node not uncordoned after uncordoning steps in upgrade cluster playbook #13030

@robinvalk

Description

What happened?

Using the latest Kubespray release (v2.30.0), the first control node is not uncordoned after the uncordoning steps when running the kubernetes_sigs.kubespray.upgrade_cluster playbook. I can confirm that uncordoning works with the same playbook on the previous release (v2.29.1).

We run the upgrade playbook with the following flags, which gives us good visibility into what is happening in the cluster:

ansible-playbook -i "./inventory/dev-1.yaml" ./playbooks/upgrade_cluster.yaml -e "upgrade_node_confirm=true" -e "upgrade_node_post_upgrade_confirm=true"

With these flags, confirmation is requested before each uncordon and before the next node's upgrade starts. After confirming the uncordoning of control-node-01, I monitored the cluster, but nothing happened. The cordon was never removed.

When continuing the upgrade with the other nodes, the uncordoning DOES work for them. So in our case the problem only occurs for the first control node.

Here is the playbook output for the first node's uncordoning steps. The steps are skipped, even though the node was cordoned in the same play and is marked SchedulingDisabled:

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon] *****************************************************************************************************************************************************************************************************************************************************************
[kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon]
Ready to uncordon control-node-01?:
ok: [control-node-01]

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Wait before uncordoning node] **********************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Run post upgrade hooks before uncordon] ************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Uncordon node] *************************************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]
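
For anyone wanting to confirm the same state outside the playbook, here is a minimal sketch that lists nodes still marked SchedulingDisabled. The helper name `cordoned_nodes` is hypothetical, and the sketch assumes the default `kubectl get nodes` column layout (NAME first, STATUS second). The manual `kubectl uncordon` in the comments is only a possible workaround, not a fix for the skipped tasks.

```shell
# Print the names of nodes whose STATUS column contains SchedulingDisabled,
# i.e. nodes that are still cordoned.
# Reads `kubectl get nodes --no-headers` output on stdin.
# Assumption: NAME is column 1 and STATUS is column 2 (default output format).
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | cordoned_nodes
#
# Possible manual workaround when the playbook skips the uncordon task:
#   kubectl uncordon control-node-01
```

If control-node-01 still shows up in that list after the "Uncordon node" task has run, the task was skipped rather than failing silently.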

What did you expect to happen?

control-node-01 should be uncordoned after I approve the uncordoning in the playbook.

How can we reproduce it (as minimally and precisely as possible)?

Upgrade a cluster using the latest release with the flags provided above (I am not sure whether the flags are required to reproduce). When the first control node should be uncordoned, it never is.

OS

Ubuntu 24

Version of Ansible

ansible [core 2.17.7]

Version of Python

3.12.0

Version of Kubespray (commit)

v2.30.0

Network plugin used

custom_cni

Full inventory with variables

Too sensitive to share

Command used to invoke ansible

ansible-playbook -i "./inventory/dev-1.yaml" ./playbooks/upgrade_cluster.yaml -e "upgrade_node_confirm=true" -e "upgrade_node_post_upgrade_confirm=true"

Output of ansible run

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon] *****************************************************************************************************************************************************************************************************************************************************************
[kubernetes_sigs.kubespray.upgrade/post-upgrade : Confirm node uncordon]
Ready to uncordon control-node-01?:
ok: [control-node-01]

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Wait before uncordoning node] **********************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Run post upgrade hooks before uncordon] ************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]

TASK [kubernetes_sigs.kubespray.upgrade/post-upgrade : Uncordon node] *************************************************************************************************************************************************************************************************************************************************************************
skipping: [control-node-01]

Anything else we need to know

No response

Metadata

Labels

Ubuntu 24, kind/bug
