Description
Observed Behavior:
We recently had a kubelet fail due to memory issues. The node (EKS) was correctly marked as `NotReady` by Kubernetes, and all of its pods transitioned to the `Terminating` state. Replacement pods were scheduled, but pods with RWO volumes attached could not start on the new nodes because the volumes were never released from the broken node.
My understanding is that because the underlying OS kept running, AWS did not detect an issue with the EC2 instance, so Karpenter never received a signal to delete the NodeClaim (which showed no indication that anything was wrong). The cluster remains stuck in this state indefinitely without manual intervention.
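For reference, unsticking the cluster currently requires manual steps along these lines (a rough sketch; the names in angle brackets are placeholders):

```shell
# Force-delete the pods stuck in Terminating on the broken node
kubectl delete pod <stuck-pod> -n <namespace> --force --grace-period=0

# Delete the NodeClaim so Karpenter terminates the instance and
# the CSI driver can detach the RWO volumes
kubectl delete nodeclaim <broken-nodeclaim>
```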
Expected Behavior:
Node-level health issues (e.g. `NotReady` status) should be propagated to the NodeClaim, so that broken nodes are terminated and their resources (volumes etc.) are released, even when there is no underlying hardware or OS failure.
Reproduction Steps:
- Create an EKS cluster managed by Karpenter
- Deploy workloads with RWO volumes bound to pods
- Simulate a kubelet failure on one of the nodes (e.g. `systemctl stop kubelet`)
- Observe that (see the sketch below):
  - The node transitions to `NotReady`
  - Pods enter the `Terminating` state but are never fully evicted
  - The NodeClaim is not deleted by Karpenter
  - RWO volumes remain bound to the broken node, blocking replacement pods
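The stuck state can be observed with standard commands like these (a sketch; output will vary, and `<broken-node>` is a placeholder):

```shell
kubectl get nodes                  # broken node shows NotReady
kubectl get pods -A --field-selector spec.nodeName=<broken-node>  # pods stuck in Terminating
kubectl get nodeclaims             # NodeClaim still present, reports no problem
kubectl get volumeattachments      # RWO volumes still attached to the broken node
```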
NodePool Configuration:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: Gt
          values: ["4"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
      startupTaints:
        - key: node.cilium.io/agent-not-ready
          value: "true"
          effect: NoExecute
  limits:
    cpu: 500
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 15m
    budgets:
      - nodes: "10%"
        reasons:
          - "Empty"
      - nodes: "10%"
        reasons:
          - "Drifted"
          - "Underutilized" #...
```
Note that we do not have `terminationGracePeriod` set; however, I think involuntary shutdowns should be handled separately from graceful termination. I am also aware of Karpenter's efforts regarding node repair, but there will always be failures that a node cannot recover from, so I believe Karpenter should be able to handle those as well.
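For completeness, here is roughly how `terminationGracePeriod` would be set on the NodePool if we wanted Karpenter to force-terminate nodes after a drain deadline (a minimal sketch; the `30m` value is an arbitrary placeholder, not a recommendation):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # Force-terminate the node this long after a drain begins,
      # even if pods stuck in Terminating block graceful eviction.
      terminationGracePeriod: 30m
```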
Versions:
- Chart Version: 1.6.5
- Kubernetes Version: v1.33.8-eks-f69f56f
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment