Description
We encountered a scenario where a managed VPC was updated outside of CAPA to add IPv6 CIDR blocks (to enable communication with an IPv6-configured AWS service). After the change, CAPA's route table reconciliation fails with:
failed to discover routes on route table subnet-xxxxx: ipv6 block missing for ipv6 enabled subnet, can't create route for egress only internet gateway
We've traced through the code and understand what is happening, but wanted to understand if there is context behind it that we are not seeing.
How the broken state occurs
- A cluster is created with an IPv4-only managed VPC, so `spec.network.vpc.ipv6` is nil.
- IPv6 CIDR blocks are added to the VPC and subnets outside of CAPA, directly in AWS.
- During subnet discovery (`subnets.go:406-410`), CAPA unconditionally reads the IPv6 CIDR associations from AWS and sets `IsIPv6 = true` on the subnet specs:
  ```go
  for _, set := range ec2sn.Ipv6CidrBlockAssociationSet {
      if set.Ipv6CidrBlockState.State == types.SubnetCidrBlockStateCodeAssociated {
          spec.IPv6CidrBlock = aws.ToString(set.Ipv6CidrBlock)
          spec.IsIPv6 = true
      }
  }
  ```
- During VPC discovery (`vpc.go:60-62`), the IPv6 info is NOT adopted because of the guard added in #3914:
  ```go
  if s.scope.VPC().IsIPv6Enabled() {
      s.scope.VPC().IPv6 = vpc.IPv6
  }
  ```
- Route table reconciliation sees `sn.IsIPv6 == true` on private subnets, checks `!s.scope.VPC().IsIPv6Enabled()`, and returns the error (`routetables.go:418-422`).
The subnet discovery and VPC discovery have different guarding behavior around IPv6, which creates an inconsistent state that the route table reconciler can't handle.
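To make the mismatch concrete, here is a minimal, self-contained sketch of the three steps. It does not use the CAPA packages: `VPC`, `Subnet`, `discoverSubnet`, `discoverVPC`, and `checkRoutes` are simplified stand-ins we wrote for illustration, and the assumption that `IsIPv6Enabled()` is equivalent to "the `ipv6` field is non-nil" comes from our reading of the spec, not from a confirmed reference.

```go
package main

import (
	"errors"
	"fmt"
)

// IPv6, VPC and Subnet are hypothetical, simplified stand-ins for the CAPA specs.
type IPv6 struct{ CidrBlock string }

type VPC struct{ IPv6 *IPv6 }

// IsIPv6Enabled is assumed here to mean "spec.network.vpc.ipv6 is set".
func (v *VPC) IsIPv6Enabled() bool { return v.IPv6 != nil }

type Subnet struct {
	ID            string
	IsPublic      bool
	IsIPv6        bool
	IPv6CidrBlock string
}

// discoverSubnet mimics the unguarded subnet discovery: any associated
// IPv6 CIDR found in AWS marks the subnet spec as IPv6.
func discoverSubnet(sn *Subnet, associatedCIDRs []string) {
	for _, cidr := range associatedCIDRs {
		sn.IPv6CidrBlock = cidr
		sn.IsIPv6 = true
	}
}

// discoverVPC mimics the guarded VPC discovery: IPv6 info from AWS is
// only adopted when the spec already has IPv6 enabled.
func discoverVPC(spec *VPC, fromAWS *IPv6) {
	if spec.IsIPv6Enabled() {
		spec.IPv6 = fromAWS
	}
}

var errIPv6Mismatch = errors.New("ipv6 block missing for ipv6 enabled subnet")

// checkRoutes mimics the route table check on private subnets.
func checkRoutes(vpc *VPC, sn *Subnet) error {
	if !sn.IsPublic && sn.IsIPv6 && !vpc.IsIPv6Enabled() {
		return fmt.Errorf("subnet %s: %w", sn.ID, errIPv6Mismatch)
	}
	return nil
}

func main() {
	vpc := &VPC{}                       // created IPv4-only, so IPv6 is nil
	sn := &Subnet{ID: "subnet-private"} // private subnet

	// IPv6 CIDRs were added in AWS out of band (documentation-range values).
	discoverSubnet(sn, []string{"2001:db8:1234:1a00::/64"})
	discoverVPC(vpc, &IPv6{CidrBlock: "2001:db8:1234:1a00::/56"})

	fmt.Println("subnet IsIPv6:", sn.IsIPv6)
	fmt.Println("vpc IsIPv6Enabled:", vpc.IsIPv6Enabled())
	fmt.Println("route check:", checkRoutes(vpc, sn))
}
```

In this sketch the subnet ends up flagged as IPv6 while the VPC does not, which is the combination the route check rejects.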
What we'd like to understand
The webhook validation in `awsmanagedcontrolplane_webhook.go:188-191` prevents any change to `IsIPv6Enabled()`:
```go
if oldAWSManagedControlplane.Spec.NetworkSpec.VPC.IsIPv6Enabled() != r.Spec.NetworkSpec.VPC.IsIPv6Enabled() {
    // "changing IP family is not allowed after it has been set"
}
```
We understand this was introduced in #3513 and the vpc.go guard was added in #3914 to fix #3912, where CAPA was auto-discovering IPv6 from AWS and then the webhook would reject the update, bricking clusters.
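As a small illustration of why we describe the check as bidirectional, the sketch below reproduces only the comparison from the webhook in isolation. `VPC` and `validateIPFamily` are hypothetical names of ours, and `IsIPv6Enabled()` is again assumed to mean "the `ipv6` field is non-nil".

```go
package main

import (
	"errors"
	"fmt"
)

// IPv6 and VPC are hypothetical stand-ins for the CAPA VPC spec.
type IPv6 struct{}

type VPC struct{ IPv6 *IPv6 }

// IsIPv6Enabled is assumed to be true when the ipv6 field is set.
func (v VPC) IsIPv6Enabled() bool { return v.IPv6 != nil }

// validateIPFamily mirrors the comparison quoted from the webhook:
// any difference in IsIPv6Enabled() between old and new is rejected.
func validateIPFamily(oldVPC, newVPC VPC) error {
	if oldVPC.IsIPv6Enabled() != newVPC.IsIPv6Enabled() {
		return errors.New("changing IP family is not allowed after it has been set")
	}
	return nil
}

func main() {
	ipv4Only := VPC{}
	withIPv6 := VPC{IPv6: &IPv6{}}

	fmt.Println("nil -> non-nil rejected:", validateIPFamily(ipv4Only, withIPv6) != nil) // true
	fmt.Println("non-nil -> nil rejected:", validateIPFamily(withIPv6, ipv4Only) != nil) // true
	fmt.Println("unchanged rejected:", validateIPFamily(ipv4Only, ipv4Only) != nil)      // false
}
```

Because the comparison is a plain inequality, enabling IPv6 on purpose is rejected in exactly the same way as disabling it.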
Our questions:
- Was the immutability intended to be bidirectional? The `vpc.go` guard from #3914 already prevents the auto-discovery problem that motivated it. Is there a known reason the webhook also needs to block the `nil → non-nil` direction (intentionally enabling IPv6), or was this a conservative default that hasn't been revisited?
- Is it safe to work around this by temporarily disabling the webhook and adding `ipv6: {}` to the VPC spec? From tracing through the reconciliation, it looks like:
  - `reconcileVPC()` would adopt the existing IPv6 CIDR from AWS
  - `reconcileEgressOnlyInternetGateways()` would create an EIGW
  - `reconcileRouteTables()` would succeed with the `::/0` → EIGW route
  - No attempt would be made to re-associate IPv6 CIDRs (that only happens during VPC creation)

  Are there any side effects we're not seeing?
- Should the subnet discovery also be guarded? The asymmetry between subnet discovery (unconditionally sets `IsIPv6 = true`) and VPC discovery (guarded by `IsIPv6Enabled()`) seems like it could cause issues beyond route tables. Would it make sense to either:
  - Guard the subnet discovery the same way, or
  - Remove the VPC discovery guard and instead allow the reconciler to adopt IPv6 when it's present in AWS?
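
For the first option in the last question, this is roughly what we have in mind. It is only a sketch under our own assumptions, not a patch against CAPA: `Subnet` and `adoptSubnetIPv6` are hypothetical, and we have not checked whether other code paths rely on the subnet's IPv6 fields being populated unconditionally.

```go
package main

import "fmt"

// Subnet is a hypothetical, simplified stand-in for the CAPA subnet spec.
type Subnet struct {
	IsIPv6        bool
	IPv6CidrBlock string
}

// adoptSubnetIPv6 copies IPv6 CIDR associations onto the subnet spec only
// when the VPC spec has IPv6 enabled, matching the guard used for the VPC.
func adoptSubnetIPv6(sn *Subnet, vpcIPv6Enabled bool, associatedCIDRs []string) {
	if !vpcIPv6Enabled {
		return // IPv6 added out of band is ignored, as it is for the VPC
	}
	for _, cidr := range associatedCIDRs {
		sn.IPv6CidrBlock = cidr
		sn.IsIPv6 = true
	}
}

func main() {
	cidrs := []string{"2001:db8:1234:1a00::/64"} // documentation-range value

	ignored := &Subnet{}
	adoptSubnetIPv6(ignored, false, cidrs)
	fmt.Println("VPC IPv6 disabled, subnet IsIPv6:", ignored.IsIPv6)

	adopted := &Subnet{}
	adoptSubnetIPv6(adopted, true, cidrs)
	fmt.Println("VPC IPv6 enabled, subnet IsIPv6:", adopted.IsIPv6)
}
```

With a guard like this, the subnet and VPC specs would agree in the out-of-band case, so the route table check would not see an IPv6 subnet under a VPC without IPv6.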
Environment
- CAPA version: v2.x
- Cluster type: AWSManagedControlPlane (EKS)
- VPC: Managed by CAPA, IPv6 added after initial creation