
NLB: DNS record resolves to pod IPs, not the NLB's IPs #6222

@gblues

Description

This was reported previously by #1899 but it got auto-closed. Creating a new issue because the problem still exists.

What happened:

When a Service is annotated for both external-dns and the aws-load-balancer-controller, the DNS records are created with the wrong IPs: the FQDN resolves to the pod IPs instead of the NLB's IPs.

What you expected to happen:

Ideally, external-dns would publish the right IPs the first time. Personally, I'd be happy with an "eventual consistency" behavior where the pod IPs are used while the NLB is coming online, and then updated to the NLB's IP addresses when the NLB is fully provisioned.

How to reproduce it (as minimally and precisely as possible):

Assumptions:

  • Route 53 with a registered zone. I will be using "example.com" as a placeholder.
  • EKS cluster with external-dns and aws-load-balancer-controller installed.
  • We experienced this issue with internal load balancers; the steps below assume you have some means (e.g. a VPN) of accessing the VPC DNS server. Standing up and connecting to an EC2 jumpbox is left as an exercise for the reader. I have not tested with a public NLB.
  1. Follow the "Deploy the echoserver resources" instructions only.
  2. Apply the YAML below (don't forget to replace example.com with your actual domain):
---
apiVersion: v1
kind: Service
metadata:
  name: external-dns-demo
  namespace: echoserver
  annotations:
      external-dns.alpha.kubernetes.io/hostname: demo.example.com
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: true
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 80
    targetPort: 8080
  selector:
    app: echoserver
  3. Immediately after applying the Service YAML, run kubectl get service -n echoserver external-dns-demo and note the NLB hostname in the EXTERNAL-IP column. Attempt to resolve this hostname via nslookup; you should get an NXDOMAIN response.
  4. Repeat step 3 every minute or so until you no longer get an NXDOMAIN response.
  5. Do an nslookup of demo.example.com.
    Expected output of a successful lookup: the sets of IPs from steps 4 and 5 match.
    Actual output: demo.example.com resolves to the pod IPs (compare with kubectl get pod -n echoserver -o wide).
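Steps 3-5 can be sketched as a shell snippet. This is a sketch only: it assumes your kubeconfig points at the affected cluster, that the Service above has been applied, and that you can reach the VPC DNS server (per the assumptions above); it is not runnable outside that environment.

```shell
# Grab the NLB hostname from the Service's status (the EXTERNAL-IP column).
NLB_HOST=$(kubectl get service -n echoserver external-dns-demo \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Step 4: poll until the NLB hostname itself resolves
# (nslookup exits non-zero on NXDOMAIN on most systems).
until nslookup "$NLB_HOST" >/dev/null 2>&1; do
  echo "still NXDOMAIN, waiting..."
  sleep 60
done

# Step 5: compare what the NLB hostname resolves to against the
# external-dns-managed record. With this bug, the second lookup
# returns the pod IPs rather than the NLB's IPs.
nslookup "$NLB_HOST"
nslookup demo.example.com
```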

Anything else we need to know?:

My experience mirrors @gavinandermerwe's reply here.

After a bit more digging, it feels like a race condition in how the Route 53 records get updated internally: the pod IP addresses become available before the A record for the load balancer does.

For me, when the NLB was provisioned by aws-load-balancer-controller, I could see the hostname for the NLB, but it was several minutes before the host resolved.

As in Gavin's case, removing the external-dns annotation and re-adding it caused the DNS update to work as expected.
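For reference, that remove/re-add workaround can be done without editing the manifest, using kubectl annotate (a sketch, using the Service and annotation key from the repro above):

```shell
# Remove the external-dns annotation; the trailing '-' deletes it.
kubectl annotate service external-dns-demo -n echoserver \
  external-dns.alpha.kubernetes.io/hostname-

# Re-add it once the NLB is fully provisioned; on its next sync,
# external-dns sees the NLB hostname and writes the correct record.
kubectl annotate service external-dns-demo -n echoserver \
  external-dns.alpha.kubernetes.io/hostname=demo.example.com
```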

Environment:

  • external-dns: 0.19.0
  • kubernetes: 1.33
  • DNS provider: route53
  • aws-load-balancer-controller: 3.0

Labels: kind/bug