[client] Skip DNS upstream failover on definitive EDE#6089
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds Extended DNS Errors (EDE) handling to upstream DNS queries: it ensures EDNS0 is advertised on outgoing requests when missing, adds helpers to identify non-retryable EDE codes, and short-circuits upstream failover by writing a successful response and setting metadata when such EDEs are observed. ChangesEDE Handling in Upstream DNS
Sequence Diagram(s)sequenceDiagram
participant Resolver as Client Resolver
participant Handler as Upstream Handler
participant Upstream as Upstream DNS
participant Metadata as Metadata Store
participant Formatter as Capture Formatter
Resolver->>Handler: Send DNS query
Handler->>Upstream: Forward query (ensure EDNS0 present)
Upstream-->>Handler: Reply (Rcode=SERVFAIL/REFUSED + EDNS0 EDE)
alt non-retryable EDE found
Handler->>Metadata: set "ede" = edeName(code)
Handler->>Resolver: write response (short-circuit)
Handler->>Formatter: include EDE in formatted output
else no non-retryable EDE
Handler->>Handler: proceed with standard failover to next upstream
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Release artifactsBuilt for PR head
GHCR images (amd64)
This comment is updated by the Release workflow. Artifact links expire according to the workflow retention policy. |
1417458 to
e394818
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/internal/dns/upstream.go (2)
345-388: ⚡ Quick winHelpers look correct; consider adding unit tests.
Coverage of EDE codes 1, 2, 5–12, and 15–18 matches the PR intent, and falling back to
EDE <n>for unknown codes is a reasonable observability default. Given no tests were added per the checklist and this changes failover semantics, table-driven tests fornonRetryableEDE(mixed OPT options, multiple EDE entries, no EDNS0, retryable codes) and aqueryUpstreamtest that asserts no failover happens on SERVFAIL+EDE would lock in the new behavior cheaply.A static analyzer also flagged the
caseblock length — the current form reads fine to me; only worth splitting if you find the readability annoying.Want me to draft a
nonRetryableEDEtest table and a fakeupstreamClienttest that exercises the new failover-skip path?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/dns/upstream.go` around lines 345 - 388, Add unit tests to lock in the new EDE-based failover semantics: create table-driven tests for nonRetryableEDE covering (a) messages with no EDNS0, (b) OPT with mixed options, (c) single and multiple EDNS0_EDE entries including codes covered by the case block (1,2,5–12,15–18) and known retryable codes, and (d) unknown codes to assert edeName falls back to "EDE <n>". Also add a test for queryUpstream (using a fake upstreamClient) that returns SERVFAIL with an EDNS0_EDE non-retryable code and assert the client does not attempt failover to another upstream. Ensure tests reference nonRetryableEDE, edeName, queryUpstream and the upstreamClient mock so future changes to those symbols fail the tests.
246-252: ⚡ Quick winEDNS0 added to inbound request may leak OPT to non-EDNS0 downstream clients.
At lines 249–251,
r.SetEdns0(...)mutates the original message received from the downstream client. If the client didn't advertise EDNS0, we add OPT torbefore forwarding upstream. The upstream responsermwill then include an OPT RR, andwriteSuccessResponsewritesrmback to the client unchanged viaw.WriteMsg(rm)at line 313. Per RFC 6891 §6.1.1, a responder MUST NOT include an OPT RR in a response when the request had none.Two viable fixes:
- Track whether the inbound request originally had EDNS0 and strip OPT from
rm(and resetExtra) before responding when it didn't.- Operate on a shallow copy of
r(with our ownExtra) for the upstream exchange so the inbound message isn't mutated.Also worth hardening:
currentMTU - ipUDPHeaderSizeperforms uint16 arithmetic and would silently underflow if MTU < 68 bytes, producing a huge UDPSize. This is a pre-existing pattern elsewhere in the file (lines 584, 656, 700), but adding a guard would harden it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/dns/upstream.go` around lines 246 - 252, The request message r is being mutated by calling r.SetEdns0(...), which can cause an OPT RR to be sent back to a client that didn't advertise EDNS0; instead either operate on a shallow copy for the upstream exchange or track whether the inbound request originally had EDNS0 and strip OPT from the upstream response before replying. Concretely: record hadEdns := (r.IsEdns0()!=nil), then create a copy for upstream (e.g., reqUp := r.Copy() and ensure reqUp.Extra is a new slice) and call reqUp.SetEdns0(...) so r is unchanged; after receiving rm from upstream, if !hadEdns remove any OPT RR(s) from rm.Extra (or reset rm.Extra to only non-OPT RRs) before calling writeSuccessResponse / w.WriteMsg(rm). Also harden the MTU math used with SetEdns0 (currentMTU - ipUDPHeaderSize) by guarding against underflow: compute the UDP payload size only when currentMTU > ipUDPHeaderSize (otherwise use a safe default) and pass that safe uint16 to SetEdns0.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@client/internal/dns/upstream.go`:
- Around line 345-388: Add unit tests to lock in the new EDE-based failover
semantics: create table-driven tests for nonRetryableEDE covering (a) messages
with no EDNS0, (b) OPT with mixed options, (c) single and multiple EDNS0_EDE
entries including codes covered by the case block (1,2,5–12,15–18) and known
retryable codes, and (d) unknown codes to assert edeName falls back to "EDE
<n>". Also add a test for queryUpstream (using a fake upstreamClient) that
returns SERVFAIL with an EDNS0_EDE non-retryable code and assert the client does
not attempt failover to another upstream. Ensure tests reference
nonRetryableEDE, edeName, queryUpstream and the upstreamClient mock so future
changes to those symbols fail the tests.
- Around line 246-252: The request message r is being mutated by calling
r.SetEdns0(...), which can cause an OPT RR to be sent back to a client that
didn't advertise EDNS0; instead either operate on a shallow copy for the
upstream exchange or track whether the inbound request originally had EDNS0 and
strip OPT from the upstream response before replying. Concretely: record hadEdns
:= (r.IsEdns0()!=nil), then create a copy for upstream (e.g., reqUp := r.Copy()
and ensure reqUp.Extra is a new slice) and call reqUp.SetEdns0(...) so r is
unchanged; after receiving rm from upstream, if !hadEdns remove any OPT RR(s)
from rm.Extra (or reset rm.Extra to only non-OPT RRs) before calling
writeSuccessResponse / w.WriteMsg(rm). Also harden the MTU math used with
SetEdns0 (currentMTU - ipUDPHeaderSize) by guarding against underflow: compute
the UDP payload size only when currentMTU > ipUDPHeaderSize (otherwise use a
safe default) and pass that safe uint16 to SetEdns0.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6a9b002f-bfa2-4767-957d-57af07b98677
📒 Files selected for processing (1)
client/internal/dns/upstream.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
client/internal/dns/upstream.go (1)
292-299: 💤 Low valueOptional: log the EDE short-circuit for observability.
When this branch fires, failover is intentionally suppressed. A single trace/debug line would make production troubleshooting (e.g., "why did we stop after upstream A on a SERVFAIL?") much easier without adding noise on the success path. The metadata is set, but a log makes it greppable in practice.
♻️ Suggested addition
if rm.Rcode == dns.RcodeServerFailure || rm.Rcode == dns.RcodeRefused { if code, ok := nonRetryableEDE(rm); ok { + logger.Debugf("upstream %s returned %s with non-retryable EDE %s; skipping failover", + upstream, dns.RcodeToString[rm.Rcode], edeName(code)) resutil.SetMeta(w, "ede", edeName(code)) u.writeSuccessResponse(w, rm, upstream, r.Question[0].Name, t, upstreamProto, logger) return nil } return &upstreamFailure{upstream: upstream, reason: dns.RcodeToString[rm.Rcode]} }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/dns/upstream.go` around lines 292 - 299, Add a debug/trace log when the non-retryable EDE short-circuit is taken: inside the branch that checks rm.Rcode == dns.RcodeServerFailure || rm.Rcode == dns.RcodeRefused and after nonRetryableEDE(rm) returns ok, call the existing logger to emit a concise message (include upstream identity, question name r.Question[0].Name, the EDE code or edeName(code) and that failover was suppressed) just before resutil.SetMeta(...) / u.writeSuccessResponse(...). Use the same logger variable already in scope so the log is greppable and low-noise (debug/trace level).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@client/internal/dns/upstream.go`:
- Around line 292-299: Add a debug/trace log when the non-retryable EDE
short-circuit is taken: inside the branch that checks rm.Rcode ==
dns.RcodeServerFailure || rm.Rcode == dns.RcodeRefused and after
nonRetryableEDE(rm) returns ok, call the existing logger to emit a concise
message (include upstream identity, question name r.Question[0].Name, the EDE
code or edeName(code) and that failover was suppressed) just before
resutil.SetMeta(...) / u.writeSuccessResponse(...). Use the same logger variable
already in scope so the log is greppable and low-noise (debug/trace level).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 042a787a-ec41-4bf8-85b4-8d23bff177c2
📒 Files selected for processing (2)
.github/workflows/golangci-lint.ymlclient/internal/dns/upstream.go
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
util/capture/text.go (1)
597-609:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winInclude EDE in
NOERRORresponses too.EDE is also used as informational metadata on successful responses, so formatting it only in the non-
NoErrbranch drops part of the observability this PR is trying to add.Suggested change
func formatDNSResponse(d *layers.DNS, rd string, plen int) string { anCount := d.ANCount nsCount := d.NSCount arCount := d.ARCount + ede := formatEDE(d) if d.ResponseCode != layers.DNSResponseCodeNoErr { - ede := formatEDE(d) return fmt.Sprintf("%04x %d/%d/%d %s%s (%d)", d.ID, anCount, nsCount, arCount, d.ResponseCode, ede, plen) } if anCount > 0 && len(d.Answers) > 0 { rr := d.Answers[0] if rdata := shortRData(&rr); rdata != "" { - return fmt.Sprintf("%04x %d/%d/%d %s %s (%d)", d.ID, anCount, nsCount, arCount, rr.Type, rdata, plen) + return fmt.Sprintf("%04x %d/%d/%d %s %s%s (%d)", d.ID, anCount, nsCount, arCount, rr.Type, rdata, ede, plen) } } - return fmt.Sprintf("%04x %d/%d/%d (%d)", d.ID, anCount, nsCount, arCount, plen) + return fmt.Sprintf("%04x %d/%d/%d%s (%d)", d.ID, anCount, nsCount, arCount, ede, plen) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@util/capture/text.go` around lines 597 - 609, The current logic only computes and includes EDE via formatEDE(d) when d.ResponseCode != layers.DNSResponseCodeNoErr, which omits EDE for successful responses; update the code to always compute ede := formatEDE(d) and include ede in the formatted strings returned by the successful-response branches (the branch using shortRData(&rr) and the final default return) so that formatEDE(d) is referenced for both NOERROR and error responses; locate references to d.ResponseCode, layers.DNSResponseCodeNoErr, formatEDE, shortRData, and d.Answers to modify the two successful-return fmt.Sprintf calls to include the ede string.
🧹 Nitpick comments (1)
client/internal/dns/upstream_test.go (1)
790-850: ⚡ Quick winAdd one end-to-end test for the resolver short-circuit path.
These tests validate the helpers, but they do not exercise
ServeDNS/queryUpstream. A regression could still query the second upstream, skip theedemetadata, or leak the synthetic OPT back to a non-EDNS client. Please add a failover test where upstream 1 returnsSERVFAILorREFUSEDwith a definitive EDE and upstream 2 would otherwise succeed, then assert only upstream 1 was queried and the written response has no OPT when the client did not advertise EDNS0.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/dns/upstream_test.go` around lines 790 - 850, Add an end-to-end test that exercises ServeDNS/queryUpstream: create a resolver with two upstream handlers where upstream1 returns SERVFAIL or REFUSED with an OPT carrying a definitive EDE (e.g., dns.ExtendedErrorCodeDNSBogus) and upstream2 would normally succeed; send a client query that does NOT include EDNS0 and assert only upstream1 was contacted (use a call counter) and the response written to the client contains no OPT and no EDE metadata (i.e., the synthetic OPT from upstream1 must not be leaked to the client). Ensure the test uses the existing helpers (nonRetryableEDE/stripOPT) expectations and names the test clearly (e.g., TestServeDNS_NonRetryableEDE_ShortCircuit).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@util/capture/text.go`:
- Line 618: The identifier Additionals in util/capture/text.go is a deliberate
gopacket field name and should be exempted from codespell; update the codespell
ignore configuration used by CI to include the token "Additionals" (for example
by adding it to your codespell ignore-words file or the codespell CLI/Action
ignore list) or scope the codespell rule so it skips this symbol, then re-run CI
so the false positive is suppressed.
---
Outside diff comments:
In `@util/capture/text.go`:
- Around line 597-609: The current logic only computes and includes EDE via
formatEDE(d) when d.ResponseCode != layers.DNSResponseCodeNoErr, which omits EDE
for successful responses; update the code to always compute ede := formatEDE(d)
and include ede in the formatted strings returned by the successful-response
branches (the branch using shortRData(&rr) and the final default return) so that
formatEDE(d) is referenced for both NOERROR and error responses; locate
references to d.ResponseCode, layers.DNSResponseCodeNoErr, formatEDE,
shortRData, and d.Answers to modify the two successful-return fmt.Sprintf calls
to include the ede string.
---
Nitpick comments:
In `@client/internal/dns/upstream_test.go`:
- Around line 790-850: Add an end-to-end test that exercises
ServeDNS/queryUpstream: create a resolver with two upstream handlers where
upstream1 returns SERVFAIL or REFUSED with an OPT carrying a definitive EDE
(e.g., dns.ExtendedErrorCodeDNSBogus) and upstream2 would normally succeed; send
a client query that does NOT include EDNS0 and assert only upstream1 was
contacted (use a call counter) and the response written to the client contains
no OPT and no EDE metadata (i.e., the synthetic OPT from upstream1 must not be
leaked to the client). Ensure the test uses the existing helpers
(nonRetryableEDE/stripOPT) expectations and names the test clearly (e.g.,
TestServeDNS_NonRetryableEDE_ShortCircuit).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6b7851c7-7e46-4c96-920a-8635c5f0e815
📒 Files selected for processing (3)
client/internal/dns/upstream.goclient/internal/dns/upstream_test.goutil/capture/text.go
# Conflicts: # client/internal/dns/upstream.go
|



Describe your changes
When an upstream resolver returns SERVFAIL/REFUSED with an Extended DNS Error (RFC 8914) indicating a definitive answer (DNSSEC validation failure, blocked, censored, filtered, prohibited), propagate the response immediately instead of trying every other configured upstream. Trying the next resolver wastes ~one RTT per upstream for queries where the answer cannot change.
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Internal resolver behavior, not user-visible config.
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit