Use external target for DNSPolicy test#35
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the DNSPerf benchmark to use a configurable external target domain (instead of in-cluster pod/service FQDNs) when generating and exercising DNSPolicy tests.
Changes:
- Add
TargetDomainto DNSPerf config and thread it through DNSPolicy generation and DNSPerf test execution. - Improve cluster/DNS policy setup and cleanup by adding Calico NetworkPolicy deletion and additional Felix-mode verification via calico-node logs.
- Add startup validation to ensure the test node pool has the required node count/labels.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utils/k8s_utils.go | Adds Calico NetworkPolicy deletion helper and a helper to fetch calico-node pod logs. |
| pkg/dnsperf/dnsperf_test.go | Updates unit test to match new MakeDNSPolicy(..., targetDomain) signature and domain behavior. |
| pkg/dnsperf/dnsperf.go | Switches DNSPerf target selection to external TargetDomain, updates curl target formatting, and adjusts goroutine argument capture. |
| pkg/config/config.go | Replaces TargetType with TargetDomain and adds additional dnsperf validation logic. |
| pkg/cluster/cluster.go | Enhances Felix DNS policy mode patching by reading dataplane from Installation and adding post-update verification/log inspection. |
| cmd/benchmark.go | Initializes controller-runtime logger, validates test-node prerequisites, updates DNSPolicy creation call, and expands cleanup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3374e8a to
95a2b43
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
95a2b43 to
53c12fe
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7b9750a to
24e9527
Compare
24e9527 to
ab6a07a
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
36f3b00 to
0cd5521
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| go func(tcpdumpPod corev1.Pod, testPod corev1.Pod) { | ||
| if e := runTCPDump(testctx, clients, &tcpdumpPod, testPod, targetPort, testConfig.Duration+60); e != nil { | ||
| log.WithError(e).Error("failed to run tcpdump") |
There was a problem hiding this comment.
runTCPDump is started with testctx, which is cancelled after testConfig.Duration. That means the tcpdump exec will be interrupted at the test duration regardless of the timeout argument (Duration+60), so you won’t actually capture the extra tail time. Consider using a separate context with a longer timeout (or the parent ctx) for tcpdump so the timeout parameter is meaningful.
| cmdfrag := `curl -m 8 -w '{"time_lookup": %{time_namelookup}, "time_connect": %{time_connect}}\n' -s -o /dev/null` | ||
| cmd := fmt.Sprintf("%s %s:8080", cmdfrag, target) | ||
| cmd := fmt.Sprintf("%s %s", cmdfrag, target) | ||
| stdout, _, err := utils.ExecCommandInPod(ctx, srcPod, cmd, 10) |
There was a problem hiding this comment.
runDNSPerfTest builds a /bin/sh -c command by string-concatenating the full URL. If TargetURL contains shell metacharacters (e.g. &, ;, #, spaces), curl will receive a mangled URL or unintended extra shell commands may run. Quote/escape the URL argument (or avoid the shell and exec curl with args) to make external URLs robust.
| if token == "IP" && i+3 < len(tokens) { | ||
| srcAddr = tokens[i+1] | ||
| if tokens[i+2] == ">" { | ||
| // Remove trailing colon from destination | ||
| dstAddr = strings.TrimSuffix(tokens[i+3], ":") |
There was a problem hiding this comment.
processTCPDumpOutput only recognizes the "IP" token when extracting src/dst, so IPv6 tcpdump lines (which typically use the "IP6" token) will be skipped and duplicate SYN/SYN-ACK counts will be wrong on IPv6 clusters. Please handle both "IP" and "IP6" (or parse more defensively).
There was a problem hiding this comment.
We won't be testing with IPv6 for now.
| scanner := bufio.NewScanner(logs) | ||
| for scanner.Scan() { | ||
| line := scanner.Text() |
There was a problem hiding this comment.
bufio.Scanner has a default token limit (~64K). If a calico-node log line exceeds that, scanning will error and this helper will fail (potentially causing Inline-mode validation to fail spuriously). Consider increasing the scanner buffer size (or using bufio.Reader) before scanning.
| // If we're using Inline, check felix logs to verify mode change - if Inline mode is not possible, Felix will fall back to DelayDeniedPacket mode and log this. | ||
| // The log will be: "Failed to load BPF DNS parser program. Maybe the kernel is old. Falling back to DelayDeniedPacket" | ||
| if dnsPolicyMode == "Inline" { |
There was a problem hiding this comment.
The Inline-mode fallback check (wait for TigeraStatus + sleep + scan calico-node logs) runs whenever dnsPolicyMode == "Inline", but the log message being searched for is specifically about loading the BPF DNS parser program. This adds ~70s of delay even on non-BPF dataplanes where that fallback isn’t applicable. Consider gating this log check on the BPF dataplane only.
| // If we're using Inline, check felix logs to verify mode change - if Inline mode is not possible, Felix will fall back to DelayDeniedPacket mode and log this. | |
| // The log will be: "Failed to load BPF DNS parser program. Maybe the kernel is old. Falling back to DelayDeniedPacket" | |
| if dnsPolicyMode == "Inline" { | |
| // If we're using Inline on the BPF dataplane, check felix logs to verify mode change - if Inline mode is not possible, Felix will fall back to DelayDeniedPacket mode and log this. | |
| // The log will be: "Failed to load BPF DNS parser program. Maybe the kernel is old. Falling back to DelayDeniedPacket" | |
| if dnsPolicyMode == "Inline" && dataplane != nil && *dataplane == operatorv1.LinuxDataplaneBPF { |
There was a problem hiding this comment.
The change is just wrong, this applies for Inline mode always.
58948ff to
1e034aa
Compare
Uh oh!
There was an error while loading. Please reload this page.