Skip to content

Use external target for DNSPolicy test#35

Merged
lwr20 merged 8 commits intoprojectcalico:mainfrom
lwr20:lwr-dnsperf-ext
Feb 6, 2026
Merged

Use external target for DNSPolicy test#35
lwr20 merged 8 commits intoprojectcalico:mainfrom
lwr20:lwr-dnsperf-ext

Conversation

@lwr20
Copy link
Member

@lwr20 lwr20 commented Jan 30, 2026

  • Change dnsperf test to use external domain instead of internal fqdns
  • Fix tcpdump pods - they need to run as root to watch traffic
  • Add more validation to configuration switching

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the DNSPerf benchmark to use a configurable external target domain (instead of in-cluster pod/service FQDNs) when generating and exercising DNSPolicy tests.

Changes:

  • Add TargetDomain to DNSPerf config and thread it through DNSPolicy generation and DNSPerf test execution.
  • Improve cluster/DNS policy setup and cleanup by adding Calico NetworkPolicy deletion and additional Felix-mode verification via calico-node logs.
  • Add startup validation to ensure the test node pool has the required node count/labels.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pkg/utils/k8s_utils.go Adds Calico NetworkPolicy deletion helper and a helper to fetch calico-node pod logs.
pkg/dnsperf/dnsperf_test.go Updates unit test to match new MakeDNSPolicy(..., targetDomain) signature and domain behavior.
pkg/dnsperf/dnsperf.go Switches DNSPerf target selection to external TargetDomain, updates curl target formatting, and adjusts goroutine argument capture.
pkg/config/config.go Replaces TargetType with TargetDomain and adds additional dnsperf validation logic.
pkg/cluster/cluster.go Enhances Felix DNS policy mode patching by reading dataplane from Installation and adding post-update verification/log inspection.
cmd/benchmark.go Initializes controller-runtime logger, validates test-node prerequisites, updates DNSPolicy creation call, and expands cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lwr20 lwr20 marked this pull request as ready for review January 30, 2026 22:34

This comment was marked as outdated.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lwr20 lwr20 requested a review from Copilot February 3, 2026 16:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +239 to +241
go func(tcpdumpPod corev1.Pod, testPod corev1.Pod) {
if e := runTCPDump(testctx, clients, &tcpdumpPod, testPod, targetPort, testConfig.Duration+60); e != nil {
log.WithError(e).Error("failed to run tcpdump")
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runTCPDump is started with testctx, which is cancelled after testConfig.Duration. That means the tcpdump exec will be interrupted at the test duration regardless of the timeout argument (Duration+60), so you won’t actually capture the extra tail time. Consider using a separate context with a longer timeout (or the parent ctx) for tcpdump so the timeout parameter is meaningful.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just a failsafe.

Comment on lines 367 to 369
cmdfrag := `curl -m 8 -w '{"time_lookup": %{time_namelookup}, "time_connect": %{time_connect}}\n' -s -o /dev/null`
cmd := fmt.Sprintf("%s %s:8080", cmdfrag, target)
cmd := fmt.Sprintf("%s %s", cmdfrag, target)
stdout, _, err := utils.ExecCommandInPod(ctx, srcPod, cmd, 10)
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runDNSPerfTest builds a /bin/sh -c command by string-concatenating the full URL. If TargetURL contains shell metacharacters (e.g. &, ;, #, spaces), curl will receive a mangled URL or unintended extra shell commands may run. Quote/escape the URL argument (or avoid the shell and exec curl with args) to make external URLs robust.

Copilot uses AI. Check for mistakes.
Comment on lines +502 to +506
if token == "IP" && i+3 < len(tokens) {
srcAddr = tokens[i+1]
if tokens[i+2] == ">" {
// Remove trailing colon from destination
dstAddr = strings.TrimSuffix(tokens[i+3], ":")
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processTCPDumpOutput only recognizes the "IP" token when extracting src/dst, so IPv6 tcpdump lines (which typically use the "IP6" token) will be skipped and duplicate SYN/SYN-ACK counts will be wrong on IPv6 clusters. Please handle both "IP" and "IP6" (or parse more defensively).

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't be testing with IPv6 for now.

Comment on lines +530 to +532
scanner := bufio.NewScanner(logs)
for scanner.Scan() {
line := scanner.Text()
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bufio.Scanner has a default token limit (~64K). If a calico-node log line exceeds that, scanning will error and this helper will fail (potentially causing Inline-mode validation to fail spuriously). Consider increasing the scanner buffer size (or using bufio.Reader) before scanning.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine.

Comment on lines +201 to +203
// If we're using Inline, check felix logs to verify mode change - if Inline mode is not possible, Felix will fall back to DelayDeniedPacket mode and log this.
// The log will be: "Failed to load BPF DNS parser program. Maybe the kernel is old. Falling back to DelayDeniedPacket"
if dnsPolicyMode == "Inline" {
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Inline-mode fallback check (wait for TigeraStatus + sleep + scan calico-node logs) runs whenever dnsPolicyMode == "Inline", but the log message being searched for is specifically about loading the BPF DNS parser program. This adds ~70s of delay even on non-BPF dataplanes where that fallback isn’t applicable. Consider gating this log check on the BPF dataplane only.

Suggested change
// If we're using Inline, check felix logs to verify mode change - if Inline mode is not possible, Felix will fall back to DelayDeniedPacket mode and log this.
// The log will be: "Failed to load BPF DNS parser program. Maybe the kernel is old. Falling back to DelayDeniedPacket"
if dnsPolicyMode == "Inline" {
// If we're using Inline on the BPF dataplane, check felix logs to verify mode change - if Inline mode is not possible, Felix will fall back to DelayDeniedPacket mode and log this.
// The log will be: "Failed to load BPF DNS parser program. Maybe the kernel is old. Falling back to DelayDeniedPacket"
if dnsPolicyMode == "Inline" && dataplane != nil && *dataplane == operatorv1.LinuxDataplaneBPF {

Copilot uses AI. Check for mistakes.
Copy link
Member Author

@lwr20 lwr20 Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is just wrong, this applies for Inline mode always.

@lwr20 lwr20 merged commit 81cab28 into projectcalico:main Feb 6, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants