
fix: support --inject-cgroup and sidecar stress modes on containerd #304

Merged
alexei-led merged 10 commits into master from fix/containerd-inject-cgroup on Mar 1, 2026

@alexei-led
Owner

Summary

  • Implement sidecar-based stress testing for the containerd runtime, matching the Docker runtime's capabilities
  • The --inject-cgroup, --stress-image, and --pull-image flags were silently ignored on containerd — now they work correctly
  • Three stress modes supported: direct exec, default sidecar (child cgroup), and inject-cgroup sidecar

Resolves #303

Changes

pkg/runtime/containerd/sidecar.go

  • Add resolveCgroupPath() — reads /proc/<pid>/cgroup to resolve target's cgroup path (v2 + v1 fallback)
  • Add buildStressSpecOpts() — builds OCI spec for default vs inject-cgroup sidecar modes
  • Add stressSidecar() — creates a long-lived sidecar running stress-ng/cg-inject as its main process
  • Add createStressSidecar() / startSidecarTask() / waitStressSidecar() helpers (extracted for cyclomatic complexity)

pkg/runtime/containerd/client.go

  • Rewrite StressContainer() with three-mode dispatch:
    • image == "" → direct exec (existing behavior)
    • image != "" && !injectCgroup → sidecar with /stress-ng, CgroupParent = target's parent
    • image != "" && injectCgroup → sidecar with /cg-inject, host cgroupns, /sys/fs/cgroup mount
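
The three-way dispatch above can be sketched as a small pure function. The names `stressMode` and `selectStressMode` and the image name in `main` are hypothetical; only the branching conditions come from the list above.

```go
package main

import "fmt"

// stressMode names the three paths listed in the PR description.
// The type and constants are illustrative, not the PR's identifiers.
type stressMode string

const (
	modeDirectExec   stressMode = "direct-exec"
	modeSidecar      stressMode = "sidecar"
	modeInjectCgroup stressMode = "inject-cgroup"
)

// selectStressMode mirrors the dispatch conditions: an empty image always
// means direct exec, so injectCgroup only matters when an image is set.
func selectStressMode(image string, injectCgroup bool) stressMode {
	switch {
	case image == "":
		return modeDirectExec
	case injectCgroup:
		return modeInjectCgroup
	default:
		return modeSidecar
	}
}

func main() {
	fmt.Println(selectStressMode("", false))
	fmt.Println(selectStressMode("stress-image:latest", false))
	fmt.Println(selectStressMode("stress-image:latest", true))
}
```

Keeping the decision separate from the containerd calls makes each branch easy to cover in a table-driven test.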

pkg/runtime/containerd/client_test.go

  • 17 new unit tests covering resolveCgroupPath, buildStressSpecOpts, and all sidecar stress code paths

tests/containerd_stress.bats

  • Add integration tests for sidecar mode (dry-run, child-cgroup, inject-cgroup)

Test plan

  • make fmt — no changes
  • make lint — 0 issues
  • CGO_ENABLED=0 go test ./... — all tests pass
  • colima ssh -- sudo bats tests/containerd_stress.bats — containerd integration tests

…untime

The containerd StressContainer() method was silently ignoring --inject-cgroup,
--stress-image, and --pull-image flags, falling back to direct exec mode.

Implement three stress modes matching the Docker runtime:
- Direct exec (no --stress-image): existing behaviour, runs stress-ng inside target
- Default sidecar (--stress-image): /stress-ng in a child cgroup of the target
- Inject-cgroup (--inject-cgroup --stress-image): /cg-inject with host cgroupns
  and /sys/fs/cgroup mount to inject into target's cgroup

Resolves #303
@github-actions

github-actions bot commented Mar 1, 2026

Test Results

546 tests: 544 ✅ passed, 2 💤 skipped, 0 ❌ failed
12 suites, 1 file, 2s ⏱️

Results for commit f79f571.

♻️ This comment has been updated with latest results.

@codecov

codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 50.79365% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.28%. Comparing base (e4ebe5a) to head (9134310).
⚠️ Report is 11 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/runtime/containerd/sidecar.go | 49.18% | 62 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #304      +/-   ##
==========================================
- Coverage   70.15%   69.28%   -0.87%     
==========================================
  Files          46       46              
  Lines        1933     2051     +118     
==========================================
+ Hits         1356     1421      +65     
- Misses        577      630      +53     


- Use strings.SplitSeq (modernize linter)
- Use strings.HasPrefix in stressPrefix matcher
- Wrap resolveCgroupPath error with context
- Convert resolveCgroupPath tests to table-driven
- Combine buildStressSpecOpts tests into table-driven
- Extract newSidecarWithExitCode helper, deduplicate NonZeroExit test
- Standardize require.NoError in SidecarDryrun test
- Replace sleep 5 with wait_for polling in bats setup
- Add snapshot cleanup in bats teardown
- Use default-and-override for cgroupParent assignment
- Add com.gaiaadm.pumba.skip label to stress sidecar containers
  to prevent pumba from targeting its own sidecars (parity with Docker)
- Use two-value receive on waitCh to detect unexpected channel close
- Fix off-by-one capacity in inject-cgroup args builder (+4 -> +5)
- Remove double-wrapped pullImage error message
- Rename cleanupContainer to deleteContainer (clarify vs cleanupSidecar)
- Add startSidecarTask error path tests (NewTask, Wait, Start failures)
- Fold ReadError into table-driven TestResolveCgroupPath
- Convert stressPrefix from package-level var to function
- Add log assertion to child-cgroup bats integration test
- Update mock expectations for WithContainerLabels option
- Add select with ctx.Done() in waitStressSidecar to prevent goroutine
  leak when containerd becomes unresponsive; force-kills task on cancel
- Swap pull/cgroup-resolution order for fail-fast parity with Docker
- Truncate raw cgroup data in error message for log cleanliness
- Replace READER_ERROR sentinel with readerErr struct field in table test
- Remove duplicate SidecarDryrun test (already covered by Dryrun test)
- Consolidate SidecarSuccess, InjectCgroupSuccess, SidecarNonZeroExit
  into single table-driven TestStressContainer_SidecarExitCodes
- Extract setupStressTarget helper to reduce preamble duplication
- Add setCgroupReader to PullImageError test for ordering robustness
- Change interface{} to any in stressPrefix() return type
- Add kill timeout on post-kill waitCh receive to prevent goroutine hang
- Add nil-check after GetImage to prevent panic on nil image
- Reuse deleteContainer in waitStressSidecar to eliminate duplicated cleanup
- Add test for context-cancellation path in waitStressSidecar
- Add test for closed wait-channel (ok=false) path
- Expand TestBuildStressSpecOpts to verify spec content, not just length
- Remove redundant wantSuccess field from SidecarExitCodes table
- Add AssertExpectations to hand-rolled mocks in all new tests
- Document which parameter each branch uses in buildStressSpecOpts
- Simplify deleteContainer to trust caller's context instead of double-wrapping
- Remove dead nil-image guard after GetImage
- Change rbind to bind for /sys/fs/cgroup mount (Docker parity)
- Wrap pullImage error for better diagnostics
- Replace fragile injectFixedArgs constant with prefix slice
- Log non-NotFound kill errors instead of silently discarding
- Thread *testing.T through newSidecarWithExitCode for AssertExpectations
- Add TestStressContainer_SidecarCgroupResolveError test
- Remove test comments per project convention
- Add killTimedOut sentinel so kill-timeout path reports accurate error
  instead of misleading "wait channel closed unexpectedly"
- Replace defer cancel() with direct cancel() in createStressSidecar
  error path for clarity and immediate timer release
- Move defer cancel() after cleanup calls in waitStressSidecar to
  release context timer immediately after use
- Flatten cleanupSidecar: guard clause for task error, collapse
  double-if on container delete to single && condition
- Rewrite createStressSidecar godoc to be specific
- Remove obvious paraphrasing comments from bats integration tests
- Use context.WithoutCancel in startSidecarTask error paths so task
  cleanup Delete calls succeed even if caller context is canceled
- Add nosuid,nodev,noexec to /sys/fs/cgroup bind mount options to
  reduce attack surface on the inject-cgroup sidecar
- Flatten cleanupSidecar: replace else-if with separate if statements
- Trim deleteContainer godoc to remove name-restating first sentence
- Remove obvious bats comment before sidecar dry-run assertion

containerd does not auto-create child cgroups like Docker's --cgroup-parent.
Setting CgroupsPath to the parent (e.g. /system.slice) fails with "container's
cgroup is not empty". Add cgroupChildPath() that mirrors containerd CRI's
heuristic: systemd drivers use "slice:pumba:id" format, cgroupfs uses
"parent/id".

Images are pre-pulled in setup() via ctr_pull_image, but pumba still
contacted the registry at runtime. When ghcr.io is slow, this causes
context canceled errors (flaky CI failures on tests 53-55). Match the
pattern used by all Docker integration tests.
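
For the cgroupChildPath() change described in the commit messages above, the naming heuristic can be sketched as follows. The function name `childCgroupPathSketch` is hypothetical, and detecting the systemd driver by a `.slice` suffix on the parent is an assumption based on how containerd's CRI plugin makes the same decision; only the two output formats ("slice:pumba:id" and "parent/id") come from the commit message.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// childCgroupPathSketch builds a cgroup path for a sidecar under parent.
// Assumption: a parent whose last element ends in ".slice" indicates the
// systemd cgroup driver, which expects "slice:prefix:name" rather than a
// filesystem-style path.
func childCgroupPathSketch(parent, id string) string {
	base := path.Base(parent)
	if strings.HasSuffix(base, ".slice") {
		return strings.Join([]string{base, "pumba", id}, ":")
	}
	// cgroupfs driver: a plain child directory under the parent.
	return path.Join(parent, id)
}

func main() {
	fmt.Println(childCgroupPathSketch("/system.slice", "abc123"))
	fmt.Println(childCgroupPathSketch("/kubepods/pod1", "abc123"))
}
```

Either form gives the sidecar its own cgroup beneath the parent, which avoids the "container's cgroup is not empty" failure that occurs when CgroupsPath points at the parent itself.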

@alexei-led merged commit 8743ab9 into master on Mar 1, 2026
6 of 8 checks passed
@alexei-led deleted the fix/containerd-inject-cgroup branch on March 1, 2026 21:25

Development

Successfully merging this pull request may close these issues.

stress --inject-cgroup flag silently ignored on containerd runtime
