Support per-node capabilities and device access #39

@dennisklein

Add optional security fields to the per-node config allowing users to grant specific Linux capabilities and expose host devices to individual containers. The immediate motivation is testing CVMFS config management (package installation, autofs/automount setup, FUSE mount lifecycle) inside sind containers, but the feature is general-purpose.

This is strictly opt-in. sind's default security posture remains unchanged — no extra capabilities, no device access, no privileged containers. These fields exist only for users who explicitly need them for specific use cases like testing config management that requires mount privileges.

User perspective

kind: Cluster
name: test

nodes:
  - role: controller
  - role: submitter
  - role: worker
    count: 3
    capAdd:
      - SYS_ADMIN
    devices:
      - /dev/fuse

$ sind create cluster -c cluster.yaml
$ sind enter test
[root@worker-0 ~]# yum install -y cvmfs cvmfs-config-default
[root@worker-0 ~]# echo 'CVMFS_HTTP_PROXY=DIRECT' > /etc/cvmfs/default.local
[root@worker-0 ~]# cvmfs_config setup
[root@worker-0 ~]# systemctl start autofs
[root@worker-0 ~]# ls /cvmfs/software.cern.ch/
bin  etc  lib  share

This tests the full CVMFS provisioning stack end-to-end — the same Puppet/Ansible/Salt logic that runs on bare-metal nodes.

Relationship to #38

#38 covers consuming CVMFS — the container gets a working /cvmfs via host bind-mount or Docker volume plugin without needing any extra privileges. That approach bypasses the entire CVMFS client stack inside the container.

This issue covers testing the provisioning of CVMFS — installing packages, writing config files, setting up autofs or systemd automount units, and running the cvmfs2 FUSE client. This requires the container to actually perform mounts, which needs CAP_SYS_ADMIN and /dev/fuse.

Design

Opt-in by design

sind deliberately avoids --privileged and extra capabilities by default; running containers as close to rootless as possible is a core design principle (writable cgroups in Docker 28.0+ eliminated the last need for --privileged). These new fields do not change any defaults. A cluster config without capAdd/devices behaves exactly as before. When a user does opt in, the scope is targeted (specific capabilities, specific devices) rather than blanket --privileged.

sind should log a notice at cluster creation when extra capabilities or devices are configured, making the privilege escalation visible.
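A minimal sketch of what such a notice could look like. The helper name privilegeNotice and the message wording are hypothetical, and since sind's logging facilities are not shown in this issue, the example simply writes to stderr:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// privilegeNotice is a hypothetical helper. It returns a one-line notice
// describing the extra privileges configured for a node role, or "" when
// neither capabilities nor devices were requested.
func privilegeNotice(role string, capAdd, devices []string) string {
	var parts []string
	if len(capAdd) > 0 {
		parts = append(parts, "capAdd="+strings.Join(capAdd, ","))
	}
	if len(devices) > 0 {
		parts = append(parts, "devices="+strings.Join(devices, ","))
	}
	if len(parts) == 0 {
		return ""
	}
	return fmt.Sprintf("notice: %s nodes run with extra privileges: %s",
		role, strings.Join(parts, " "))
}

func main() {
	// Stand-in for a call made once per node role during cluster creation.
	msg := privilegeNotice("worker", []string{"SYS_ADMIN"}, []string{"/dev/fuse"})
	if msg != "" {
		fmt.Fprintln(os.Stderr, msg)
	}
}
```

Returning an empty string for unprivileged nodes keeps the default case silent, which matches the "no change unless opted in" goal.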

Naming convention

Field names, semantics, and value formats follow Docker Compose (cap_add, cap_drop, devices, security_opt), but use camelCase to stay consistent with sind's existing config convention (tmpSize, dataStorage, mountPath). The mapping:

Docker Compose   sind config   Type
cap_add          capAdd        list of strings
cap_drop         capDrop       list of strings
devices          devices       list of strings
security_opt     securityOpt   list of strings
privileged       (deliberately not exposed)

Capability names and device string formats are identical to Compose (e.g. SYS_ADMIN, /dev/fuse, /dev/sda:/dev/xvda:rwm).

Config schema (pkg/config/config.go)

Add the fields directly to Node (flat, like Compose services) rather than nesting under a security object:

type Node struct {
    Role        Role     `json:"role"`
    Count       int      `json:"count,omitempty"`
    Image       string   `json:"image,omitempty"`
    CPUs        int      `json:"cpus,omitempty"`
    Memory      string   `json:"memory,omitempty"`
    TmpSize     string   `json:"tmpSize,omitempty"`
    Managed     *bool    `json:"managed,omitempty"`
    CapAdd      []string `json:"capAdd,omitempty"`
    CapDrop     []string `json:"capDrop,omitempty"`
    Devices     []string `json:"devices,omitempty"`
    SecurityOpt []string `json:"securityOpt,omitempty"`
}

These fields should also be supported in Defaults so they can be applied cluster-wide:

defaults:
  capAdd:
    - SYS_ADMIN
  devices:
    - /dev/fuse

Per-node values should merge with (not replace) defaults.
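The exact merge semantics are not spelled out here. One plausible reading is an order-preserving union with duplicates removed, sketched below; mergeList is a hypothetical helper name, not existing sind code:

```go
package main

import "fmt"

// mergeList returns the defaults followed by any per-node values that are
// not already present. Assumption: "merge" means an order-preserving union.
func mergeList(defaults, node []string) []string {
	seen := make(map[string]bool, len(defaults)+len(node))
	var out []string
	for _, list := range [][]string{defaults, node} {
		for _, v := range list {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	return out
}

func main() {
	defaults := []string{"SYS_ADMIN"}
	node := []string{"NET_ADMIN", "SYS_ADMIN"}
	fmt.Println(mergeList(defaults, node)) // [SYS_ADMIN NET_ADMIN]
}
```

Under this reading a node cannot remove a value inherited from defaults; it can only add to it.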

RunConfig changes

Add corresponding fields to RunConfig and populate them during config resolution.

BuildRunArgs changes (pkg/cluster/node.go)

In BuildRunArgs(), append flags when the fields are non-empty:

// Each configured value becomes one flag/value pair; empty fields add nothing.
// The loop variable is named c to avoid shadowing Go's built-in cap().
for _, c := range cfg.CapAdd {
    args = append(args, "--cap-add", c)
}
for _, c := range cfg.CapDrop {
    args = append(args, "--cap-drop", c)
}
for _, dev := range cfg.Devices {
    args = append(args, "--device", dev)
}
for _, opt := range cfg.SecurityOpt {
    args = append(args, "--security-opt", opt)
}
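For reference, a self-contained variant of the same flag generation that can be run in isolation. The RunConfig shown here contains only the four new fields (assumed to mirror Node), and securityArgs is a hypothetical name; the real BuildRunArgs() builds many more arguments:

```go
package main

import "fmt"

// RunConfig is reduced to the four new fields for this sketch. The field
// names are an assumption based on the Node struct above.
type RunConfig struct {
	CapAdd      []string
	CapDrop     []string
	Devices     []string
	SecurityOpt []string
}

// securityArgs (hypothetical) returns the docker run flags for the security
// fields in a fixed order. Empty fields contribute no arguments.
func securityArgs(cfg RunConfig) []string {
	groups := []struct {
		flag   string
		values []string
	}{
		{"--cap-add", cfg.CapAdd},
		{"--cap-drop", cfg.CapDrop},
		{"--device", cfg.Devices},
		{"--security-opt", cfg.SecurityOpt},
	}
	var args []string
	for _, g := range groups {
		for _, v := range g.values {
			args = append(args, g.flag, v)
		}
	}
	return args
}

func main() {
	cfg := RunConfig{
		CapAdd:  []string{"SYS_ADMIN"},
		Devices: []string{"/dev/fuse"},
	}
	fmt.Println(securityArgs(cfg))              // [--cap-add SYS_ADMIN --device /dev/fuse]
	fmt.Println(len(securityArgs(RunConfig{}))) // 0
}
```

The second call shows the opt-in property: a config without any of the new fields produces no additional docker flags.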

Validation (pkg/config/)

  • capAdd/capDrop values should be validated against known Linux capability names. Reject unknown names early.
  • devices values should start with /. Optionally verify the device exists on the host at cluster creation time.
  • No allowlist/blocklist beyond validation — the user decides what they need.
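A rough sketch of the first two checks. The function names are hypothetical, and knownCaps is deliberately an abbreviated subset of the names in capabilities(7); a real implementation would list all of them and decide whether to accept the CAP_ prefix or Docker's special value ALL, neither of which is handled here:

```go
package main

import (
	"fmt"
	"strings"
)

// knownCaps is an illustrative, incomplete subset of Linux capability names,
// written without the CAP_ prefix as in the examples in this issue.
var knownCaps = map[string]bool{
	"CHOWN": true, "DAC_OVERRIDE": true, "KILL": true, "MKNOD": true,
	"NET_ADMIN": true, "NET_RAW": true, "SYS_ADMIN": true,
	"SYS_PTRACE": true, "SYS_TIME": true,
}

// validateCaps rejects any name that is not in knownCaps. The field argument
// ("capAdd" or "capDrop") is only used to build the error message.
func validateCaps(field string, caps []string) error {
	for _, c := range caps {
		if !knownCaps[c] {
			return fmt.Errorf("%s: unknown capability %q", field, c)
		}
	}
	return nil
}

// validateDevices checks that every entry starts with "/". It does not
// verify that the device exists on the host.
func validateDevices(devices []string) error {
	for _, d := range devices {
		if !strings.HasPrefix(d, "/") {
			return fmt.Errorf("devices: %q must start with /", d)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateCaps("capAdd", []string{"SYS_ADMIN"}))
	fmt.Println(validateCaps("capAdd", []string{"SYS_ADMN"}))
	fmt.Println(validateDevices([]string{"/dev/sda:/dev/xvda:rwm"}))
	fmt.Println(validateDevices([]string{"dev/fuse"}))
}
```

Rejecting a misspelled name such as SYS_ADMN at config load time gives a clearer error than letting docker run fail later during cluster creation.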

sind-node image consideration

The default sind-node Dockerfile masks sys-fs-fuse-connections.mount:

RUN systemctl mask \
    dev-hugepages.mount \
    sys-fs-fuse-connections.mount \
    ...

This prevents FUSE from working even when SYS_ADMIN and /dev/fuse are granted. The unit is a no-op without /dev/fuse anyway, so removing it from the mask list is harmless. If the concern is noisy logs when /dev/fuse is absent, a systemd condition (ConditionPathExists=/dev/fuse) is cleaner than masking.

Scope

In scope

  • capAdd, capDrop, devices, securityOpt fields on Node and Defaults
  • Docker flag generation in BuildRunArgs()
  • Config validation
  • Defaults merging
  • Logging when extra privileges are configured
  • Documentation

Out of scope

  • privileged: true — deliberately not exposed; capAdd + devices covers specific needs without granting blanket privileges

Labels

feature: New feature or request
