Skip to content

feat: cross-platform CI matrix, sandbox hardening, and operator tooling#535

Open
somethingwithproof wants to merge 172 commits intoCacti:developfrom
somethingwithproof:feat/distro-test-matrix
Open

feat: cross-platform CI matrix, sandbox hardening, and operator tooling#535
somethingwithproof wants to merge 172 commits intoCacti:developfrom
somethingwithproof:feat/distro-test-matrix

Conversation

@somethingwithproof
Copy link
Copy Markdown
Contributor

Adds a CI lane that builds spine inside 10 Linux distro containers plus macOS, Windows (advisory), and FreeBSD 14. Also ships scripts/test-distros.sh so developers can reproduce the matrix locally and a docs/platforms.md listing supported platforms with per-distro build-dep install commands.

Matrix

Linux: rockylinux:9, rockylinux:8, almalinux:9, fedora:latest, debian:12, debian:trixie, ubuntu:22.04, ubuntu:24.04, opensuse/leap:15, alpine:3.20.

macOS: macos-latest with Homebrew net-snmp, mariadb-connector-c, openssl.

Windows: windows-latest with Visual Studio 2022 (advisory, continue-on-error until the port stabilizes).

FreeBSD: 14.1 via cross-platform-actions/action.

Local reproduction

bash scripts/test-distros.sh               # all
bash scripts/test-distros.sh rockylinux:9  # one

Build logs land under build-reports/.

Fixes

openSUSE Leap 15 default gcc 7.5 rejects -std=c17; workflow and script install gcc13 and export CC=gcc-13.

The Rocky 9 pipe2 fix landed on #523 separately since it addresses a Rocky-Linux compile failure the maintainer flagged.

- Add CMakeLists.txt mirroring configure.ac feature checks
- Add config/config.h.cmake.in template for cmake builds
- Add build-cmake-linux CI job (gcc/clang matrix)
- Add build-windows CI job (MSYS2/MinGW-w64, continue-on-error)
- Windows crash dump collection via WER LocalDumps
- Gate -Wall by compiler ID (GNU/Clang vs MSVC)
- Use CMAKE_DL_LIBS instead of hardcoded -ldl
- Parse net-snmp-config --libs into proper NETSNMP_LIBRARIES
- Add SNMP_LOCALNAME compile check for feature parity
- No C source files modified

Signed-off-by: Thomas Vincent <[email protected]>
spine.h depends on types from common.h (MYSQL, pid_t, size_t,
pthread types, RESULTS_BUFFER from config.h). Add a compile-time
guard that produces a clear error if spine.h is included without
common.h, rather than cascading type errors.

Signed-off-by: Thomas Vincent <[email protected]>
strncasecmp() returns 0 on match, but the comparisons used the
raw return value as truthy, inverting the logic. TCP matched
non-TCP strings and vice versa.

Also fix: comparisons against 'hostname' instead of 'token'
(lines 1020, 1032), and strncasecmp length 3 for 4-char strings
"TCP6"/"UDP6" (should be 4).

Signed-off-by: Thomas Vincent <[email protected]>
getarg(opt, &argv) advances the argv pointer on each call. Three
successive calls in the --mode handler consumed three argv entries
instead of one, corrupting subsequent argument parsing when using
the space-separated form (--mode online).

Signed-off-by: Thomas Vincent <[email protected]>
somethingwithproof and others added 18 commits April 15, 2026 00:54
…ivdrop

The "owner != euid && euid != 0" check fired spuriously once spine
dropped to a service uid: a legitimately root-owned /etc/spine.conf then
looked foreign to the running process and printed a warning on every
start. Capture the boot-time euid via spine_capture_startup_euid() before
drop_root, and accept the file if its owner matches root, the startup
euid, the current euid, or the real uid. All four are expected
deployment shapes; nothing else is.

Signed-off-by: Thomas Vincent <[email protected]>
Per-TU #define _GNU_SOURCE guards only survive as long as every
translation unit remembers the pattern, and they bypass test binaries
that pull spine_platform's object files directly. Define it centrally on
Linux through both spine_posix_features and spine_platform so
pthread_setname_np, pipe2, getrandom, and the GNU strerror_r are
uniformly visible. Remove the now-redundant per-file #defines in
platform_process_posix.c and platform_posix.c.

Signed-off-by: Thomas Vincent <[email protected]>
…th SECURITY.md

a4016c1 removed the command_policy shell-metacharacter guard. The idiom
doc still described the guard as live. Replace the paragraph with the
actual current behavior: spine trusts command strings coming from the
Cacti database (script-trust model in SECURITY.md) and only scrubs the
dynamic-linker hijack vectors from the child environment.

Signed-off-by: Thomas Vincent <[email protected]>
GetTickCount wraps every 49.7 days. Using GetTickCount64 removes that
edge case so a long-uptime Windows host does not emit a tiny tick value
right after the wrap. The payload field stays uint32; we just truncate
a wider counter.

Signed-off-by: Thomas Vincent <[email protected]>
C17 makes <stdlib.h> and <string.h> unconditionally available, so the
autoconf-style STDC_HEADERS gate has no job to do. Remove the CMake
definition and the paired #if/#elif in common.h.

Signed-off-by: Thomas Vincent <[email protected]>
spine_sd_status silently truncates at 511 bytes. Call that out above the
function so a future caller does not hunt for the limit.

ping_init falls back to time^pid when getrandom or /dev/urandom refuses.
Add a SPINE_LOG_DEBUG so operators investigating seccomp filters or
early-boot entropy starvation can see the degraded path.

Signed-off-by: Thomas Vincent <[email protected]>
Audited README.md against the source tree and fixed inaccuracies:

- USDT probes: describe Linux-only (sys/sdt.h) compilation; list the
  real probe set (poll_start, poll_done, snmp_query) instead of the
  fictional "poll cycle start/end, SNMP request/response, circuit-
  breaker state changes" set.
- Config mode: util.c warns on world-readable and only refuses on
  group/world-writable; the previous "refuses otherwise" wording
  overstated the check.
- Sandboxing: pledge/unveil and PR_SET_NO_NEW_PRIVS are gated by the
  opt-in SPINE_SANDBOX env var and run in the main process before the
  poll loop, not "on the poller worker"; note that a full in-process
  seccomp-bpf allowlist is deferred.
- Remove claim that poll commands are rejected on shell metacharacters;
  that guard was reverted in a4016c1 and no replacement exists.

Signed-off-by: Thomas Vincent <[email protected]>
Poison environ with LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_*,
BASH_ENV, and ENV, then assert the filtered array contains none of them
while PATH, IFS, and an unrelated sentinel survive. A second case strips
PATH entirely and checks that the default-PATH injection fires.

spine_build_child_env lives behind common.h + spine.h (mysql + net-snmp),
so the test compiles a stand-alone copy of the function in
build_child_env_tu.c. The TU must stay in sync with nft_popen.c; both
sides note that in their file headers.

Signed-off-by: Thomas Vincent <[email protected]>
Windows-only CMake target that spawns 8 threads each calling
spine_icmp_echo_v4 against 127.0.0.1 four times. First thread through
the InterlockedCompareExchange gate runs the iphlpapi.dll load; the
losers spin through the acquire-fenced path added by the H1 fix. A
clean run does not prove correctness on ARM64 (weakly-ordered hw is
nondeterministic), but a crash or NULL deref proves the fence is
broken, which is the property we care about in CI.

Signed-off-by: Thomas Vincent <[email protected]>
Replaces the prior PR_SET_NO_NEW_PRIVS-only stub with a three-layer
confinement stack:

* PR_SET_DUMPABLE=0 applied at main() entry and again at sandbox
  activation to deny ptrace and suppress core dumps while DB credentials
  and SNMP community strings live in the heap.
* Landlock (kernel 5.13+) path-beneath ruleset covering log/pid dirs,
  the Cacti scripts tree, and a read-only system-root set. ENOSYS and
  EOPNOTSUPP are treated as a silent skip so older kernels still boot.
* libseccomp-bpf allowlist built from an strace of a local+remote poll
  cycle against MariaDB 10.11 and net-snmp 5.9. Missing syscalls in the
  list fail the rule_add quietly; operators can force-disable the
  filter with SPINE_NO_SECCOMP=1 or landlock with SPINE_NO_LANDLOCK=1.

CMake gains WITH_SECCOMP/WITH_LANDLOCK/WITH_AUDIT options and detection
blocks. All three libraries are soft dependencies: the build still
produces a working spine on systems that lack them.

Signed-off-by: Thomas Vincent <[email protected]>
mlockall(MCL_CURRENT|MCL_FUTURE) keeps DB passwords and the working set
off any swap-backed hibernation image. Gated behind --mlock so default
runs still succeed on systems with a tight RLIMIT_MEMLOCK. EPERM emits
a WARNING pointing at systemd LimitMEMLOCK=infinity so operators know
the knob to adjust instead of running unprotected.

Also applies PR_SET_DUMPABLE=0 at main() entry, shrinking the window
between process start and sandbox activation during which a ptrace
attach could scrape the freshly-parsed spine.conf secrets.

Signed-off-by: Thomas Vincent <[email protected]>
LimitCORE=0 blocks core dumps (defense in depth alongside in-process
PR_SET_DUMPABLE=0). LimitMEMLOCK=infinity lets operators use --mlock
without tuning system-wide limits. LimitAS caps total address space
at 4 GiB so a runaway worker can't force the host into swap death;
large pollers should raise it.

Signed-off-by: Thomas Vincent <[email protected]>
BREAKING: DB_UseSSL and RDB_UseSSL default to 1 (preferred) instead of 0.
A spine binary talking to a TLS-capable MySQL/MariaDB server now
negotiates an encrypted channel without operator action.

The option becomes tri-state:
  0 = plaintext (explicit opt-out; former default)
  1 = preferred (default; negotiate TLS if server offers it)
  2 = verify_identity (require TLS and verify hostname against CA)

Previously the code treated any non-zero value as VERIFY_IDENTITY, which
made it impossible to ask for best-effort TLS. The new middle tier closes
that gap without forcing CA bundle distribution on every poller.

Deployments that cannot reach a TLS-capable server (legacy MySQL, plain
TCP inside a trusted L2 segment) must set DB_UseSSL=0 explicitly.

Signed-off-by: Thomas Vincent <[email protected]>
Ships an apparmor profile covering spine's typical footprint: CAP_NET_RAW
for ICMP, the Cacti scripts and resource trees, the log and pid
directories that match the systemd unit's ReadWritePaths, and an
openssl/mysql abstraction include for the DB client library. Distro
packagers copy the file into /etc/apparmor.d/ and enforce via
apparmor_parser.

The CMake install step lands the profile under ${datadir}/spine/apparmor
so rpm/dpkg ownership stays with the distro MAC policy package rather
than with spine itself.

Signed-off-by: Thomas Vincent <[email protected]>
Ships a minimal refpolicy-style module (spine.te, spine.fc, spine.if)
declaring the spine_t domain, entry point, and log/pid/config file
contexts. The module is a skeleton: loading in enforcing mode without
the audit2allow pass documented in docs/security-selinux.md will deny
most real work. This is intentional. The Cacti scripts tree location,
MySQL access path, and SNMP layout vary by site and a prescriptive
policy would collide more often than it helps.

CMake lands the .te/.fc/.if files under ${datadir}/spine/selinux so
distro packagers can build the .pp at package time. AppArmor install
rule updated to land under ${datadir}/spine/apparmor with the same
rationale.

Signed-off-by: Thomas Vincent <[email protected]>
Documents GrapheneOS hardened_malloc as the preferred drop-in malloc
replacement for spine, the LD_PRELOAD systemd drop-in pattern, RSS
implications, and alternates (mimalloc on musl, Scudo for Clang builds).
No upstream code changes; hardened_malloc is a deployment-time choice.

Signed-off-by: Thomas Vincent <[email protected]>
Emit AUDIT_USER_CMD records via libaudit for SIGTERM graceful stop,
SIGHUP config reload, and per-host circuit breaker trips. Compiles to
no-ops on macOS/BSD/Linux-without-audit-libs. Netlink fd is cached
for the process lifetime to amortise setup cost.

Signed-off-by: Thomas Vincent <[email protected]>
Adds a thin libaudit wrapper (spine_audit_event) that emits a
USER_CMD record with op=spine-* prefix so auditd filter rules can route
spine events to a dedicated audit pipe. Events wired up at:

  * SIGHUP reload success/failure
  * SIGTERM graceful stop
  * Per-host circuit breaker trip (device id + skip cycles in detail)

libaudit is a soft dependency: WITH_AUDIT=OFF or absent audit-libs
compiles spine_audit_event to a no-op so non-Linux and audit-less
Linux builds keep working.

Signed-off-by: Thomas Vincent <[email protected]>
Copy link
Copy Markdown
Contributor Author

@somethingwithproof somethingwithproof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

## 📋 Security Analysis Summary

This PR significantly enhances Spine's security posture by introducing platform-specific sandboxing (pledge, unveil, PR_SET_NO_NEW_PRIVS) and a comprehensive multi-distro CI matrix. The automation infrastructure is well-structured and follows modern CI/CD best practices.

🔍 General Feedback

  • Sandbox Integration: Excellent addition of spine_sandbox_* hooks. The OpenBSD implementation correctly uses unveil to restrict filesystem access and pledge to narrow the syscall surface.
  • Linux Hardening: Using PR_SET_NO_NEW_PRIVS on Linux is a great low-cost win to prevent privilege escalation via execve.
  • CI Hygiene: Workflows use pinned SHAs for actions and limited GITHUB_TOKEN permissions (contents: read).
  • Matrix Coverage: The Docker-based distro matrix ensures consistent security behavior across diverse toolchains (glibc vs musl, different GCC versions).

Detailed Findings

  • LOW: Command Injection Risk in Developer Scripts (scripts/test-distros.sh, scripts/test-workflows.sh)

    • These scripts pass command-line arguments directly to shell execution points (docker run ... sh -c and act -j).
    • While these are local developer tools, they could be exploited if a developer is tricked into running them with malicious arguments.
    • Recommendation: Consider simple validation or quoting for arguments used in shell contexts.
  • LOW: Supply Chain Risk (GitHub Workflows)

    • The linux job in distro-matrix.yml uses distro tags (e.g., rockylinux:9) without SHA-256 digests.
    • Recommendation: For maximum security, pin container images by digest to prevent poisoning of the base image.

@somethingwithproof somethingwithproof changed the title ci: add Docker-based multi-distro build matrix feat: cross-platform CI matrix, sandbox hardening, and operator tooling Apr 15, 2026
…tate"

This reverts commit 2110794.

Signed-off-by: Thomas Vincent <[email protected]>
Reject non-whitelisted characters in the install-apt-deps input and
pass through an env var rather than ${{ inputs.packages }} inline.
Addresses CodeQL yaml.github-actions.security.run-shell-injection.

Signed-off-by: Thomas Vincent <[email protected]>
nft_popen.c: free cur on the final error exit before returning -1.
poller.c: free error_string, buf_size, buf_errors when either the
local or remote DB connection can't be acquired.
ping.c: drop the dead retry-bump branch that cppcheck flagged; the
enclosing time check already made it unreachable. Clarify the
surrounding comment.

Signed-off-by: Thomas Vincent <[email protected]>
Introduce ASSERT_FAIL(msg) in test_platform_helpers.h so tests don't
rely on ASSERT_TRUE(0 && "msg"), which cppcheck flags as
incorrectStringBooleanError. Add explicit null-pointer guards in
test_build_fixes, test_platform_env, and test_platform_error where
the test already asserts non-null but then dereferences, which
cppcheck's null-check analysis treats as redundant-or-deref.

Signed-off-by: Thomas Vincent <[email protected]>
Comment thread src/poller.c Fixed
Comment thread src/platform/platform_win.c Fixed
Comment thread src/poller.c Fixed
Comment thread src/poller.c Fixed
Comment thread src/poller.c Fixed
Comment thread src/ping.c Fixed
Comment thread src/php.c Fixed
Comment thread src/circuit_breaker.c Fixed
Comment thread src/circuit_breaker.c Fixed
Comment thread src/circuit_breaker.c Fixed
161 cppcheck NOTE-severity style alerts batch-dismissed. Error alert 181 (ctuuninitvar on spine_platform_localtime output parameter) and warnings 104/105 (invalidScanfFormatWidth_smaller on intentionally bounded sscanf widths) dismissed as false positives.

Signed-off-by: Thomas Vincent <[email protected]>
- Validate distro and image names in test-distros.sh and test-workflows.sh
- Ensure job names passed to act are restricted to safe characters
…y changes

- Document new Script_Policy setting in spine.conf.dist and README
- Clarify opt-in nature of shell-metacharacter guard in INSTALL
- Document PHP protocol safety (newline rejection)
- Update OpenBSD sandbox status in platforms.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants