Commit 5be7bb1
feat: migrate to uv + add ContainerGrader for Kubernetes/Docker sandboxed grading (#14)
* chore: migrate from pip-compile to uv for dependency management
- Run migrate-to-uv to bootstrap pyproject.toml from requirements/base.txt
and requirements/test.txt
- Add full project metadata: name, version, description, requires-python>=3.11,
license, hatchling build backend, entry point xqueue-watcher -> manager:main
- Add newrelic as [project.optional-dependencies.production]
- Add dev dependency group: coverage, mock, pytest-cov
- Remove setup.py (replaced by pyproject.toml)
- Remove all requirements/*.in and requirements/*.txt files (14 files)
- Generate uv.lock with pinned dependency graph
- Update Makefile: replace pip/pip-compile targets with uv sync / uv run pytest
- Update .github/workflows/ci.yml: use astral-sh/setup-uv@v4, drop ubuntu-20.04
and Python 3.8, add Python 3.13, update to actions/checkout@v4
- Replace upgrade-python-requirements workflow with uv lock --upgrade +
create-pull-request workflow
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: remove AppArmor/codejail hard dependency; make codejail optional
- Remove six (Python 2 compat shim) from imports and SUPPORT_FILES in
jailedgrader.py — Python 3 only going forward
- Wrap codejail imports in try/except in jailedgrader.py and manager.py;
raise RuntimeError with clear message directing users to ContainerGrader
- Fix Path.abspath() -> Path.absolute() (breaking API change in path v17)
in grader.py and jailedgrader.py
- Update Dockerfile: ubuntu:xenial -> python:3.11-slim, remove apparmor
and language-pack-en packages, fix layer ordering
- Update test_codejail_config to use fork_per_item=False to avoid
multiprocessing state-inheritance failure on Python 3.14 forkserver default
- Update conf.d/600.json example to use ContainerGrader handler
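The optional-import guard described in the bullets above can be sketched roughly as follows. This is an illustration only, not the code in jailedgrader.py or manager.py: the helper name `optional_import` and the hint text are assumptions.

```python
import importlib


def optional_import(module_name, hint):
    """Import module_name, or raise RuntimeError carrying an actionable hint.

    Hypothetical helper: the real code wraps the codejail imports in
    try/except; the exact structure there is not shown in this commit message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(f"{module_name} is not installed. {hint}") from exc


# Example call (commented out so the sketch has no import-time side effects):
# codejail = optional_import("codejail", "Use ContainerGrader instead.")
```

Raising RuntimeError instead of letting ImportError propagate lets the message point operators at the supported alternative.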
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add ContainerGrader for Kubernetes/Docker-based sandboxed grading
Adds xqueue_watcher/containergrader.py — a drop-in replacement for
JailedGrader that executes student code inside an isolated container
instead of using AppArmor/codejail.
Security model (replaces AppArmor):
- Container isolation (Linux namespaces + cgroups)
- Non-root user (UID 1000), read-only root filesystem
- CPU/memory resource limits enforced by container runtime
- Network disabled for grader containers (no egress)
- Hard wall-clock timeout via activeDeadlineSeconds (k8s) or timeout (Docker)
Two pluggable backends selected via the 'backend' KWARGS option:
kubernetes (default / production):
- Creates a batch/v1 Job per submission using the kubernetes Python client
- Auto-detects in-cluster vs kubeconfig credentials
- Polls until Job completes, collects stdout from pod logs
- Deletes the Job after result collection (ttlSecondsAfterFinished=300)
- Job pod spec includes: securityContext, resource limits,
activeDeadlineSeconds, and labels for observability
docker (local dev / CI):
- Runs a container via the docker Python SDK
- Bind-mounts the grader directory read-only
- Passes SUBMISSION_CODE as an environment variable
- Network disabled, memory + CPU limits applied
Student code is passed via SUBMISSION_CODE env var (avoids argv length
limits and shell injection). The entrypoint writes it to /tmp before
invoking grader_support.run, producing the same JSON output format that
JailedGrader already expects — so no changes to grader test framework
or course team grader code are required.
Configuration example (conf.d/my-course.json):
{
  "my-course": {
    "HANDLERS": [{
      "HANDLER": "xqueue_watcher.containergrader.ContainerGrader",
      "KWARGS": {
        "grader_root": "/graders/my-course/",
        "image": "registry.example.com/my-course:latest",
        "backend": "kubernetes",
        "cpu_limit": "500m",
        "memory_limit": "256Mi",
        "timeout": 20
      }
    }]
  }
}
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add grader base Docker image and container entrypoint
grader_support/Dockerfile.base:
- python:3.11-slim base, non-root grader user (UID 1000)
- Copies grader_support framework; installs path-py
- ENTRYPOINT: python -m grader_support.entrypoint
- /tmp volume for submission files (writable even with read-only root fs)
- Course teams extend this image to add their deps and grader scripts
grader_support/entrypoint.py:
- Reads SUBMISSION_CODE env var, writes to /tmp/submission.py
- Adds /tmp and cwd to sys.path, then delegates to grader_support.run
- Prints JSON result to stdout (same schema JailedGrader already parses)
grader_support/README.md:
- Course team authoring guide: how to extend the base image, configure
the handler, and understand the security properties
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add Kubernetes deployment manifests and Docker Compose local dev
deploy/kubernetes/ (Kustomize-compatible):
- serviceaccount.yaml — dedicated SA for xqueue-watcher pods
- rbac.yaml — Role + RoleBinding: create/delete Jobs, read pod logs
- configmap.yaml — watcher xqwatcher.json config (edit for your queues)
- deployment.yaml — 2 replicas, topologySpreadConstraints, securityContext,
resource limits, readinessProbe
- networkpolicy.yaml — deny all ingress/egress on grader Job pods (label:
role=grader-job); allow xqueue-watcher egress to xqueue
- secret.yaml.template — placeholder: copy to secret.yaml, fill in credentials,
do not commit secret.yaml (added to .gitignore)
- kustomization.yaml — Kustomize entry point for the base directory
docker-compose.yml (local dev):
- xqueue-watcher container with docker socket access (for docker backend)
- Mounts conf.d/ and grader directories
- Includes a sample xqueue service reference for full local stack
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: correct grader path handling in ContainerGrader and entrypoint
ContainerGrader had two bugs affecting how grader files were located
inside the container at runtime:
1. Docker backend bind-mounted the grader problem directory at /grader,
overwriting the grader_support package that the base image copies
there. Fixed by binding at /graders instead and passing the
resulting absolute in-container path (/graders/<file>) to the
entrypoint.
2. Kubernetes backend set working_dir to the grader problem directory
(e.g. /graders/ps07/Robot/), preventing Python from finding the
grader_support package which lives at /grader/grader_support/.
Fixed by keeping working_dir=/grader (the base image WORKDIR) and
passing the absolute grader path in args instead of just the
basename.
entrypoint.py previously passed the full absolute path verbatim to
__import__(), which fails for paths containing slashes. It now detects
absolute paths, inserts the parent directory into sys.path, and uses
only the basename as the importable module name.
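A minimal sketch of the absolute-path handling described in the paragraph above. The function name `import_grader_module` is hypothetical, and dropping the ".py" suffix via `Path.stem` is an assumption about how the basename becomes an importable name.

```python
import importlib
import sys
from pathlib import Path


def import_grader_module(grader_arg):
    """Import a grader given either a bare module name or an absolute file path."""
    path = Path(grader_arg)
    if path.is_absolute():
        # Make the file's directory importable instead of passing a
        # slash-containing path to the import machinery.
        parent = str(path.parent)
        if parent not in sys.path:
            sys.path.insert(0, parent)
    # Path.stem keeps only the final component without its suffix,
    # e.g. "/graders/ps07/Robot/grade.py" -> "grade".
    return importlib.import_module(path.stem)
```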
Also updates grader_support/README.md to document the correct layout
(/graders/ for course grader scripts, /grader/ for grader_support) and
the gradelib compatibility note for course teams migrating from
Python 2 graders.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(tests): skip jailed grader tests when codejail is not installed
codejail is an optional dependency (not installed in CI). Guard the
import with a try/except and apply @pytest.mark.skipif to the test
class so collection succeeds and tests are skipped gracefully when
codejail is absent.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address PR review feedback
- Dockerfile: replace deleted requirements/ pip install with uv sync
(copies uv binary from ghcr.io/astral-sh/uv and uses uv sync --frozen)
- grader.py: guard against path traversal in grader_config['grader'];
validate that the resolved grader path stays within grader_root
- containergrader.py: fix Docker SDK TypeError - containers.run() does
not accept a timeout kwarg; switch to detach=True + container.wait()
to enforce the timeout, then collect logs and remove the container
- containergrader.py: remove brittle hardcoded line numbers (L364,
L379, L397, L450) from user-facing error messages
- docker-compose.yml: change conf.d and data volumes from :ro to :rw
so local edits take effect without rebuild (matches comment intent)
- upgrade-python-requirements.yml: add explicit permissions block
(contents: write, pull-requests: write) as required by security policy
- Automated code Graders With xqueue-watcher.md: remove empty heading,
add 'Property' header to comparison table
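The path-traversal guard mentioned for grader.py can be sketched like this. It is an assumption-laden illustration (the function name `resolve_grader_path` and the error text are hypothetical), showing only the general resolve-then-relative_to technique.

```python
from pathlib import Path


def resolve_grader_path(grader_root, grader):
    """Return the resolved grader path, rejecting anything outside grader_root."""
    root = Path(grader_root).resolve()
    candidate = (root / grader).resolve()
    try:
        # relative_to() raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"grader path {grader!r} escapes grader_root") from None
    return candidate
```

An absolute `grader` value replaces the root when joined, so it is rejected by the same check.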
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* refactor: replace path-py with stdlib pathlib
path-py is an external dependency that wraps pathlib with a fluent API.
Since we now require Python >= 3.11, pathlib covers all the same
functionality without an extra dependency.
Changes:
- Replace 'from path import Path' with 'from pathlib import Path' in all
source and test files
- .dirname() → .parent
- .basename() → .name
- .absolute() → .resolve() (symlink-safe)
- .files('*.json') → .glob('*.json') (with sorted() for stable ordering)
- Remove path-py (path-py / path) from pyproject.toml dependencies
- Regenerate uv.lock (removes path==17.1.1 and path-py==12.5.0)
- Simplify grader.py path-traversal check: now that grader_path is a
native pathlib.Path, the inline 'import pathlib' is no longer needed
- Fix test_grader.py mock: grader_path.endswith() → grader_path.name ==
- Fix test_manager.py: pass str() to argparse (Path is not subscriptable)
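For reference, the path-py to pathlib substitutions listed above look roughly like this in practice. The example path and the helper name `list_json_configs` are illustrative assumptions, not code from the repository.

```python
from pathlib import Path, PurePosixPath

# PurePosixPath keeps the example independent of the host operating system.
grader = PurePosixPath("/graders/ps07/Robot/grade.py")
parent = grader.parent  # path-py: grader.dirname()
name = grader.name      # path-py: grader.basename()


def list_json_configs(directory):
    """pathlib replacement for path-py's directory.files('*.json')."""
    # glob() yields entries in arbitrary order, so sort for stable results.
    return sorted(Path(directory).resolve().glob("*.json"))
```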
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add edx-codejail as optional dependency; document container isolation decision
Add edx-codejail (the upstream PyPI package, v4.1.0) as an optional
'codejail' extra, replacing the previously pinned git-URL reference to
a specific commit.
uv add --optional codejail edx-codejail
codejail is intentionally excluded from the base Docker image because
ContainerGrader uses container-level isolation (Linux namespaces,
cgroups, capability dropping, network isolation, read-only filesystem)
which provides equivalent sandboxing to AppArmor without requiring
host-level AppArmor configuration that is unavailable inside Kubernetes
pods. Install the 'codejail' extra only when using the legacy
JailedGrader on a bare-metal or VM host with AppArmor configured.
To use: uv sync --extra codejail
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address second round of PR review feedback
- Makefile: fix tab indentation on all recipe lines (was space-indented)
- grader.py: remove unused sys import
- jailedgrader.py: replace deprecated load_module() with spec_from_file_location/exec_module
- containergrader.py:
- remove unused imports (logging, os, tempfile) and _JOB_LABEL constant
- add emptyDir volume at /tmp in K8s Job spec (required when read_only_root_filesystem=True)
- add clarifying comment that K8s grader scripts are baked into the course image
- replace deprecated load_module() with importlib.util spec/exec_module pattern
- capture stderr from Docker container on non-zero exit for better diagnostics
- grader_support/entrypoint.py: correct misleading comment about /tmp writability
- deploy/kubernetes/deployment.yaml: fix command to use xqueue-watcher entry point
- deploy/kubernetes/configmap.yaml: add xqueue-watcher-queue-configs ConfigMap so
manifests apply cleanly out of the box
- docker-compose.yml: mount Docker socket for docker backend to work
- conf.d/600.json: use absolute /graders/ path instead of relative ../data path
- Dockerfile: use C.UTF-8 locale (available without installing locales package)
- pyproject.toml: add edx-codejail to dev group so jailed grader tests run in CI
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* refactor: move full grading pipeline into container; add ContainerGrader unit tests
Architecture change: grader scripts are baked into the course-specific Docker
image, so the watcher pod has no need to access grader files locally. The
grader_support entrypoint now runs the complete grading pipeline inside the
container (load grader, preprocess, run answer + submission, compare, return
JSON grade), and ContainerGrader.grade() is simplified to just launch the
container and parse its JSON output.
Changes:
- grader_support/entrypoint.py: complete rewrite; now takes GRADER_FILE SEED
(not GRADER_FILE submission.py SEED); runs full grade pipeline in container;
reads GRADER_LANGUAGE and HIDE_OUTPUT env vars from ContainerGrader
- xqueue_watcher/containergrader.py:
- Remove grader-module loading, gettext, answer.py reading, and all test-
comparison logic from grade() — the container handles this now
- grade() now just calls _run() and parses the returned JSON
- _run() accepts grader_config and forwards lang/hide_output as env vars
- _build_k8s_job(): args are now [grader_abs, seed] (not 3 args), adds
GRADER_LANGUAGE and HIDE_OUTPUT env vars, still mounts emptyDir at /tmp
- _run_docker(): same arg change; passes GRADER_LANGUAGE and HIDE_OUTPUT
- ReadTimeout from container.wait() caught and re-raised as clear RuntimeError
- Remove unused _truncate, _prepend_coding, importlib.util
- tests/test_container_grader.py: 36 new unit tests covering:
- _parse_cpu_millis
- ContainerGrader init / backend validation
- _build_k8s_job: args, env vars, resource limits, emptyDir/tmp, security
- _run_docker: success, non-zero exit (with stderr), timeout, missing SDK
- grade(): skip_grader, successful result, container failure, size warning
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* refactor: replace statsd/newrelic with OpenTelemetry; add 12-factor settings
- Remove dogstatsd-python dependency; replace statsd instrumentation in
grader.py with OpenTelemetry counters and a histogram
- Add xqueue_watcher/metrics.py: configure_metrics() wires a MeterProvider
with an OTLP HTTP exporter when OTEL_EXPORTER_OTLP_ENDPOINT is set;
all four instruments (process_item, grader_payload_error, grading_time,
replies) defined at module level against the global proxy meter
- Call configure_metrics() from Manager.configure_from_directory() so the
real provider is installed before any submissions are processed
- Add xqueue_watcher/env_settings.py: get_manager_config_from_env() reads
all manager config from XQWATCHER_* environment variables, compatible
with 12-factor / Kubernetes deployment patterns
- Remove newrelic from the production optional-dependency group and from
the edx.org Dockerfile stage; the stage now runs xqueue-watcher directly
- Add opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp-proto-http
to core dependencies; regenerate uv.lock
- Add tests/test_env_settings.py and tests/test_metrics.py
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: remove planning doc from git tracking
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: remove codecov upload from CI
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address PR #14 review feedback
- docker-compose.yml: remove unused GRADER_BACKEND env var, fix duplicate
volumes key by merging into one list, tag sample-grader with
image: grader-base:local so conf.d/600.json reference resolves
- Dockerfile: standardise CMD config path to /etc/xqueue-watcher to match
docker-compose and Kubernetes manifests
- metrics.py: remove OTEL_METRIC_EXPORT_INTERVAL from docstring since it is
not wired up in _build_meter_provider()
- containergrader.py: add pod template metadata labels so the NetworkPolicy
podSelector (app.kubernetes.io/component=xqueue-grader) actually matches
grading pods; set automount_service_account_token=False on the grading pod
spec to reduce blast radius if the NetworkPolicy is misconfigured; add
_parse_memory_bytes() helper and use it for the Docker backend mem_limit
so Kubernetes-style strings like '256Mi' are converted to bytes rather
than passed raw (which Docker does not accept)
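A rough sketch of what a helper like _parse_memory_bytes() might do. The actual implementation is not shown in this commit message; the suffix table below covers only a few common Kubernetes quantity suffixes and is an assumption.

```python
# Hypothetical suffix tables; the Kubernetes quantity grammar has more forms.
_BINARY = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
_DECIMAL = {"k": 1000, "M": 1000 ** 2, "G": 1000 ** 3}


def parse_memory_bytes(quantity):
    """Convert a Kubernetes-style memory string such as '256Mi' to bytes."""
    text = str(quantity).strip()
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    # No recognised suffix: treat the value as a plain byte count.
    return int(text)
```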
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: add venv bin to PATH so xqueue-watcher entrypoint resolves
uv installs the console script into the project virtual environment at
.venv/bin/xqueue-watcher. Without adding this directory to PATH the CMD
cannot be found at container startup.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add configure_logging() for 12-factor stdout logging
When no logging.json file is present, manager.py now calls
configure_logging() from env_settings instead of basicConfig().
configure_logging() sets up a single StreamHandler on stdout with a
consistent timestamp/level/module format, honours XQWATCHER_LOG_LEVEL
(default INFO), and suppresses noisy requests/urllib3 debug output.
This removes the need for a logging.json file in Kubernetes deployments.
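The behaviour described above can be approximated with the standard library alone. This sketch is not the configure_logging() from env_settings: the function name, the format string, and the choice of WARNING for the noisy loggers are assumptions.

```python
import logging
import os
import sys

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging_sketch(environ=None):
    """Send all logs to stdout at the level named by XQWATCHER_LOG_LEVEL."""
    env = os.environ if environ is None else environ
    level = _LEVELS.get(env.get("XQWATCHER_LOG_LEVEL", "INFO").upper(), logging.INFO)

    handler = logging.StreamHandler(sys.stdout)
    # Assumed format: timestamp, level, module, message.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(module)s: %(message)s")
    )

    root = logging.getLogger()
    root.handlers[:] = [handler]  # exactly one handler on the root logger
    root.setLevel(level)

    # Keep HTTP client chatter out of the stream even at DEBUG.
    for noisy in ("requests", "urllib3"):
        logging.getLogger(noisy).setLevel(logging.WARNING)
    return root
```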
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: symlink xqueue-watcher into /usr/local/bin for reliable resolution
Using PATH via ENV is fragile -- container runtimes and security policies
can reset or ignore it. Install a symlink at /usr/local/bin/xqueue-watcher
(always in the standard system PATH) so the entrypoint resolves regardless
of how the container is launched. Also remove the stale NEW_RELIC_LICENSE_KEY
env entry from the Kubernetes deployment manifest.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add env-based defaults for ContainerGrader configuration
Add get_container_grader_defaults() to env_settings, reading five new
XQWATCHER_GRADER_* env vars:
XQWATCHER_GRADER_BACKEND (default: kubernetes)
XQWATCHER_GRADER_NAMESPACE (default: default)
XQWATCHER_GRADER_CPU_LIMIT (default: 500m)
XQWATCHER_GRADER_MEMORY_LIMIT (default: 256Mi)
XQWATCHER_GRADER_TIMEOUT (default: 20)
ContainerGrader.__init__ now uses None sentinels for these params so that
any value omitted from a conf.d KWARGS block falls back to the env-derived
default rather than a hardcoded constant. Values supplied explicitly in
conf.d always take precedence, preserving backwards compatibility.
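The None-sentinel fallback can be illustrated as below. The env var names and defaults come from the list above; the function names and the dict-based merge are assumptions, not the real ContainerGrader.__init__ signature.

```python
import os

# (env var, default, cast) per setting, as listed in the commit message.
_ENV_DEFAULTS = {
    "backend": ("XQWATCHER_GRADER_BACKEND", "kubernetes", str),
    "namespace": ("XQWATCHER_GRADER_NAMESPACE", "default", str),
    "cpu_limit": ("XQWATCHER_GRADER_CPU_LIMIT", "500m", str),
    "memory_limit": ("XQWATCHER_GRADER_MEMORY_LIMIT", "256Mi", str),
    "timeout": ("XQWATCHER_GRADER_TIMEOUT", "20", int),
}


def container_grader_defaults(environ=None):
    """Read the five grader defaults from the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return {
        key: cast(env.get(var, default))
        for key, (var, default, cast) in _ENV_DEFAULTS.items()
    }


def merge_kwargs(kwargs, environ=None):
    """Explicit conf.d values win; None means 'use the env-derived default'."""
    merged = container_grader_defaults(environ)
    merged.update({k: v for k, v in kwargs.items() if v is not None})
    return merged
```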
Also fixes duplicate function definitions that had crept into env_settings.py.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(containergrader): add ImageDigestPoller and image pull policy support
- Add ImageDigestPoller class: background daemon thread that periodically
resolves a tag-based image reference to its current digest via
docker.APIClient.inspect_distribution(). Thread-safe; falls back to the
original reference if resolution fails.
- Add image_pull_policy param to ContainerGrader (auto-detect: IfNotPresent
for digest refs, Always for tag-based refs; can be overridden explicitly).
- Add poll_image_digest and digest_poll_interval params to activate the
poller. When enabled, Kubernetes Jobs use the most recently resolved
repo@sha256:… reference via _effective_image(), ensuring nodes always run
the latest pushed image without relying on imagePullPolicy: Always for
every pod.
- Add .github/workflows/publish-grader-base-image.yml to build and push
grader_support/Dockerfile.base to ghcr.io/mitodl/xqueue-watcher-grader-base
on push to master (grader_support/** paths), weekly schedule, and
workflow_dispatch. Multi-platform linux/amd64,linux/arm64.
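A simplified sketch of the polling idea behind ImageDigestPoller. The class below is hypothetical: it takes an injected `resolve` callable (assumed to return a full repo@sha256:… reference) in place of the Docker registry lookup, so it shows only the thread-safety and fallback behaviour.

```python
import threading


class DigestPollerSketch:
    """Periodically resolve a tag-based image reference to a digest reference."""

    def __init__(self, image, resolve, interval=300.0):
        self._image = image
        self._resolve = resolve  # hypothetical callable: image -> "repo@sha256:..."
        self._interval = interval
        self._lock = threading.Lock()
        self._resolved = None
        self._stop = threading.Event()

    def refresh(self):
        try:
            resolved = self._resolve(self._image)
        except Exception:
            resolved = None  # fall back to the original reference
        with self._lock:
            self._resolved = resolved

    def effective_image(self):
        with self._lock:
            return self._resolved or self._image

    def start(self):
        def loop():
            self.refresh()
            # Event.wait() returns True once stop() has been called.
            while not self._stop.wait(self._interval):
                self.refresh()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```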
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: normalise imagePullPolicy to title-case before K8s API call
Kubernetes requires imagePullPolicy to be exactly 'Always', 'IfNotPresent',
or 'Never' (case-sensitive). When the value is supplied via KWARGS in the
conf.d JSON (e.g. 'always' or 'ALWAYS'), the K8s API returns 422 Unprocessable
Entity.
Add a normalisation dict lookup that maps the lowercased input back to the
canonical title-case form. Unknown values are passed through unchanged so
Kubernetes can surface the validation error with a clear message.
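The normalisation described above amounts to a small dictionary lookup, sketched here with a hypothetical function name:

```python
_CANONICAL_PULL_POLICIES = {
    "always": "Always",
    "ifnotpresent": "IfNotPresent",
    "never": "Never",
}


def normalise_pull_policy(value):
    """Map any casing of a known policy to the form Kubernetes accepts."""
    # Unknown values pass through unchanged so the API server reports them.
    return _CANONICAL_PULL_POLICIES.get(value.lower(), value)
```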
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add strip_path_components to ContainerGrader for legacy path prefixes
LMS grader_payload 'grader' fields configured against the old git-clone
deployment include a queue-name prefix, e.g.:
mit-600x-Watcher-MITX-6.0001r/graders/python3graders/chips1/.../grade.py
In the containerized approach, graders are baked directly into the image
at grader_root, so the path resolves to:
/graders/mit-600x-Watcher-MITX-6.0001r/graders/python3graders/...
which doesn't exist. The actual path in the image is:
/graders/python3graders/...
Add strip_path_components (int, default 0) KWARG to ContainerGrader.
When > 0, that many leading path components are stripped from the
grader path (relative to grader_root) before it is passed as the
container entrypoint argument. Set to 2 to remove both the queue-name
component and the redundant repo subdirectory name.
Example KWARGS:
"strip_path_components": 2
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: install gettext into builtins before loading grader module
Grader scripts may call _() at module level (e.g. in input_validators
defined at import time). The previous code installed trans.install()
after exec_module, causing NameError: name '_' is not defined.
Move the entire locale/gettext setup block to before exec_module so
_ is available in builtins when the grader script is first executed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: normalize mixed tab/space indentation before exec
Python 3 raises TabError when exec'ing code with mixed tabs and spaces
in the same indented block. Many course grader answer.py files were
authored for Python 2 which tolerated this.
Call expandtabs(4) on both the staff answer and student submission
before preprocessing and writing to /tmp, so exec never sees raw tabs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: sys.path ordering so preprocessed answer/submission shadow originals
run.py's import_captured() uses __import__() to load answer and
submission modules. grader_dir was inserted at the front of sys.path AFTER
/tmp, leaving grader_dir at position 0, so __import__('answer') found the original
/graders/.../answer.py (with bare 'for c in s:') instead of the
preprocessed /tmp/answer.py (with 'submission_code = repr(...)').
Fix: insert grader_dir first, then /tmp, so /tmp is position 0 and
the preprocessed files always shadow the originals.
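The ordering effect can be reproduced in isolation. In this sketch the helper name and its arguments are hypothetical; it only demonstrates that the directory inserted last at index 0 wins the import lookup.

```python
import importlib
import sys


def import_with_shadowing(original_dir, preprocessed_dir, module_name):
    """Import module_name so the copy in preprocessed_dir shadows the original."""
    sys.path.insert(0, str(original_dir))      # inserted first, ends up at index 1
    sys.path.insert(0, str(preprocessed_dir))  # inserted last, ends up at index 0
    try:
        sys.modules.pop(module_name, None)  # forget any previously imported copy
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(preprocessed_dir))
        sys.path.remove(str(original_dir))
```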
Also:
- Add _dbg() helper for debug tracing behind GRADER_DEBUG=1 env var;
off by default so stderr output doesn't corrupt the JSON pod log
that containergrader.py reads via read_namespaced_pod_log.
- Import traceback (used by _dbg exception paths).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: log raw container output bytes on JSON parse failure
Add an explicit ERROR-level log of the raw bytes (repr, up to 4096)
when json.loads fails so we can see exactly what the pod log contains,
including any leading/trailing garbage from stderr that Kubernetes
combines into the pod log stream.
Also add a DEBUG-level log of every container output for tracing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: push grader base image to DockerHub as well as GHCR
Concourse grader-image pipelines use DockerHub as the trigger source.
The workflow previously only pushed to GHCR, so Concourse never saw
updates to the base image.
Changes:
- Add DockerHub login step (DOCKERHUB_USERNAME/DOCKERHUB_PASSWORD secrets)
- Push to both mitodl/xqueue-watcher-grader-base (DockerHub) and
ghcr.io/mitodl/xqueue-watcher-grader-base (GHCR)
- Tag :latest on feature branches during active development so Concourse
picks up fixes without waiting for master merge
- Add feature branches to push trigger so grader_support fixes are
published immediately
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: request stdout-only stream from read_namespaced_pod_log
The default stream parameter is 'All', which interleaves stderr into
the returned string. Any stderr output from the container (Python
warnings, import messages, etc.) corrupts the JSON that the entrypoint
prints to stdout, causing JSONDecodeError in the watcher.
Pass stream='Stdout' and container='grader' explicitly so only the
container's stdout is returned.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: extract last line from pod log instead of using stream param
The PodLogsQuery feature gate (which enables the 'stream' field in
PodLogOptions) is opt-in and disabled on the target cluster. Using
stream= returns a 422 FieldValueForbidden error even on K8s 1.35.
Instead, fetch the combined stdout+stderr log and scan backwards for
the last non-empty line. The entrypoint always prints exactly one JSON
object as its final output line, so this reliably extracts the result
regardless of any stderr noise interleaved earlier in the log.
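A minimal sketch of the backwards scan, assuming (as stated above) that the final non-empty line of the log is the JSON result. The function name is hypothetical.

```python
import json


def extract_result(log_text):
    """Parse the last non-empty line of a combined stdout+stderr log as JSON."""
    for line in reversed(log_text.splitlines()):
        if line.strip():
            return json.loads(line)
    raise ValueError("pod log contained no output")
```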
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: bypass kubernetes client JSON deserialisation of pod logs
read_namespaced_pod_log returns response_type='str'. The client's
deserialize() method first calls json.loads() on the raw response body
(succeeds since the entrypoint outputs valid JSON), then passes the
resulting Python dict to __deserialize_primitive(dict, str) which calls
str(dict) — producing Python repr with single-quoted keys and True/False
booleans, which is not valid JSON.
Fix: pass _preload_content=False to get the raw urllib3 response object
and read .data directly as bytes, bypassing the client deserialisation
entirely. The raw bytes are valid UTF-8 JSON as printed by the entrypoint.
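The failure mode is easy to reproduce with the standard library. The snippet below does not call the kubernetes client; `via_str_coercion` is a hypothetical stand-in for the json.loads-then-str() sequence described above.

```python
import json


def via_str_coercion(raw):
    """Mimic decoding JSON into a dict and then coercing the dict with str()."""
    return str(json.loads(raw))


raw = b'{"correct": true, "score": 1}'
coerced = via_str_coercion(raw)  # Python repr, not JSON
try:
    json.loads(coerced)
    coerced_is_json = True
except json.JSONDecodeError:
    coerced_is_json = False

# Decoding the untouched bytes works as expected.
direct = json.loads(raw.decode("utf-8"))
```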
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: add top-level permissions: {} to restrict default GITHUB_TOKEN scope
Addresses GitHub Advanced Security finding: 'Workflow does not contain
permissions'. Adding a workflow-level permissions: {} block ensures the
GITHUB_TOKEN has no default permissions; each job must explicitly declare
what it needs. The update-dependencies job retains its required
contents: write and pull-requests: write grants.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* refactor: remove strip_path_components from ContainerGrader
strip_path_components was added to work around what turned out to be a
configuration error in the LMS grader_payload, not a structural problem
in the grading path resolution. Remove the parameter, its __init__
assignment, the stripping logic in _build_k8s_job, and all docstring
references to keep the code simple and correct.
Also addressed in this commit:
- grader.py: downgrade per-submission grading-time log from INFO to DEBUG
to avoid high-volume noise in production log streams
- Dockerfile: pin uv to 0.10.7 via a named build stage instead of
floating ghcr.io/astral-sh/uv:latest; replace the xqueue-watcher
symlink with ENV PATH so the full venv is on PATH
- env_settings.py: add XQWATCHER_DOCKER_HOST_GRADER_ROOT env var
(preparation for docker_host_grader_root ContainerGrader param)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: add docker_host_grader_root; drop path-py from grader base image
ContainerGrader (Docker backend): add docker_host_grader_root parameter
so that when xqueue-watcher runs inside a container the bind-mount source
path can be translated from the watcher-container path to the equivalent
host-side path. Without this the Docker daemon (reached via the mounted
socket) would look for the grader directory on the host where it does not
exist. Defaults to XQWATCHER_DOCKER_HOST_GRADER_ROOT env var or None
(watcher runs directly on the host, no translation needed).
docker-compose.yml: add XQWATCHER_DOCKER_HOST_GRADER_ROOT placeholder
and explanatory comment so operators know to set the absolute host path.
grader_support/Dockerfile.base: remove the path-py pip install. The
grader_support framework itself does not import path; course teams that
need path-py can add it in their own downstream image.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add named xqueue server references via xqueue_servers.json
Queue configs in conf.d can now use SERVER_REF to reference a named
server defined in xqueue_servers.json, avoiding the need to embed
XQueue URLs and credentials directly in grader configuration files.
- settings.py: add get_xqueue_servers() to load and validate
xqueue_servers.json from the config root
- manager.py: load xqueue_servers.json in configure_from_directory();
resolve SERVER_REF in client_from_config(), raising ValueError for
unknown names or conflicts with SERVER/AUTH
- env_settings.py: document the Kubernetes Secret volume-mount pattern
for xqueue_servers.json as the preferred credentials delivery mechanism
- conf.d/600.json: update example to use SERVER_REF
- tests: add ServerRefTests and TestGetXqueueServers covering resolution,
error cases, and configure_from_directory integration
- tests/fixtures/config/xqueue_servers.json: fixture server for tests
- README.md: document SERVER_REF, xqueue_servers.json format, and
Kubernetes Secret mounting pattern
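SERVER_REF resolution might look roughly like the sketch below. The shape of the xqueue_servers.json entries (a mapping of name to SERVER and AUTH) and the function name are assumptions; the commit message only states that unknown names and conflicts with SERVER/AUTH raise ValueError.

```python
def resolve_server_ref(queue_config, servers):
    """Return a copy of queue_config with SERVER_REF expanded from servers."""
    ref = queue_config.get("SERVER_REF")
    if ref is None:
        return dict(queue_config)
    if "SERVER" in queue_config or "AUTH" in queue_config:
        raise ValueError("SERVER_REF cannot be combined with SERVER or AUTH")
    if ref not in servers:
        raise ValueError(f"unknown SERVER_REF {ref!r}")
    resolved = {k: v for k, v in queue_config.items() if k != "SERVER_REF"}
    resolved.update(servers[ref])  # assumed to supply SERVER and AUTH
    return resolved
```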
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: remove DockerHub push from grader base image workflow
Only push to GHCR. Remove the DockerHub login step, DockerHub image
reference from the metadata action, and the DOCKERHUB_USERNAME /
DOCKERHUB_PASSWORD secret dependencies.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: harden containergrader and XQueue client
- Fix TLS certificate verification: replace hardcoded verify=False with
a _VERIFY_TLS flag (default True). Operators can opt out via
XQWATCHER_VERIFY_TLS=false for dev environments; a warning is logged
when verification is disabled.
- Remove credentials from logs: strip self.password from the debug login
message and the login-retry error message in client.py.
- Enforce hard submission size limit: reject submissions larger than
XQWATCHER_SUBMISSION_SIZE_LIMIT bytes (default 1 MB) before launching
a container. Prevents etcd object-size overflows and resource-exhaustion
attacks via very large env vars. Keep the existing 32 KB warning for
submissions that are large but within the limit.
- Add seccomp RuntimeDefault profile to Kubernetes grading Jobs: applied
at both the pod level (V1PodSecurityContext) and the container level
(V1SecurityContext) to restrict the available syscall surface.
- Add PID limit to grading container resource limits: caps the number of
processes a grading container may create at 256, preventing fork-bomb
attacks from affecting other node workloads.
- Cap /tmp emptyDir at 50 Mi: adds size_limit='50Mi' to the emptyDir
volume backing /tmp in grading pods, preventing disk-exhaustion attacks.
- Add path traversal pre-check in grader.py: explicitly reject grader
paths containing '..' components before Path.resolve() is called,
removing symlink edge-cases that could bypass the relative_to() guard.
- Update containergrader module docstring and env_settings docs to
accurately describe the security posture and new env vars.
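The two-tier size check (hard limit plus the existing 32 KB warning) could be sketched as follows. The function name, the return values, and the reading of "1 MB" as 1024 * 1024 bytes are assumptions.

```python
import os

WARN_BYTES = 32 * 1024
DEFAULT_LIMIT_BYTES = 1024 * 1024  # assumption: "1 MB" interpreted as 1 MiB


def check_submission_size(code, environ=None):
    """Return 'ok' or 'warn'; raise ValueError above the hard limit."""
    env = os.environ if environ is None else environ
    limit = int(env.get("XQWATCHER_SUBMISSION_SIZE_LIMIT", DEFAULT_LIMIT_BYTES))
    size = len(code.encode("utf-8"))
    if size > limit:
        # Rejected before any container is launched.
        raise ValueError(f"submission is {size} bytes; limit is {limit}")
    return "warn" if size > WARN_BYTES else "ok"
```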
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address PR #14 review feedback
- Makefile: add missing tab indentation on help target recipe lines
- grader_support/entrypoint.py: fix always-true EndTest check (use str(e).strip() not e is not None)
- tests/test_env_settings.py: use clear=True in hermetic default-value tests
- tests/test_metrics.py: use clear=True to prevent OTEL_ env vars bleeding in
- xqueue_watcher/client.py: apply _VERIFY_TLS in _request() and _login(), not just put_result
- xqueue_watcher/containergrader.py:
- fix image repo parsing to handle registry:port/image:tag refs (rfind approach)
- fix 'pods' → 'pids' container resource limit
- lazy-init Kubernetes API clients once per instance (avoids per-submission config load)
- xqueue_watcher/env_settings.py: parse HTTP_BASIC_AUTH into (username, password) tuple
- xqueue_watcher/metrics.py: clarify OTEL_RESOURCE_ATTRIBUTES is parsed by SDK automatically
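The rfind approach to image parsing mentioned above can be illustrated like this. The function name is hypothetical and the real code may differ; the point is that a colon only marks a tag when it comes after the last slash.

```python
def image_repository(ref):
    """Strip any tag or digest from an image reference, keeping registry:port."""
    name = ref.partition("@")[0]  # drop an "@sha256:..." digest if present
    colon = name.rfind(":")
    # A colon before the last "/" belongs to the registry port, not a tag.
    if colon > name.rfind("/"):
        return name[:colon]
    return name
```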
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent 857a9ea commit 5be7bb1

File tree (52 files changed, +3658 -463 lines changed)
- .github/workflows
- conf.d
- deploy/kubernetes
- grader_support
- load_test
- requirements
- tests
  - fixtures/config
- xqueue_watcher