Commit 5be7bb1

blarghmatey and Copilot authored
feat: migrate to uv + add ContainerGrader for Kubernetes/Docker sandboxed grading (#14)
* chore: migrate from pip-compile to uv for dependency management

- Run migrate-to-uv to bootstrap pyproject.toml from requirements/base.txt and requirements/test.txt
- Add full project metadata: name, version, description, requires-python>=3.11, license, hatchling build backend, entry point xqueue-watcher -> manager:main
- Add newrelic as [project.optional-dependencies.production]
- Add dev dependency group: coverage, mock, pytest-cov
- Remove setup.py (replaced by pyproject.toml)
- Remove all requirements/*.in and requirements/*.txt files (14 files)
- Generate uv.lock with pinned dependency graph
- Update Makefile: replace pip/pip-compile targets with uv sync / uv run pytest
- Update .github/workflows/ci.yml: use astral-sh/setup-uv@v4, drop ubuntu-20.04 and Python 3.8, add Python 3.13, update to actions/checkout@v4
- Replace upgrade-python-requirements workflow with uv lock --upgrade + create-pull-request workflow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove AppArmor/codejail hard dependency; make codejail optional

- Remove six (Python 2 compat shim) from imports and SUPPORT_FILES in jailedgrader.py — Python 3 only going forward
- Wrap codejail imports in try/except in jailedgrader.py and manager.py; raise RuntimeError with clear message directing users to ContainerGrader
- Fix Path.abspath() -> Path.absolute() (breaking API change in path v17) in grader.py and jailedgrader.py
- Update Dockerfile: ubuntu:xenial -> python:3.11-slim, remove apparmor and language-pack-en packages, fix layer ordering
- Update test_codejail_config to use fork_per_item=False to avoid multiprocessing state-inheritance failure on Python 3.14 forkserver default
- Update conf.d/600.json example to use ContainerGrader handler

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add ContainerGrader for Kubernetes/Docker-based sandboxed grading

Adds xqueue_watcher/containergrader.py — a drop-in replacement for JailedGrader that
executes student code inside an isolated container instead of using AppArmor/codejail.

Security model (replaces AppArmor):
- Container isolation (Linux namespaces + cgroups)
- Non-root user (UID 1000), read-only root filesystem
- CPU/memory resource limits enforced by container runtime
- Network disabled for grader containers (no egress)
- Hard wall-clock timeout via activeDeadlineSeconds (k8s) or timeout (Docker)

Two pluggable backends selected via the 'backend' KWARGS option:

kubernetes (default / production):
- Creates a batch/v1 Job per submission using the kubernetes Python client
- Auto-detects in-cluster vs kubeconfig credentials
- Polls until Job completes, collects stdout from pod logs
- Deletes the Job after result collection (ttlSecondsAfterFinished=300)
- Job pod spec includes: securityContext, resource limits, activeDeadlineSeconds, and labels for observability

docker (local dev / CI):
- Runs a container via the docker Python SDK
- Bind-mounts the grader directory read-only
- Passes SUBMISSION_CODE as an environment variable
- Network disabled, memory + CPU limits applied

Student code is passed via SUBMISSION_CODE env var (avoids argv length limits and shell injection). The entrypoint writes it to /tmp before invoking grader_support.run, producing the same JSON output format that JailedGrader already expects — so no changes to grader test framework or course team grader code are required.
Configuration example (conf.d/my-course.json):

  {
    "my-course": {
      "HANDLERS": [{
        "HANDLER": "xqueue_watcher.containergrader.ContainerGrader",
        "KWARGS": {
          "grader_root": "/graders/my-course/",
          "image": "registry.example.com/my-course:latest",
          "backend": "kubernetes",
          "cpu_limit": "500m",
          "memory_limit": "256Mi",
          "timeout": 20
        }
      }]
    }
  }

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add grader base Docker image and container entrypoint

grader_support/Dockerfile.base:
- python:3.11-slim base, non-root grader user (UID 1000)
- Copies grader_support framework; installs path-py
- ENTRYPOINT: python -m grader_support.entrypoint
- /tmp volume for submission files (writable even with read-only root fs)
- Course teams extend this image to add their deps and grader scripts

grader_support/entrypoint.py:
- Reads SUBMISSION_CODE env var, writes to /tmp/submission.py
- Adds /tmp and cwd to sys.path, then delegates to grader_support.run
- Prints JSON result to stdout (same schema JailedGrader already parses)

grader_support/README.md:
- Course team authoring guide: how to extend the base image, configure the handler, and understand the security properties

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add Kubernetes deployment manifests and Docker Compose local dev

deploy/kubernetes/ (Kustomize-compatible):
- serviceaccount.yaml — dedicated SA for xqueue-watcher pods
- rbac.yaml — Role + RoleBinding: create/delete Jobs, read pod logs
- configmap.yaml — watcher xqwatcher.json config (edit for your queues)
- deployment.yaml — 2 replicas, topologySpreadConstraints, securityContext, resource limits, readinessProbe
- networkpolicy.yaml — deny all ingress/egress on grader Job pods (label: role=grader-job); allow xqueue-watcher egress to xqueue
- secret.yaml.template — placeholder: copy to secret.yaml, fill in credentials, do not commit secret.yaml (added to .gitignore)
- kustomization.yaml — Kustomize entry point for the base
directory

docker-compose.yml (local dev):
- xqueue-watcher container with docker socket access (for docker backend)
- Mounts conf.d/ and grader directories
- Includes a sample xqueue service reference for full local stack

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct grader path handling in ContainerGrader and entrypoint

ContainerGrader had two bugs affecting how grader files were located inside the container at runtime:

1. Docker backend bind-mounted the grader problem directory at /grader, overwriting the grader_support package that the base image copies there. Fixed by binding at /graders instead and passing the resulting absolute in-container path (/graders/<file>) to the entrypoint.

2. Kubernetes backend set working_dir to the grader problem directory (e.g. /graders/ps07/Robot/), preventing Python from finding the grader_support package which lives at /grader/grader_support/. Fixed by keeping working_dir=/grader (the base image WORKDIR) and passing the absolute grader path in args instead of just the basename.

entrypoint.py previously passed the full absolute path verbatim to __import__(), which fails for paths containing slashes. It now detects absolute paths, inserts the parent directory into sys.path, and uses only the basename as the importable module name.

Also updates grader_support/README.md to document the correct layout (/graders/ for course grader scripts, /grader/ for grader_support) and the gradelib compatibility note for course teams migrating from Python 2 graders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tests): skip jailed grader tests when codejail is not installed

codejail is an optional dependency (not installed in CI). Guard the import with a try/except and apply @pytest.mark.skipif to the test class so collection succeeds and tests are skipped gracefully when codejail is absent.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR review feedback

- Dockerfile: replace deleted requirements/ pip install with uv sync (copies uv binary from ghcr.io/astral-sh/uv and uses uv sync --frozen)
- grader.py: guard against path traversal in grader_config['grader']; validate that the resolved grader path stays within grader_root
- containergrader.py: fix Docker SDK TypeError - containers.run() does not accept a timeout kwarg; switch to detach=True + container.wait() to enforce the timeout, then collect logs and remove the container
- containergrader.py: remove brittle hardcoded line numbers (L364, L379, L397, L450) from user-facing error messages
- docker-compose.yml: change conf.d and data volumes from :ro to :rw so local edits take effect without rebuild (matches comment intent)
- upgrade-python-requirements.yml: add explicit permissions block (contents: write, pull-requests: write) as required by security policy
- Automated code Graders With xqueue-watcher.md: remove empty heading, add 'Property' header to comparison table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: replace path-py with stdlib pathlib

path-py is an external dependency that wraps pathlib with a fluent API. Since we now require Python >= 3.11, pathlib covers all the same functionality without an extra dependency.
Changes:
- Replace 'from path import Path' with 'from pathlib import Path' in all source and test files
- .dirname() → .parent
- .basename() → .name
- .absolute() / .absolute() → .resolve() (symlink-safe)
- .files('*.json') → .glob('*.json') (with sorted() for stable ordering)
- Remove path-py (path-py / path) from pyproject.toml dependencies
- Regenerate uv.lock (removes path==17.1.1 and path-py==12.5.0)
- Simplify grader.py path-traversal check: now that grader_path is a native pathlib.Path, the inline 'import pathlib' is no longer needed
- Fix test_grader.py mock: grader_path.endswith() → grader_path.name ==
- Fix test_manager.py: pass str() to argparse (Path is not subscriptable)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add edx-codejail as optional dependency; document container isolation decision

Add edx-codejail (the upstream PyPI package, v4.1.0) as an optional 'codejail' extra, replacing the previously pinned git-URL reference to a specific commit.

  uv add --optional codejail edx-codejail

codejail is intentionally excluded from the base Docker image because ContainerGrader uses container-level isolation (Linux namespaces, cgroups, capability dropping, network isolation, read-only filesystem) which provides equivalent sandboxing to AppArmor without requiring host-level AppArmor configuration that is unavailable inside Kubernetes pods.

Install the 'codejail' extra only when using the legacy JailedGrader on a bare-metal or VM host with AppArmor configured.
To use:

  uv sync --extra codejail

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address second round of PR review feedback

- Makefile: fix tab indentation on all recipe lines (was space-indented)
- grader.py: remove unused sys import
- jailedgrader.py: replace deprecated load_module() with spec_from_file_location/exec_module
- containergrader.py:
  - remove unused imports (logging, os, tempfile) and _JOB_LABEL constant
  - add emptyDir volume at /tmp in K8s Job spec (required when read_only_root_filesystem=True)
  - add clarifying comment that K8s grader scripts are baked into the course image
  - replace deprecated load_module() with importlib.util spec/exec_module pattern
  - capture stderr from Docker container on non-zero exit for better diagnostics
- grader_support/entrypoint.py: correct misleading comment about /tmp writability
- deploy/kubernetes/deployment.yaml: fix command to use xqueue-watcher entry point
- deploy/kubernetes/configmap.yaml: add xqueue-watcher-queue-configs ConfigMap so manifests apply cleanly out of the box
- docker-compose.yml: mount Docker socket for docker backend to work
- conf.d/600.json: use absolute /graders/ path instead of relative ../data path
- Dockerfile: use C.UTF-8 locale (available without installing locales package)
- pyproject.toml: add edx-codejail to dev group so jailed grader tests run in CI

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: move full grading pipeline into container; add ContainerGrader unit tests

Architecture change: grader scripts are baked into the course-specific Docker image, so the watcher pod has no need to access grader files locally. The grader_support entrypoint now runs the complete grading pipeline inside the container (load grader, preprocess, run answer + submission, compare, return JSON grade), and ContainerGrader.grade() is simplified to just launch the container and parse its JSON output.
Changes:
- grader_support/entrypoint.py: complete rewrite; now takes GRADER_FILE SEED (not GRADER_FILE submission.py SEED); runs full grade pipeline in container; reads GRADER_LANGUAGE and HIDE_OUTPUT env vars from ContainerGrader
- xqueue_watcher/containergrader.py:
  - Remove grader-module loading, gettext, answer.py reading, and all test-comparison logic from grade() — the container handles this now
  - grade() now just calls _run() and parses the returned JSON
  - _run() accepts grader_config and forwards lang/hide_output as env vars
  - _build_k8s_job(): args are now [grader_abs, seed] (not 3 args), adds GRADER_LANGUAGE and HIDE_OUTPUT env vars, still mounts emptyDir at /tmp
  - _run_docker(): same arg change; passes GRADER_LANGUAGE and HIDE_OUTPUT
  - ReadTimeout from container.wait() caught and re-raised as clear RuntimeError
  - Remove unused _truncate, _prepend_coding, importlib.util
- tests/test_container_grader.py: 36 new unit tests covering:
  - _parse_cpu_millis
  - ContainerGrader init / backend validation
  - _build_k8s_job: args, env vars, resource limits, emptyDir/tmp, security
  - _run_docker: success, non-zero exit (with stderr), timeout, missing SDK
  - grade(): skip_grader, successful result, container failure, size warning

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: replace statsd/newrelic with OpenTelemetry; add 12-factor settings

- Remove dogstatsd-python dependency; replace statsd instrumentation in grader.py with OpenTelemetry counters and a histogram
- Add xqueue_watcher/metrics.py: configure_metrics() wires a MeterProvider with an OTLP HTTP exporter when OTEL_EXPORTER_OTLP_ENDPOINT is set; all four instruments (process_item, grader_payload_error, grading_time, replies) defined at module level against the global proxy meter
- Call configure_metrics() from Manager.configure_from_directory() so the real provider is installed before any submissions are processed
- Add xqueue_watcher/env_settings.py:
get_manager_config_from_env() reads all manager config from XQWATCHER_* environment variables, compatible with 12-factor / Kubernetes deployment patterns
- Remove newrelic from the production optional-dependency group and from the edx.org Dockerfile stage; the stage now runs xqueue-watcher directly
- Add opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp-proto-http to core dependencies; regenerate uv.lock
- Add tests/test_env_settings.py and tests/test_metrics.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove planning doc from git tracking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove codecov upload from CI

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR #14 review feedback

- docker-compose.yml: remove unused GRADER_BACKEND env var, fix duplicate volumes key by merging into one list, tag sample-grader with image: grader-base:local so conf.d/600.json reference resolves
- Dockerfile: standardise CMD config path to /etc/xqueue-watcher to match docker-compose and Kubernetes manifests
- metrics.py: remove OTEL_METRIC_EXPORT_INTERVAL from docstring since it is not wired up in _build_meter_provider()
- containergrader.py: add pod template metadata labels so the NetworkPolicy podSelector (app.kubernetes.io/component=xqueue-grader) actually matches grading pods; set automount_service_account_token=False on the grading pod spec to reduce blast radius if the NetworkPolicy is misconfigured; add _parse_memory_bytes() helper and use it for the Docker backend mem_limit so Kubernetes-style strings like '256Mi' are converted to bytes rather than passed raw (which Docker does not accept)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add venv bin to PATH so xqueue-watcher entrypoint resolves

uv installs the console script into the project virtual environment at .venv/bin/xqueue-watcher.
Without adding this directory to PATH the CMD cannot be found at container startup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add configure_logging() for 12-factor stdout logging

When no logging.json file is present, manager.py now calls configure_logging() from env_settings instead of basicConfig(). configure_logging() sets up a single StreamHandler on stdout with a consistent timestamp/level/module format, honours XQWATCHER_LOG_LEVEL (default INFO), and suppresses noisy requests/urllib3 debug output. This removes the need for a logging.json file in Kubernetes deployments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: symlink xqueue-watcher into /usr/local/bin for reliable resolution

Using PATH via ENV is fragile -- container runtimes and security policies can reset or ignore it. Install a symlink at /usr/local/bin/xqueue-watcher (always in the standard system PATH) so the entrypoint resolves regardless of how the container is launched.

Also remove the stale NEW_RELIC_LICENSE_KEY env entry from the Kubernetes deployment manifest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add env-based defaults for ContainerGrader configuration

Add get_container_grader_defaults() to env_settings, reading five new XQWATCHER_GRADER_* env vars:

  XQWATCHER_GRADER_BACKEND (default: kubernetes)
  XQWATCHER_GRADER_NAMESPACE (default: default)
  XQWATCHER_GRADER_CPU_LIMIT (default: 500m)
  XQWATCHER_GRADER_MEMORY_LIMIT (default: 256Mi)
  XQWATCHER_GRADER_TIMEOUT (default: 20)

ContainerGrader.__init__ now uses None sentinels for these params so that any value omitted from a conf.d KWARGS block falls back to the env-derived default rather than a hardcoded constant. Values supplied explicitly in conf.d always take precedence, preserving backwards compatibility.

Also fixes duplicate function definitions that had crept into env_settings.py.
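The None-sentinel fallback described in this entry can be sketched roughly as below. This is an illustration only, not the code from this commit: `resolve_grader_settings` and `_ENV_DEFAULTS` are hypothetical names, and only the five environment variable names and their defaults are taken from the message above.

```python
import os

# Env var names and defaults as listed in the commit message.
# The table itself (_ENV_DEFAULTS) is a hypothetical construct for this sketch.
_ENV_DEFAULTS = {
    "backend": ("XQWATCHER_GRADER_BACKEND", "kubernetes"),
    "namespace": ("XQWATCHER_GRADER_NAMESPACE", "default"),
    "cpu_limit": ("XQWATCHER_GRADER_CPU_LIMIT", "500m"),
    "memory_limit": ("XQWATCHER_GRADER_MEMORY_LIMIT", "256Mi"),
    "timeout": ("XQWATCHER_GRADER_TIMEOUT", "20"),
}


def resolve_grader_settings(environ=None, **kwargs):
    """Hypothetical helper: explicit kwargs win; None falls back to env, then default."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, (env_name, default) in _ENV_DEFAULTS.items():
        value = kwargs.get(key)
        if value is None:  # None means "not supplied in the conf.d KWARGS block"
            value = environ.get(env_name, default)
        resolved[key] = value
    # Env vars are strings, so the timeout is converted to an int here.
    resolved["timeout"] = int(resolved["timeout"])
    return resolved
```

Using None rather than the literal default as the parameter default is what lets the code tell "omitted" apart from "explicitly set to the default value".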
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(containergrader): add ImageDigestPoller and image pull policy support

- Add ImageDigestPoller class: background daemon thread that periodically resolves a tag-based image reference to its current digest via docker.APIClient.inspect_distribution(). Thread-safe; falls back to the original reference if resolution fails.
- Add image_pull_policy param to ContainerGrader (auto-detect: IfNotPresent for digest refs, Always for tag-based refs; can be overridden explicitly).
- Add poll_image_digest and digest_poll_interval params to activate the poller. When enabled, Kubernetes Jobs use the most recently resolved repo@sha256:… reference via _effective_image(), ensuring nodes always run the latest pushed image without relying on imagePullPolicy: Always for every pod.
- Add .github/workflows/publish-grader-base-image.yml to build and push grader_support/Dockerfile.base to ghcr.io/mitodl/xqueue-watcher-grader-base on push to master (grader_support/** paths), weekly schedule, and workflow_dispatch. Multi-platform linux/amd64,linux/arm64.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: normalise imagePullPolicy to title-case before K8s API call

Kubernetes requires imagePullPolicy to be exactly 'Always', 'IfNotPresent', or 'Never' (case-sensitive). When the value is supplied via KWARGS in the conf.d JSON (e.g. 'always' or 'ALWAYS'), the K8s API returns 422 Unprocessable Entity. Add a normalisation dict lookup that maps the lowercased input back to the canonical title-case form. Unknown values are passed through unchanged so Kubernetes can surface the validation error with a clear message.
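A minimal sketch of the dict-lookup normalisation described in this entry, assuming a standalone helper; `normalise_pull_policy` and `_PULL_POLICIES` are hypothetical names, not the identifiers used in containergrader.py.

```python
# Canonical spellings accepted by the Kubernetes API, keyed by lowercase form.
_PULL_POLICIES = {
    "always": "Always",
    "ifnotpresent": "IfNotPresent",
    "never": "Never",
}


def normalise_pull_policy(value: str) -> str:
    """Hypothetical helper: map any casing to the canonical form.

    Unknown values are returned unchanged so the Kubernetes API can
    report the validation error itself.
    """
    return _PULL_POLICIES.get(value.lower(), value)
```

A lookup table is needed because plain `str.title()` would turn `ifnotpresent` into `Ifnotpresent`, which Kubernetes would still reject.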
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add strip_path_components to ContainerGrader for legacy path prefixes

LMS grader_payload 'grader' fields configured against the old git-clone deployment include a queue-name prefix, e.g.:

  mit-600x-Watcher-MITX-6.0001r/graders/python3graders/chips1/.../grade.py

In the containerized approach, graders are baked directly into the image at grader_root, so the path resolves to:

  /graders/mit-600x-Watcher-MITX-6.0001r/graders/python3graders/...

which doesn't exist. The actual path in the image is:

  /graders/python3graders/...

Add strip_path_components (int, default 0) KWARG to ContainerGrader. When > 0, that many leading path components are stripped from the grader path (relative to grader_root) before it is passed as the container entrypoint argument. Set to 2 to remove both the queue-name component and the redundant repo subdirectory name.

Example KWARGS:

  "strip_path_components": 2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: install gettext into builtins before loading grader module

Grader scripts may call _() at module level (e.g. in input_validators defined at import time). The previous code installed trans.install() after exec_module, causing NameError: name '_' is not defined.

Move the entire locale/gettext setup block to before exec_module so _ is available in builtins when the grader script is first executed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: normalize mixed tab/space indentation before exec

Python 3 raises TabError when exec'ing code with mixed tabs and spaces in the same indented block. Many course grader answer.py files were authored for Python 2 which tolerated this. Call expandtabs(4) on both the staff answer and student submission before preprocessing and writing to /tmp, so exec never sees raw tabs.
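The effect of the expandtabs(4) call can be illustrated with a small, self-contained sketch. `normalise_indentation` is a hypothetical wrapper and the sample source string is invented for this example; the entrypoint's real preprocessing is not shown in this excerpt.

```python
def normalise_indentation(source: str, tabsize: int = 4) -> str:
    """Hypothetical wrapper: replace tabs with spaces up to the next tab stop."""
    return source.expandtabs(tabsize)


# Invented sample: the first body line is indented with four spaces plus a tab,
# the second with eight spaces. Python 3 refuses to compile this mix.
mixed = "if True:\n    \tx = 1\n        y = 2\n"

try:
    compile(mixed, "<submission>", "exec")
    raw_compiles = True
except TabError:
    raw_compiles = False

# After expansion both body lines are indented with eight spaces.
cleaned = normalise_indentation(mixed)
compile(cleaned, "<submission>", "exec")
```

Note that `str.expandtabs` pads to tab stops rather than inserting a fixed number of spaces, so the result depends on where each tab sits in the line.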
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: sys.path ordering so preprocessed answer/submission shadow originals

run.py's import_captured() uses __import__() to load answer and submission modules. grader_dir was inserted into sys.path AFTER /tmp, making it position 0, so __import__('answer') found the original /graders/.../answer.py (with bare 'for c in s:') instead of the preprocessed /tmp/answer.py (with 'submission_code = repr(...)').

Fix: insert grader_dir first, then /tmp, so /tmp is position 0 and the preprocessed files always shadow the originals.

Also:
- Add _dbg() helper for debug tracing behind GRADER_DEBUG=1 env var; off by default so stderr output doesn't corrupt the JSON pod log that containergrader.py reads via read_namespaced_pod_log.
- Import traceback (used by _dbg exception paths).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: log raw container output bytes on JSON parse failure

Add an explicit ERROR-level log of the raw bytes (repr, up to 4096) when json.loads fails so we can see exactly what the pod log contains, including any leading/trailing garbage from stderr that Kubernetes combines into the pod log stream.

Also add a DEBUG-level log of every container output for tracing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: push grader base image to DockerHub as well as GHCR

Concourse grader-image pipelines use DockerHub as the trigger source. The workflow previously only pushed to GHCR, so Concourse never saw updates to the base image.
Changes:
- Add DockerHub login step (DOCKERHUB_USERNAME/DOCKERHUB_PASSWORD secrets)
- Push to both mitodl/xqueue-watcher-grader-base (DockerHub) and ghcr.io/mitodl/xqueue-watcher-grader-base (GHCR)
- Tag :latest on feature branches during active development so Concourse picks up fixes without waiting for master merge
- Add feature branches to push trigger so grader_support fixes are published immediately

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: request stdout-only stream from read_namespaced_pod_log

The default stream parameter is 'All', which interleaves stderr into the returned string. Any stderr output from the container (Python warnings, import messages, etc.) corrupts the JSON that the entrypoint prints to stdout, causing JSONDecodeError in the watcher.

Pass stream='Stdout' and container='grader' explicitly so only the container's stdout is returned.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: extract last line from pod log instead of using stream param

The PodLogsQuery feature gate (which enables the 'stream' field in PodLogOptions) is opt-in and disabled on the target cluster. Using stream= returns a 422 FieldValueForbidden error even on K8s 1.35.

Instead, fetch the combined stdout+stderr log and scan backwards for the last non-empty line. The entrypoint always prints exactly one JSON object as its final output line, so this reliably extracts the result regardless of any stderr noise interleaved earlier in the log.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: bypass kubernetes client JSON deserialisation of pod logs

read_namespaced_pod_log returns response_type='str'.
The client's deserialize() method first calls json.loads() on the raw response body (succeeds since the entrypoint outputs valid JSON), then passes the resulting Python dict to __deserialize_primitive(dict, str) which calls str(dict) — producing Python repr with single-quoted keys and True/False booleans, which is not valid JSON.

Fix: pass _preload_content=False to get the raw urllib3 response object and read .data directly as bytes, bypassing the client deserialisation entirely. The raw bytes are valid UTF-8 JSON as printed by the entrypoint.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: add top-level permissions: {} to restrict default GITHUB_TOKEN scope

Addresses GitHub Advanced Security finding: 'Workflow does not contain permissions'. Adding a workflow-level permissions: {} block ensures the GITHUB_TOKEN has no default permissions; each job must explicitly declare what it needs. The update-dependencies job retains its required contents: write and pull-requests: write grants.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: remove strip_path_components from ContainerGrader

strip_path_components was added to work around what turned out to be a configuration error in the LMS grader_payload, not a structural problem in the grading path resolution. Remove the parameter, its __init__ assignment, the stripping logic in _build_k8s_job, and all docstring references to keep the code simple and correct.
Also addressed in this commit:
- grader.py: downgrade per-submission grading-time log from INFO to DEBUG to avoid high-volume noise in production log streams
- Dockerfile: pin uv to 0.10.7 via a named build stage instead of floating ghcr.io/astral-sh/uv:latest; replace the xqueue-watcher symlink with ENV PATH so the full venv is on PATH
- env_settings.py: add XQWATCHER_DOCKER_HOST_GRADER_ROOT env var (preparation for docker_host_grader_root ContainerGrader param)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add docker_host_grader_root; drop path-py from grader base image

ContainerGrader (Docker backend): add docker_host_grader_root parameter so that when xqueue-watcher runs inside a container the bind-mount source path can be translated from the watcher-container path to the equivalent host-side path. Without this the Docker daemon (reached via the mounted socket) would look for the grader directory on the host where it does not exist. Defaults to XQWATCHER_DOCKER_HOST_GRADER_ROOT env var or None (watcher runs directly on the host, no translation needed).

docker-compose.yml: add XQWATCHER_DOCKER_HOST_GRADER_ROOT placeholder and explanatory comment so operators know to set the absolute host path.

grader_support/Dockerfile.base: remove the path-py pip install. The grader_support framework itself does not import path; course teams that need path-py can add it in their own downstream image.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add named xqueue server references via xqueue_servers.json

Queue configs in conf.d can now use SERVER_REF to reference a named server defined in xqueue_servers.json, avoiding the need to embed XQueue URLs and credentials directly in grader configuration files.
- settings.py: add get_xqueue_servers() to load and validate xqueue_servers.json from the config root
- manager.py: load xqueue_servers.json in configure_from_directory(); resolve SERVER_REF in client_from_config(), raising ValueError for unknown names or conflicts with SERVER/AUTH
- env_settings.py: document the Kubernetes Secret volume-mount pattern for xqueue_servers.json as the preferred credentials delivery mechanism
- conf.d/600.json: update example to use SERVER_REF
- tests: add ServerRefTests and TestGetXqueueServers covering resolution, error cases, and configure_from_directory integration
- tests/fixtures/config/xqueue_servers.json: fixture server for tests
- README.md: document SERVER_REF, xqueue_servers.json format, and Kubernetes Secret mounting pattern

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: remove DockerHub push from grader base image workflow

Only push to GHCR. Remove the DockerHub login step, DockerHub image reference from the metadata action, and the DOCKERHUB_USERNAME / DOCKERHUB_PASSWORD secret dependencies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: harden containergrader and XQueue client

- Fix TLS certificate verification: replace hardcoded verify=False with a _VERIFY_TLS flag (default True). Operators can opt out via XQWATCHER_VERIFY_TLS=false for dev environments; a warning is logged when verification is disabled.
- Remove credentials from logs: strip self.password from the debug login message and the login-retry error message in client.py.
- Enforce hard submission size limit: reject submissions larger than XQWATCHER_SUBMISSION_SIZE_LIMIT bytes (default 1 MB) before launching a container. Prevents etcd object-size overflows and resource-exhaustion attacks via very large env vars. Keep the existing 32 KB warning for submissions that are large but within the limit.
- Add seccomp RuntimeDefault profile to Kubernetes grading Jobs: applied at both the pod level (V1PodSecurityContext) and the container level (V1SecurityContext) to restrict the available syscall surface.
- Add PID limit to grading container resource limits: caps the number of processes a grading container may create at 256, preventing fork-bomb attacks from affecting other node workloads.
- Cap /tmp emptyDir at 50 Mi: adds size_limit='50Mi' to the emptyDir volume backing /tmp in grading pods, preventing disk-exhaustion attacks.
- Add path traversal pre-check in grader.py: explicitly reject grader paths containing '..' components before Path.resolve() is called, removing symlink edge-cases that could bypass the relative_to() guard.
- Update containergrader module docstring and env_settings docs to accurately describe the security posture and new env vars.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: address PR #14 review feedback

- Makefile: add missing tab indentation on help target recipe lines
- grader_support/entrypoint.py: fix always-true EndTest check (use str(e).strip() not e is not None)
- tests/test_env_settings.py: use clear=True in hermetic default-value tests
- tests/test_metrics.py: use clear=True to prevent OTEL_ env vars bleeding in
- xqueue_watcher/client.py: apply _VERIFY_TLS in _request() and _login(), not just put_result
- xqueue_watcher/containergrader.py:
  - fix image repo parsing to handle registry:port/image:tag refs (rfind approach)
  - fix 'pods' → 'pids' container resource limit
  - lazy-init Kubernetes API clients once per instance (avoids per-submission config load)
- xqueue_watcher/env_settings.py: parse HTTP_BASIC_AUTH into (username, password) tuple
- xqueue_watcher/metrics.py: clarify OTEL_RESOURCE_ATTRIBUTES is parsed by SDK automatically

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
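The "rfind approach" to image reference parsing mentioned in the last entry might look something like the sketch below. This is an assumption about the general technique, not the code in containergrader.py; `split_image_ref` is a hypothetical name. The idea is that a colon only separates a tag when it appears after the last slash, so a registry port such as `:5000` is not mistaken for a tag.

```python
from typing import Optional, Tuple


def split_image_ref(image: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'repo[:tag]' into (repo, tag).

    Tolerates 'registry:port/image' references, where the first colon
    belongs to the registry host rather than to a tag.
    """
    # Drop any digest suffix ('@sha256:...') before looking for a tag.
    repo = image.split("@", 1)[0]
    last_slash = repo.rfind("/")
    last_colon = repo.rfind(":")
    # A colon counts as a tag separator only if it follows the last '/'.
    if last_colon > last_slash:
        return repo[:last_colon], repo[last_colon + 1:]
    return repo, None
```

Splitting on the first colon instead would cut `registry.example.com:5000/my-course:latest` at the port, which is the class of bug the entry describes.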
1 parent 857a9ea commit 5be7bb1
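
The hardening notes in the commit message describe rejecting grader paths that contain '..' components before Path.resolve() runs, with relative_to() kept as a second guard. One way that could look, as a sketch using the standard-library pathlib: `resolve_grader_path` and its parameters are hypothetical names, and grader.py itself may be structured differently.

```python
from pathlib import Path


def resolve_grader_path(grader_root: Path, relative: str) -> Path:
    """Return the resolved grader path, or raise ValueError if it escapes the root.

    Hypothetical helper illustrating the two-step check described in the
    commit message; it is not taken from grader.py.
    """
    candidate = Path(relative)
    # Pre-check: refuse any '..' component before touching the filesystem,
    # so the result does not depend on how symlinks happen to resolve.
    if ".." in candidate.parts:
        raise ValueError(f"grader path may not contain '..': {relative!r}")
    root = grader_root.resolve()
    resolved = (root / candidate).resolve()
    # Second guard: relative_to() raises ValueError when the resolved path
    # is not located under the root (for example, an absolute input path).
    resolved.relative_to(root)
    return resolved
```

The pre-check is purely lexical, so it rejects some harmless inputs such as `unit1/../unit1/grade.py`; that trade-off favours a simple rule over a permissive one.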


52 files changed: +3658 -463 lines changed

.github/workflows/ci.yml

Lines changed: 9 additions & 11 deletions
@@ -16,19 +16,17 @@ jobs:
       matrix:
         os:
         - ubuntu-latest
-        python-version: ['3.12']
+        python-version: ['3.12', '3.13']
     steps:
     - uses: actions/checkout@v4
-    - name: setup python
-      uses: actions/setup-python@v5
+
+    - name: Install uv
+      uses: astral-sh/setup-uv@v4
       with:
-        python-version: ${{ matrix.python-version }}
+        enable-cache: true
 
-    - name: Install requirements and Run Tests
-      run: make test
+    - name: Set up Python ${{ matrix.python-version }}
+      run: uv python install ${{ matrix.python-version }}
 
-    - name: Run Coverage
-      uses: codecov/codecov-action@v4
-      with:
-        token: ${{ secrets.CODECOV_TOKEN }}
-        fail_ci_if_error: true
+    - name: Run Tests
+      run: uv run --python ${{ matrix.python-version }} pytest tests
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+name: Publish grader base image
+
+# Builds grader_support/Dockerfile.base and pushes to:
+#   - GHCR: ghcr.io/mitodl/xqueue-watcher-grader-base
+
+on:
+  push:
+    branches:
+      - master
+      - feat/xqwatcher-kubernetes-migration
+      - chore/migrate-to-uv-and-k8s-container-grader
+    paths:
+      - "grader_support/**"
+  schedule:
+    # Weekly rebuild to pick up base Python/OS security patches (Sunday 00:00 UTC)
+    - cron: "0 0 * * 0"
+  workflow_dispatch:
+
+env:
+  IMAGE_NAME: mitodl/xqueue-watcher-grader-base
+
+jobs:
+  build-and-push:
+    name: Build and push grader base image
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Log in to GHCR
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Set up QEMU (for multi-platform builds)
+        uses: docker/setup-qemu-action@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Extract image metadata
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: |
+            ghcr.io/${{ env.IMAGE_NAME }}
+          tags: |
+            type=raw,value=latest,enable={{is_default_branch}}
+            type=raw,value=latest,enable=${{ github.ref_name == 'chore/migrate-to-uv-and-k8s-container-grader' || github.ref_name == 'feat/xqwatcher-kubernetes-migration' }}
+            type=sha,format=short
+
+      - name: Build and push
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          file: grader_support/Dockerfile.base
+          platforms: linux/amd64,linux/arm64
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
Lines changed: 26 additions & 20 deletions
@@ -1,27 +1,33 @@
-name: Upgrade Python Requirements
+name: Update Dependencies
 
 on:
   schedule:
     - cron: "15 15 1/14 * *"
   workflow_dispatch:
-    inputs:
-      branch:
-        description: "Target branch against which to create requirements PR"
-        required: true
-        default: 'master'
+
+permissions: {}
 
 jobs:
-  call-upgrade-python-requirements-workflow:
-    uses: openedx/.github/.github/workflows/upgrade-python-requirements.yml@master
-    with:
-      branch: ${{ github.event.inputs.branch || 'master' }}
-      # optional parameters below; fill in if you'd like github or email notifications
-      # user_reviewers: ""
-      # team_reviewers: ""
-      email_address: "aurora-requirements-update@2u-internal.opsgenie.net"
-      send_success_notification: true
-    secrets:
-      requirements_bot_github_token: ${{ secrets.REQUIREMENTS_BOT_GITHUB_TOKEN }}
-      requirements_bot_github_email: ${{ secrets.REQUIREMENTS_BOT_GITHUB_EMAIL }}
-      edx_smtp_username: ${{ secrets.EDX_SMTP_USERNAME }}
-      edx_smtp_password: ${{ secrets.EDX_SMTP_PASSWORD }}
+  update-dependencies:
+    runs-on: ubuntu-24.04
+    permissions:
+      contents: write
+      pull-requests: write
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Update uv.lock
+        run: uv lock --upgrade
+
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v6
+        with:
+          token: ${{ secrets.REQUIREMENTS_BOT_GITHUB_TOKEN }}
+          commit-message: "chore: update uv.lock with latest dependency versions"
+          title: "chore: update dependencies"
+          body: "Automated dependency update via `uv lock --upgrade`."
+          branch: "chore/update-dependencies"
+          delete-branch: true

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -22,3 +22,10 @@ reports/
 \#*\#
 *.egg-info
 .idea/
+
+# uv
+.venv/
+
+# Kubernetes secrets — never commit real values
+deploy/kubernetes/secret.yaml
+Automated code Graders With xqueue-watcher.md

Dockerfile

Lines changed: 37 additions & 17 deletions
@@ -1,26 +1,46 @@
-FROM ubuntu:xenial as openedx
+ARG UV_VERSION=0.10.7
+FROM ghcr.io/astral-sh/uv:${UV_VERSION} AS uv
 
-RUN apt update && \
-    apt install -y git-core language-pack-en apparmor apparmor-utils python python-pip python-dev && \
-    pip install --upgrade pip setuptools && \
-    rm -rf /var/lib/apt/lists/*
+FROM python:3.11-slim AS base
 
-RUN locale-gen en_US.UTF-8
-ENV LANG en_US.UTF-8
-ENV LANGUAGE en_US:en
-ENV LC_ALL en_US.UTF-8
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    LANG=C.UTF-8 \
+    LC_ALL=C.UTF-8
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git-core && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN useradd -m --shell /bin/false app
+
+COPY --from=uv /uv /usr/local/bin/uv
 
 WORKDIR /edx/app/xqueue_watcher
-COPY requirements /edx/app/xqueue_watcher/requirements
-RUN pip install -r requirements/production.txt
 
-CMD python -m xqueue_watcher -d /edx/etc/xqueue_watcher
+COPY pyproject.toml uv.lock ./
+RUN uv sync --frozen --no-dev --no-install-project
+
+COPY . /edx/app/xqueue_watcher
+RUN uv sync --frozen --no-dev
+# Note: the `codejail` optional extra (edx-codejail) is intentionally omitted
+# from this image. In the Kubernetes deployment, student code runs inside an
+# isolated container (ContainerGrader) — the container boundary provides the
+# sandbox via Linux namespaces, cgroups, capability dropping, network isolation,
+# and a read-only filesystem. codejail (AppArmor + OS-level user-switching)
+# requires host-level AppArmor configuration that is unavailable inside
+# Kubernetes pods and adds no meaningful security benefit on top of container
+# isolation. Install the `codejail` extra only when running the legacy
+# JailedGrader on a bare-metal or VM host with AppArmor configured.
+
+# Put the venv on PATH so `xqueue-watcher` and any other installed scripts are
+# available without a symlink.
+ENV PATH="/edx/app/xqueue_watcher/.venv/bin:$PATH"
 
-RUN useradd -m --shell /bin/false app
 USER app
 
-COPY . /edx/app/xqueue_watcher
+CMD ["xqueue-watcher", "-d", "/etc/xqueue-watcher"]
 
-FROM openedx as edx.org
-RUN pip install newrelic
-CMD newrelic-admin run-program python -m xqueue_watcher -d /edx/etc/xqueue_watcher
+FROM base AS edx.org
+USER app
+CMD ["xqueue-watcher", "-d", "/etc/xqueue-watcher"]

Makefile

Lines changed: 19 additions & 37 deletions
@@ -1,47 +1,29 @@
-NODE_BIN=./node_modules/.bin
-
 help:
-	@echo ' '
-	@echo 'Makefile for the xqueue-watcher '
-	@echo ' '
-	@echo 'Usage: '
-	@echo ' make requirements install requirements for local development '
-	@echo ' make test run python unit-tests '
-	@echo ' make clean delete generated byte code and coverage reports '
-	@echo ' '
-
-COMMON_CONSTRAINTS_TXT=requirements/common_constraints.txt
-.PHONY: $(COMMON_CONSTRAINTS_TXT)
-$(COMMON_CONSTRAINTS_TXT):
-	wget -O "$(@)" https://raw.githubusercontent.com/edx/edx-lint/master/edx_lint/files/common_constraints.txt || touch "$(@)"
-
-upgrade: export CUSTOM_COMPILE_COMMAND=make upgrade
-upgrade: $(COMMON_CONSTRAINTS_TXT)
-	## update the requirements/*.txt files with the latest packages satisfying requirements/*.in
-	pip install -q -r requirements/pip_tools.txt
-	pip-compile --allow-unsafe --rebuild --upgrade -o requirements/pip.txt requirements/pip.in
-	pip-compile --upgrade -o requirements/pip_tools.txt requirements/pip_tools.in
-	pip install -q -r requirements/pip.txt
-	pip install -q -r requirements/pip_tools.txt
-	pip-compile --upgrade -o requirements/base.txt requirements/base.in
-	pip-compile --upgrade -o requirements/production.txt requirements/production.in
-	pip-compile --upgrade -o requirements/test.txt requirements/test.in
-	pip-compile --upgrade -o requirements/ci.txt requirements/ci.in
+	@echo ''
+	@echo 'Makefile for the xqueue-watcher'
+	@echo ''
+	@echo 'Usage:'
+	@echo ' make requirements sync dev dependencies with uv'
+	@echo ' make test run python unit-tests'
+	@echo ' make docker-build build the grader base Docker image'
+	@echo ' make local-run run locally with docker-compose'
+	@echo ' make clean delete generated byte code'
+	@echo ''
 
 requirements:
-	pip install -qr requirements/production.txt --exists-action w
+	uv sync
 
-test.requirements:
-	pip install -q -r requirements/test.txt --exists-action w
+test: requirements
+	uv run pytest --cov=xqueue_watcher --cov-report=xml tests
 
-ci.requirements:
-	pip install -q -r requirements/ci.txt --exists-action w
+docker-build:
+	docker build -t xqueue-watcher:local .
+	docker build -t grader-base:local -f grader_support/Dockerfile.base .
 
-test: test.requirements
-	pytest --cov=xqueue_watcher --cov-report=xml tests
+local-run:
+	docker compose up
 
 clean:
 	find . -name '*.pyc' -delete
 
-# Targets in a Makefile which do not produce an output file with the same name as the target name
-.PHONY: help requirements clean
+.PHONY: help requirements test docker-build local-run clean
