feat(api): add periodic cleanup of stale Attack Paths scans with dead-worker detection #10387
🔒 Container Security Scan — Vulnerability Summary: 4 package(s) affected
Pull request overview
Adds an automated mechanism to detect and clean up orphaned/stale Attack Paths scans (e.g., after worker death), plus improves Celery worker naming to make dead-worker detection reliable.
Changes:
- Introduces `cleanup_stale_attack_paths_scans()` job logic and wires it into a new Celery task + periodic beat entry.
- Adds configuration for the staleness threshold (default 48h) and expands tests for cleanup behavior.
- Updates the API container entrypoint to generate unique Celery worker names (ECS task ID or UUID + hostname) and documents the feature in the API changelog.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| api/src/backend/tasks/jobs/attack_paths/cleanup.py | Implements stale scan detection, worker liveness checks, task revocation, and scan cleanup flow. |
| api/src/backend/tasks/tasks.py | Adds a Celery task wrapper to run the cleanup job. |
| api/src/backend/api/migrations/0085_attack_paths_cleanup_periodic_task.py | Creates a django-celery-beat periodic task to run cleanup daily. |
| api/src/backend/config/django/base.py | Adds ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES setting (default 48h). |
| api/src/backend/tasks/tests/test_attack_paths_scan.py | Adds unit tests covering dead worker, threshold expiry, cross-tenant cleanup, and failure handling. |
| api/docker-entrypoint.sh | Generates unique Celery worker names to improve dead-worker detection reliability. |
| api/CHANGELOG.md | Adds an UNRELEASED entry describing the periodic cleanup behavior. |
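The migration listed above registers the cleanup as a daily django-celery-beat entry. A hedged sketch of what such a migration typically looks like (the schedule time, task path, periodic-task name, and dependency are placeholders, not the PR's actual values):

```python
# Hypothetical sketch of a django-celery-beat periodic-task migration.
# Names and the crontab schedule below are assumptions for illustration.
from django.db import migrations


def create_cleanup_periodic_task(apps, schema_editor):
    CrontabSchedule = apps.get_model("django_celery_beat", "CrontabSchedule")
    PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
    # Run once a day; get_or_create keeps the migration idempotent.
    schedule, _ = CrontabSchedule.objects.get_or_create(
        minute="0",
        hour="2",
        day_of_week="*",
        day_of_month="*",
        month_of_year="*",
    )
    PeriodicTask.objects.get_or_create(
        name="attack-paths-cleanup-stale-scans",  # placeholder name
        defaults={
            "crontab": schedule,
            "task": "tasks.cleanup_stale_attack_paths_scans",  # placeholder path
        },
    )


def remove_cleanup_periodic_task(apps, schema_editor):
    PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
    PeriodicTask.objects.filter(
        name="attack-paths-cleanup-stale-scans"
    ).delete()


class Migration(migrations.Migration):
    dependencies = [("api", "0084_placeholder")]  # placeholder dependency
    operations = [
        migrations.RunPython(
            create_cleanup_periodic_task, remove_cleanup_periodic_task
        ),
    ]
```

Using `RunPython` with a reverse function lets the beat entry be removed cleanly if the migration is rolled back.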
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@            Coverage Diff           @@
##           master   #10387   +/-   ##
===========================================
+ Coverage   56.86%   93.36%   +36.49%
===========================================
  Files          87      220     +133
  Lines        2847    30637   +27790
===========================================
+ Hits         1619    28603   +26984
- Misses       1228     2034     +806
```
Flags with carried forward coverage won't be shown.
Context

Worker death (`SIGKILL`, OOM, container crash) leaves temp Neo4j databases and `AttackPathsScan` rows stuck in `EXECUTING` forever. No existing mechanism catches these orphans.

Description
Adds a periodic Celery task that detects and cleans up stale attack paths scans using a two-pass approach:

- Dead worker (`inspect().ping()` returns `None`): mark the scan `FAILED` immediately, regardless of age.
- Scan older than the staleness threshold: revoke the task with `SIGTERM`, then mark it `FAILED`.

After marking a scan `FAILED`, the cleanup recovers `graph_data_ready` if the tenant database still has provider data, so query endpoints aren't blocked until the next successful scan.
Also adds `resolve_worker_hostname()` to `docker-entrypoint.sh`, generating unique Celery worker names (ECS task ID or UUID + hostname). Without this, multiple workers on the same EC2 instance share a hostname, making `inspect().ping()` unreliable for dead-worker detection.

Steps to review
Check the cleanup functions along with their migration. Also check the entrypoint script for the new worker-name generation.
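For reviewers, here is a hedged sketch of what the entrypoint's worker-name generation could look like. The `ECS_CONTAINER_METADATA_URI_V4` lookup and the TaskARN parsing are assumptions about how the ECS task ID would be obtained; the actual script may differ:

```shell
#!/bin/sh
# Hypothetical sketch of resolve_worker_hostname() from docker-entrypoint.sh.
resolve_worker_hostname() {
  task_id=""
  if [ -n "${ECS_CONTAINER_METADATA_URI_V4:-}" ]; then
    # On ECS, use the task ID: the last path segment of the TaskARN exposed
    # by the container metadata endpoint (assumed parsing, not verbatim).
    task_id=$(wget -qO- "${ECS_CONTAINER_METADATA_URI_V4}/task" 2>/dev/null \
      | sed -n 's/.*"TaskARN":"[^"]*\/\([^"]*\)".*/\1/p')
  fi
  if [ -z "$task_id" ]; then
    # Fallback: a random UUID, so two workers on one host never collide.
    task_id=$(cat /proc/sys/kernel/random/uuid)
  fi
  printf '%s@%s\n' "$task_id" "$(hostname)"
}

resolve_worker_hostname
```

The `id@hostname` shape matters because Celery identifies workers by node name; once every worker has a unique name, a missing entry in the `inspect().ping()` reply reliably means that worker is dead.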
Checklist
API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.