feat(api): add periodic cleanup of stale Attack Paths scans with dead-worker detection#10387

Open
josema-xyz wants to merge 7 commits into master from PROWLER-1207-improve-orphan-temporal-scan-databases-deletion-celery-tasks

@josema-xyz (Contributor)

Context

When a worker dies (SIGKILL, OOM kill, container crash), it leaves temporary Neo4j databases behind and AttackPathsScan rows stuck in EXECUTING indefinitely. No existing mechanism detects or cleans up these orphans.

Description

Adds a periodic Celery task that detects and cleans up stale Attack Paths scans. Each scan is checked in two steps (is its worker alive? has it exceeded the staleness threshold?), which leads to one of three outcomes:

  1. Dead worker (inspect().ping() returns None): mark the scan FAILED immediately, regardless of its age.
  2. Live worker, scan older than the threshold (default 48h): revoke the task with SIGTERM, then mark the scan FAILED.
  3. Live worker, scan within the threshold: skip it; the scan is still running normally.
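The three-way decision above can be sketched as a pure function. This is a minimal sketch, not the code in cleanup.py: the names `CleanupAction`, `is_worker_alive`, and `decide_action` are hypothetical, and it assumes "past threshold" means strictly older than the threshold.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class CleanupAction(Enum):
    """Hypothetical outcomes of the stale-scan check."""
    MARK_FAILED = "mark_failed"                        # dead worker: fail at any age
    REVOKE_AND_MARK_FAILED = "revoke_and_mark_failed"  # live worker, past threshold
    SKIP = "skip"                                      # live worker, within threshold


def is_worker_alive(ping_reply: Optional[dict], worker_name: str) -> bool:
    """Interpret a Celery inspect().ping() reply.

    ping() returns None when no worker answers; otherwise a dict keyed by
    worker node name, e.g. {"celery@host": {"ok": "pong"}}.
    """
    return bool(ping_reply) and worker_name in ping_reply


def decide_action(
    worker_alive: bool,
    started_at: datetime,
    now: datetime,
    threshold: timedelta = timedelta(hours=48),  # documented default
) -> CleanupAction:
    if not worker_alive:
        # Nothing will ever finish this scan, so age does not matter.
        return CleanupAction.MARK_FAILED
    if now - started_at > threshold:
        # Assumption: strictly greater than the threshold counts as stale.
        return CleanupAction.REVOKE_AND_MARK_FAILED
    return CleanupAction.SKIP
```

In a real worker, the ping reply would come from Celery's `app.control.inspect(destination=[worker_name]).ping()` and the revocation from `app.control.revoke(task_id, terminate=True, signal="SIGTERM")`. Keeping the decision separate from those calls lets all three branches be unit-tested without a broker.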

After marking a scan as FAILED, the cleanup restores graph_data_ready if the tenant database still contains provider data, so query endpoints are not blocked until the next successful scan.
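The recovery step can be sketched as follows. `ScanRecord` is a hypothetical stand-in for the AttackPathsScan row and `tenant_has_provider_data` is a hypothetical caller-supplied check; neither is the PR's actual name. In this sketch the flag is only ever set to True, so a tenant with no graph data stays blocked.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScanRecord:
    """Hypothetical stand-in for an AttackPathsScan row."""
    tenant_id: str
    state: str = "EXECUTING"
    graph_data_ready: bool = False


def fail_and_recover(
    scan: ScanRecord,
    tenant_has_provider_data: Callable[[str], bool],  # hypothetical check
) -> ScanRecord:
    # Step 1: the orphaned scan is terminal from now on.
    scan.state = "FAILED"
    # Step 2: if earlier data is still in the tenant database, re-enable
    # queries instead of waiting for the next successful scan.
    if tenant_has_provider_data(scan.tenant_id):
        scan.graph_data_ready = True
    return scan
```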

Also adds resolve_worker_hostname() to docker-entrypoint.sh, which generates a unique Celery worker name (ECS task ID or UUID, plus hostname). Without this, multiple workers on the same EC2 instance share a hostname, which makes inspect().ping() unreliable for dead-worker detection.
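The entrypoint change is a shell function; the naming rule it describes can be mirrored in Python as below. Assumptions: the ECS task ARN is passed in (one possible source is the `TaskARN` field returned by the ECS task metadata endpoint v4 at `${ECS_CONTAINER_METADATA_URI_V4}/task`), and the `worker-<id>@<host>` format and 12-character UUID suffix are illustrative choices, not necessarily what resolve_worker_hostname() emits.

```python
import socket
import uuid
from typing import Optional


def resolve_worker_name(task_arn: Optional[str] = None,
                        hostname: Optional[str] = None) -> str:
    """Build a unique Celery node name (hypothetical Python mirror of the shell logic)."""
    host = hostname or socket.gethostname()
    if task_arn:
        # ECS task ARNs end with the task ID, e.g.
        # arn:aws:ecs:<region>:<account>:task/<cluster>/<task-id>
        unique = task_arn.rsplit("/", 1)[-1]
    else:
        # Outside ECS, fall back to a random suffix.
        unique = uuid.uuid4().hex[:12]
    # Celery node names take the form name@host.
    return f"worker-{unique}@{host}"
```

The result would be handed to Celery with `celery -A <app> worker -n <name>`. Because the prefix differs per task, two workers on the same EC2 host show up as distinct entries in `inspect().ping()` replies.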

Steps to review

Review the cleanup functions and their migration, as well as the new worker naming logic in the entrypoint script.

Checklist

API

  • All issue/task requirements work as expected on the API
  • Endpoint response output (if applicable)
  • EXPLAIN ANALYZE output for new/modified queries or indexes (if applicable)
  • Performance test results (if applicable)
  • Any other relevant evidence of the implementation (if applicable)
  • Verify if API specs need to be regenerated.
  • Check if version updates are required (e.g., specs, Poetry, etc.).
  • Ensure new entries are added to CHANGELOG.md, if applicable.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@josema-xyz josema-xyz requested a review from a team as a code owner March 19, 2026 11:20
Copilot AI review requested due to automatic review settings March 19, 2026 11:20
@github-actions github-actions bot added component/api review-django-migrations This PR contains changes in Django migrations labels Mar 19, 2026
github-actions bot commented Mar 19, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

github-actions bot commented Mar 19, 2026

✅ All necessary CHANGELOG.md files have been updated.

github-actions bot commented Mar 19, 2026

🔒 Container Security Scan

Image: prowler-api:f39c650
Last scan: 2026-03-19 12:07:20 UTC

📊 Vulnerability Summary

Severity Count
🔴 Critical 5
Total 5

4 package(s) affected

⚠️ Action Required

Critical severity vulnerabilities detected. These should be addressed before merging:

  • Review the detailed scan results
  • Update affected packages to patched versions
  • Consider using a different base image if updates are unavailable

Copilot AI left a comment

Pull request overview

Adds an automated mechanism to detect and clean up orphaned/stale Attack Paths scans (e.g., after worker death), plus improves Celery worker naming to make dead-worker detection reliable.

Changes:

  • Introduces cleanup_stale_attack_paths_scans() job logic and wires it into a new Celery task + periodic beat entry.
  • Adds configuration for the staleness threshold (default 48h) and expands tests for cleanup behavior.
  • Updates the API container entrypoint to generate unique Celery worker names (ECS task ID or UUID + hostname) and documents the feature in the API changelog.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

  • api/src/backend/tasks/jobs/attack_paths/cleanup.py: implements stale scan detection, worker liveness checks, task revocation, and the scan cleanup flow.
  • api/src/backend/tasks/tasks.py: adds a Celery task wrapper that runs the cleanup job.
  • api/src/backend/api/migrations/0085_attack_paths_cleanup_periodic_task.py: creates a django-celery-beat periodic task that runs the cleanup daily.
  • api/src/backend/config/django/base.py: adds the ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES setting (default 48h).
  • api/src/backend/tasks/tests/test_attack_paths_scan.py: adds unit tests covering dead workers, threshold expiry, cross-tenant cleanup, and failure handling.
  • api/docker-entrypoint.sh: generates unique Celery worker names to make dead-worker detection reliable.
  • api/CHANGELOG.md: adds an UNRELEASED entry describing the periodic cleanup behavior.
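A minimal sketch of how the threshold setting might be read. It assumes ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES is sourced from an environment variable of the same name (the summary above only names the setting and its 48h default); the `stale_threshold` helper is hypothetical.

```python
import os
from datetime import timedelta
from typing import Mapping, Optional

SETTING_NAME = "ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES"
# Documented default: 48 hours, expressed in minutes.
DEFAULT_STALE_THRESHOLD_MINUTES = 48 * 60


def stale_threshold(env: Optional[Mapping[str, str]] = None) -> timedelta:
    """Return the staleness threshold as a timedelta (hypothetical helper).

    Assumption: the value comes from an environment variable of the same name.
    """
    env = os.environ if env is None else env
    raw = env.get(SETTING_NAME, "").strip()
    minutes = int(raw) if raw else DEFAULT_STALE_THRESHOLD_MINUTES
    if minutes <= 0:
        # A zero or negative threshold would revoke every live scan.
        raise ValueError(f"{SETTING_NAME} must be a positive number of minutes")
    return timedelta(minutes=minutes)
```

Validating the value once at startup keeps a misconfigured threshold from silently revoking scans that are still running normally.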

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 91.09312% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.36%. Comparing base (0f2fdcf) to head (6210fa1).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10387       +/-   ##
===========================================
+ Coverage   56.86%   93.36%   +36.49%     
===========================================
  Files          87      220      +133     
  Lines        2847    30637    +27790     
===========================================
+ Hits         1619    28603    +26984     
- Misses       1228     2034      +806     
Flag Coverage Δ
api 93.36% <91.09%> (?)
prowler-py3.10-oraclecloud ?
prowler-py3.11-oraclecloud ?
prowler-py3.12-oraclecloud ?
prowler-py3.9-oraclecloud ?

Flags with carried forward coverage won't be shown.

Components Coverage Δ
prowler ∅ <ø> (∅)
api 93.36% <91.09%> (∅)