Overview
Add a background job to link orphaned spans that are missing test_result_id as a safety net for edge cases where force flush fails.
Problem
While force flush solves the primary race condition, edge cases can still occur:
- Network failures during span export
- SDK crashes before flush completes
- Timeout scenarios
- Backend downtime during export
Proposed Solution
Implement a periodic Celery task that:
- Finds spans that have test context but a null test_result_id, created within the last hour
- Groups them by (test_run_id, test_id, test_configuration_id, org_id)
- Looks up the matching test results in the database
- Links the orphaned spans via update_traces_with_test_result_id()
Implementation Outline
File: apps/backend/src/rhesis/backend/tasks/telemetry/retroactive_linking.py
@celery_app.task
def link_orphaned_traces():
    # Find spans where test_id IS NOT NULL AND test_result_id IS NULL,
    # created within the last hour.
    # Group them by execution context.
    # For each group, find the matching test_result and update its spans.
    pass
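The stub above could be fleshed out along the following lines. This is a sketch, not the final implementation: the three callables passed in (find_orphaned_spans, find_test_result, update_spans) are hypothetical stand-ins for the real database queries and update_traces_with_test_result_id(), injected so the control flow can be exercised without a database, and the dict-shaped span records are likewise an assumption.

```python
from datetime import datetime, timedelta, timezone


def link_orphaned_traces_impl(find_orphaned_spans, find_test_result, update_spans):
    """Link spans missing test_result_id; returns how many spans were linked.

    All three arguments are hypothetical callables standing in for real
    database access in this sketch.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=1)
    # 1. Find spans with test context but no test_result_id, created recently.
    spans = find_orphaned_spans(created_after=cutoff)
    # 2. Group them by execution context.
    groups = {}
    for span in spans:
        key = (span["test_run_id"], span["test_id"],
               span["test_configuration_id"], span["org_id"])
        groups.setdefault(key, []).append(span)
    # 3-4. Look up each group's test result and link its spans to it.
    linked = 0
    for key, group in groups.items():
        test_result_id = find_test_result(*key)
        if test_result_id is None:
            # Result row not written yet; leave the spans for the next run.
            continue
        update_spans([s["id"] for s in group], test_result_id)
        linked += len(group)
    return linked
```

Skipping groups whose test result is not found yet (rather than failing) is what makes the task safe to re-run every few minutes: any spans it cannot link on one pass are simply picked up on a later pass.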
Schedule the task to run every 1-5 minutes via Celery beat.
Benefits
- Guarantees eventual consistency for span-to-test-result linkage
- Handles the edge cases listed above
- Zero impact on test execution latency
- Complements force flush approach
Priority
Low - Force flush handles 99.9% of cases. This is for remaining edge cases.