feat: Add retroactive trace linking for orphaned spans #1084

@harry-rhesis

Overview

Add a background job that links orphaned spans (spans missing test_result_id), as a safety net for edge cases where force flush fails.

Problem

While force flush solves the primary race condition, edge cases can still occur:

  • Network failures during span export
  • SDK crashes before flush completes
  • Timeout scenarios
  • Backend downtime during export

Proposed Solution

Implement a periodic Celery task that:

  1. Finds spans that have test context but a null test_result_id and were created within the last hour
  2. Groups them by (test_run_id, test_id, test_configuration_id, org_id)
  3. Looks up the matching test results in the database
  4. Links the orphaned spans using update_traces_with_test_result_id()
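
The grouping and linking steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the Span dataclass, group_orphans, and link_groups are hypothetical names, and the lookup and link callables stand in for the database query and for update_traces_with_test_result_id(), whose real signature is not shown in this issue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical key shape: (test_run_id, test_id, test_configuration_id, org_id)
ContextKey = Tuple[str, str, str, str]


@dataclass
class Span:
    # Minimal stand-in for a stored span; the field names are assumptions.
    span_id: str
    test_run_id: str
    test_id: str
    test_configuration_id: str
    org_id: str
    test_result_id: Optional[str] = None


def group_orphans(spans: Iterable[Span]) -> Dict[ContextKey, List[Span]]:
    """Group spans that still lack a test_result_id by execution context."""
    groups: Dict[ContextKey, List[Span]] = defaultdict(list)
    for span in spans:
        if span.test_result_id is None:
            key = (span.test_run_id, span.test_id,
                   span.test_configuration_id, span.org_id)
            groups[key].append(span)
    return dict(groups)


def link_groups(
    groups: Dict[ContextKey, List[Span]],
    find_test_result_id: Callable[[ContextKey], Optional[str]],
    link: Callable[[ContextKey, str], None],
) -> int:
    """Link every group whose test result exists; return the span count linked."""
    linked = 0
    for key, members in groups.items():
        result_id = find_test_result_id(key)
        if result_id is None:
            continue  # No test result yet; a later run can pick this group up.
        link(key, result_id)  # Stand-in for the real linking helper.
        linked += len(members)
    return linked
```

Skipping groups with no matching test result keeps each run safe to repeat: those spans stay orphaned and remain candidates for the next run while they are inside the lookback window.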

Implementation Outline

File: apps/backend/src/rhesis/backend/tasks/telemetry/retroactive_linking.py

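@celery_app.task
def link_orphaned_traces():
    """Link recent orphaned spans to their test results (outline only)."""
    # 1. Find spans where test_id IS NOT NULL AND test_result_id IS NULL,
    #    created within the last hour.
    # 2. Group them by execution context:
    #    (test_run_id, test_id, test_configuration_id, org_id).
    # 3. For each group, look up the matching test result and link the spans
    #    with update_traces_with_test_result_id().
    pass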
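
To make the selection predicate in the outline concrete, here is a small self-contained sketch using the standard-library sqlite3 module. The table name, column names, and the use of Unix timestamps for created_at are assumptions made for illustration; the real schema and ORM layer may look different.

```python
import sqlite3
from typing import List

# Hypothetical table and column names, for illustration only.
SCHEMA = """
CREATE TABLE spans (
    span_id TEXT PRIMARY KEY,
    test_id TEXT,
    test_result_id TEXT,
    created_at REAL NOT NULL
)
"""

# Spans with test context, no linked test result, created after the cutoff.
ORPHAN_QUERY = """
SELECT span_id FROM spans
WHERE test_id IS NOT NULL
  AND test_result_id IS NULL
  AND created_at >= ?
ORDER BY span_id
"""


def find_orphaned_span_ids(
    conn: sqlite3.Connection, now: float, window_seconds: float = 3600.0
) -> List[str]:
    """Return IDs of orphaned spans created within the lookback window."""
    cutoff = now - window_seconds  # created_at is assumed to be Unix seconds
    return [row[0] for row in conn.execute(ORPHAN_QUERY, (cutoff,))]
```

Passing now in explicitly, rather than reading the clock inside the function, keeps the lookback window easy to test.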

Schedule the task to run every 1-5 minutes via Celery beat.
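
A possible beat schedule entry is sketched below as a plain dictionary. The task name is an assumption: it follows Celery's default naming (module path plus function name) inferred from the file path above, and the registered name may differ. The 2-minute interval is just one value inside the proposed range.

```python
from datetime import timedelta

# Assumed task name, inferred from the file path; the registered name may differ.
TASK_NAME = "rhesis.backend.tasks.telemetry.retroactive_linking.link_orphaned_traces"

BEAT_SCHEDULE = {
    "link-orphaned-traces": {
        "task": TASK_NAME,
        # Any interval in the proposed 1-5 minute range would fit here.
        "schedule": timedelta(minutes=2),
    },
}

# In the application this would be applied to the Celery config, for example:
# celery_app.conf.beat_schedule = BEAT_SCHEDULE
```

With a one-hour lookback window, an interval of a few minutes gives each orphaned span many chances to be linked before it falls out of the window.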

Benefits

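  • Eventual consistency for spans that reach the backend within the lookback window
  • Covers the edge cases above in which spans are stored without a test_result_id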
  • Zero impact on test execution latency
  • Complements force flush approach

Priority

Low - Force flush handles 99.9% of cases. This is for the remaining edge cases.
