
Investigate and Implement Telemetry Overhead Optimizations #1081

@harry-rhesis

Description


Summary

Investigate and implement optimizations to minimize OpenTelemetry telemetry overhead for client applications using the Rhesis SDK.

Background / Context

Following WP3 SDK Integration implementation, we have a working OpenTelemetry integration with asynchronous batching. However, we've identified several potential optimization points that could reduce overhead for high-volume production applications.


Current Architecture Performance Profile

✅ Already Optimized

The current implementation includes several excellent performance optimizations:

  1. Asynchronous Batching via BatchSpanProcessor

    • Buffer: 2048 spans
    • Batch size: 512 spans per HTTP request
    • Export interval: Every 5 seconds
    • Impact: Application continues immediately; spans export in background
    • Overhead per span: ~1-2µs (just memory write)
  2. Non-Blocking Span Creation

    • Context manager for span lifecycle
    • Data stored in memory, not sent immediately
  3. Singleton TracerProvider

    • Single provider reused across all requests
    • No repeated initialization overhead

⚠️ Identified Overhead Points

1. Synchronous Result Serialization

Location: sdk/src/rhesis/sdk/telemetry/tracer.py:113

result_str = str(result)[:1000]  # Runs synchronously
span.set_attribute("function.result_preview", result_str)

Problem: str(result) can be expensive for large objects (Pydantic models, lists)

Estimated Overhead: 10-50µs for simple objects, 100µs+ for complex objects

Proposed Solutions:

Option A: Make it optional via environment variable

if os.getenv("RHESIS_TRACE_RESULT_PREVIEW", "false").lower() == "true":
    result_str = str(result)[:1000]
    span.set_attribute("function.result_preview", result_str)

Option B: Use type name instead

span.set_attribute("function.result_type", type(result).__name__)
span.set_attribute("function.result_size", len(str(result)[:100]))  # caveat: str(result) still runs in full; only the slice is cheap

Recommendation: Option A for flexibility


2. Pydantic Validation During Export

Location: sdk/src/rhesis/sdk/telemetry/exporter.py:179-195

otel_span = OTELSpan(  # Pydantic validation on every span
    trace_id=trace_id,
    span_id=span_id,
    # ... validation happens here
)

Problem: Pydantic validation adds overhead during span export (background thread)

Estimated Overhead: ~100-500µs per span during export

Proposed Solutions:

Option A: Skip validation in production

# Use model_construct to skip validation
otel_span = OTELSpan.model_construct(
    trace_id=trace_id,
    span_id=span_id,
    # ... no validation overhead
)

Option B: Configurable validation

if os.getenv("RHESIS_STRICT_VALIDATION", "true").lower() == "true":
    otel_span = OTELSpan(**data)  # Validate
else:
    otel_span = OTELSpan.model_construct(**data)  # Skip validation

Recommendation: Option B, default to validation in development

Note: Validation happens in background thread, so less critical than #1
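
The validation trade-off is easy to demonstrate with a toy model (Pydantic v2 assumed; `OTELSpanStub` is a stand-in for the real `OTELSpan`):

```python
from pydantic import BaseModel

class OTELSpanStub(BaseModel):  # toy stand-in for the real OTELSpan model
    trace_id: str
    span_id: str

validated = OTELSpanStub(trace_id="abc", span_id="123")             # runs validation
fast = OTELSpanStub.model_construct(trace_id="abc", span_id="123")  # skips it
assert validated.trace_id == fast.trace_id

# The trade-off: model_construct accepts anything, so bad data slips through.
bypassed = OTELSpanStub.model_construct(trace_id=42, span_id="123")
assert bypassed.trace_id == 42  # an int where a str is required
```

This is why Option B defaults to strict validation: `model_construct` will happily ship malformed spans to the backend.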


3. Dictionary Conversions

Location: sdk/src/rhesis/sdk/telemetry/exporter.py:191-194

attributes=dict(span.attributes) if span.attributes else {},
# ... repeated for events, links, resource

Problem: Creating new dictionaries adds memory allocations

Estimated Overhead: ~5-10µs per span

Proposed Solution:

# Avoid unnecessary conversion if already dict
attributes=span.attributes if isinstance(span.attributes, dict) else dict(span.attributes)

Recommendation: Low priority - investigate if profiling shows significant impact


Recommended Optimizations (Priority Order)

Priority 1: Sampling Support

Add ability to sample traces to reduce volume in high-traffic scenarios.

Implementation: Add sampling configuration to TracerProvider

# In tracer.py
import os
import random

def trace_execution(self, function_name, func, args, kwargs, span_name=None):
    sampling_rate = float(os.getenv("RHESIS_SAMPLING_RATE", "1.0"))
    if random.random() > sampling_rate:
        return func(*args, **kwargs)  # Skip tracing entirely

    # Continue with normal tracing...

Use Case: High-volume production services (>1000 req/sec)

Expected Impact: Linear reduction in overhead (10% sampling = 90% overhead reduction)
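
A self-contained sketch of the sampling fast path (the decorator name and the stubbed-out tracing branch are illustrative, not the SDK's actual API):

```python
import os
import random
from functools import wraps

def sampled(func):
    """Illustrative decorator: trace only a fraction of calls."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        rate = float(os.getenv("RHESIS_SAMPLING_RATE", "1.0"))
        if random.random() > rate:
            return func(*args, **kwargs)  # fast path: no span created at all
        # ...the normal tracing path would wrap the call in a span here...
        return func(*args, **kwargs)
    return wrapper

@sampled
def handler(x):
    return x * 2
```

Either branch returns the same result, so sampling is invisible to callers; only the tracing side effect is skipped.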


Priority 2: Optional Result Serialization

Make result preview optional (see #1 above).

Use Case: Functions returning large objects

Expected Impact: 10-100µs savings per trace

Configuration:

# Development - full observability
RHESIS_TRACE_RESULT_PREVIEW=true

# Production - minimal overhead
RHESIS_TRACE_RESULT_PREVIEW=false  # Default

Priority 3: Tunable BatchSpanProcessor

Make batch processor parameters configurable per environment.

Configuration Tiers:

Production (Minimal Overhead)

BatchSpanProcessor(
    max_queue_size=4096,              # Larger buffer
    max_export_batch_size=1000,       # Larger batches
    schedule_delay_millis=10000,      # Less frequent (10s)
    export_timeout_millis=30000,      # Allow more time (Python SDK parameter name)
)

Expected overhead: <5µs per traced function

Staging (Balanced) - Current Default

BatchSpanProcessor(
    max_queue_size=2048,
    max_export_batch_size=512,
    schedule_delay_millis=5000,
)

Expected overhead: ~15µs per traced function

Development (Full Observability)

BatchSpanProcessor(
    max_queue_size=512,
    max_export_batch_size=100,
    schedule_delay_millis=1000,  # Export every 1s for faster feedback
)

Expected overhead: ~100µs per traced function
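
One way to wire these tiers to the proposed RHESIS_BATCH_* variables is a small preset resolver (helper name and structure are hypothetical):

```python
import os

# Presets mirror the tiers above; env vars use the proposed RHESIS_BATCH_* names.
PRESETS = {
    "production":  {"max_queue_size": 4096, "max_export_batch_size": 1000,
                    "schedule_delay_millis": 10000},
    "staging":     {"max_queue_size": 2048, "max_export_batch_size": 512,
                    "schedule_delay_millis": 5000},
    "development": {"max_queue_size": 512,  "max_export_batch_size": 100,
                    "schedule_delay_millis": 1000},
}

ENV_OVERRIDES = {
    "max_queue_size": "RHESIS_BATCH_QUEUE_SIZE",
    "max_export_batch_size": "RHESIS_BATCH_SIZE",
    "schedule_delay_millis": "RHESIS_BATCH_DELAY_MS",
}

def batch_processor_kwargs(tier: str = "staging") -> dict:
    """Resolve BatchSpanProcessor kwargs: tier preset first, then env overrides."""
    kwargs = dict(PRESETS[tier])
    for key, env_var in ENV_OVERRIDES.items():
        if env_var in os.environ:
            kwargs[key] = int(os.environ[env_var])
    return kwargs
```

The resulting dict can be splatted into `BatchSpanProcessor(exporter, **batch_processor_kwargs(tier))`, keeping the tier logic out of the tracer setup itself.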


Priority 4: HTTP Compression

Add gzip compression for network transfer.

Implementation: Add compression to exporter

# In exporter.py
import gzip
import json

# In exporter.__init__ (set only when compression is enabled)
self._session.headers.update({
    "Content-Encoding": "gzip",
})

# In export method
json_data = batch.model_dump(mode="json")
compressed = gzip.compress(json.dumps(json_data).encode())
response = self._session.post(self.endpoint, data=compressed, ...)

Expected Impact: ~70% reduction in network payload size

Note: Backend must support gzip decompression
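
A quick round-trip check of the scheme, independent of the exporter (the batch payload shape is illustrative):

```python
import gzip
import json

# Illustrative span batch; field names are made up for the round trip.
batch = {"spans": [{"trace_id": "a" * 32, "name": "handler", "status": "OK"}] * 100}
raw = json.dumps(batch).encode()
compressed = gzip.compress(raw)

# Backend side: decompress, then parse as usual.
restored = json.loads(gzip.decompress(compressed))
assert restored == batch
assert len(compressed) < len(raw)  # repetitive JSON compresses heavily
```

The actual ratio depends on span contents; repetitive attribute keys and IDs are what make trace batches compress well.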


Priority 5: Lazy Attribute Collection

Only collect verbose attributes when debugging.

# Instead of always collecting
if logger.isEnabledFor(logging.DEBUG):
    span.set_attribute("function.args_count", len(args))
    span.set_attribute("function.kwargs_count", len(kwargs))

Expected Impact: Minimal (~1-2µs savings)


Performance Benchmarks (Expected)

| Configuration | Overhead per Trace | Throughput | Use Case |
| --- | --- | --- | --- |
| Minimal | <5µs | 200K spans/sec | Production high-volume |
| Balanced (current) | ~15µs | 65K spans/sec | Staging |
| Full | ~100µs | 10K spans/sec | Development |

Note: These are theoretical estimates. Real benchmarks needed.


Configuration Variables Summary

# Proposed environment variables for tuning

# Sampling (Priority 1)
RHESIS_SAMPLING_RATE=1.0          # 1.0 = 100%, 0.1 = 10%

# Result preview (Priority 2)
RHESIS_TRACE_RESULT_PREVIEW=false # true = capture result, false = skip

# Validation (Priority 2)
RHESIS_STRICT_VALIDATION=true     # true = validate, false = skip in export

# Batch processor (Priority 3)
RHESIS_BATCH_QUEUE_SIZE=2048
RHESIS_BATCH_SIZE=512
RHESIS_BATCH_DELAY_MS=5000
RHESIS_BATCH_TIMEOUT_MS=30000

# Compression (Priority 4)
RHESIS_TRACE_COMPRESSION=false    # true = gzip, false = plain JSON

# Debug attributes (Priority 5)
RHESIS_VERBOSE_ATTRIBUTES=false   # true = collect all, false = minimal

Goals

  • Minimize telemetry overhead to <5µs per traced function in production
  • Support high-volume services (>1000 req/sec) without performance degradation
  • Provide configurable performance tiers (dev/staging/prod)
  • Maintain full observability in development while optimizing for production

Deliverables

  • Benchmarking infrastructure using pytest-benchmark
  • Sampling support implementation (Priority 1)
  • Optional result serialization (Priority 2)
  • Configurable BatchSpanProcessor parameters (Priority 3)
  • HTTP compression support (Priority 4)
  • Performance tuning guide documentation
  • Real-world profiling results from chatbot and production services

Steps / Action Plan

  1. Set up benchmarking infrastructure to measure current overhead
  2. Implement sampling support with RHESIS_SAMPLING_RATE env var
  3. Make result serialization optional via RHESIS_TRACE_RESULT_PREVIEW
  4. Add configuration presets for BatchSpanProcessor (dev/staging/prod)
  5. Implement gzip compression for trace export
  6. Profile real applications to validate improvements
  7. Document performance tuning best practices

Acceptance Criteria

  • Benchmarks show <5µs overhead per trace in production config
  • Sampling reduces overhead linearly (10% sampling = 90% reduction)
  • Configuration via environment variables documented
  • Zero breaking changes to existing API
  • All existing tests pass
  • Performance guide added to documentation

Dependencies

  • WP3 SDK Integration completed ✅
  • Access to production-like workload for testing

Risks / Considerations

  • Aggressive optimization may reduce observability if not configurable
  • Need to balance performance vs debugging capabilities
  • Must maintain backward compatibility with existing deployments

Additional Context

Current architecture already includes good optimizations (BatchSpanProcessor, async export, singleton provider). This task focuses on additional tuning for high-volume scenarios.

Related: WP3_SDK_INTEGRATION.md
