Summary
Investigate and implement optimizations to minimize OpenTelemetry telemetry overhead for client applications using the Rhesis SDK.
Background / Context
Following the WP3 SDK Integration work, we have a working OpenTelemetry integration with asynchronous batching. However, we have identified several optimization points that could reduce overhead for high-volume production applications.
Current Architecture Performance Profile
✅ Already Optimized
The current implementation includes several excellent performance optimizations:
- Asynchronous Batching via BatchSpanProcessor
  - Buffer: 2048 spans
  - Batch size: 512 spans per HTTP request
  - Export interval: every 5 seconds
  - Impact: the application continues immediately; spans export in the background
  - Overhead per span: ~1-2µs (just a memory write)
- Non-Blocking Span Creation
  - Context manager for span lifecycle
  - Data stored in memory, not sent immediately
- Singleton TracerProvider
  - Single provider reused across all requests
  - No repeated initialization overhead
⚠️ Identified Overhead Points
1. Synchronous Result Serialization
Location: sdk/src/rhesis/sdk/telemetry/tracer.py:113
result_str = str(result)[:1000] # Runs synchronously
span.set_attribute("function.result_preview", result_str)
Problem: str(result) can be expensive for large objects (Pydantic models, lists)
Estimated Overhead: 10-50µs for simple objects, 100µs+ for complex objects
Proposed Solutions:
Option A: Make it optional via environment variable
if os.getenv("RHESIS_TRACE_RESULT_PREVIEW", "false") == "true":
    result_str = str(result)[:1000]
    span.set_attribute("function.result_preview", result_str)
Option B: Use type name instead
span.set_attribute("function.result_type", type(result).__name__)
# Note: len(str(result)) would still serialize the full object first;
# sys.getsizeof gives a genuinely cheap (if shallow) size estimate
span.set_attribute("function.result_size", sys.getsizeof(result))
Recommendation: Option A for flexibility
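Option A can also be hardened slightly: `str(result)` may itself raise for exotic objects. A defensive sketch (the helper name is hypothetical, not existing SDK code):

```python
import os

def maybe_set_result_preview(span, result, limit: int = 1000) -> None:
    """Attach a truncated result preview only when explicitly enabled.

    Gated by the proposed RHESIS_TRACE_RESULT_PREVIEW flag; defaults to off
    so production pays no serialization cost.
    """
    if os.getenv("RHESIS_TRACE_RESULT_PREVIEW", "false").lower() != "true":
        return
    try:
        preview = str(result)[:limit]
    except Exception:
        # str() can fail on objects with broken __str__/__repr__
        preview = f"<unprintable {type(result).__name__}>"
    span.set_attribute("function.result_preview", preview)
```

With the flag unset, the function returns before touching `result` at all, so the disabled path costs only one environment lookup.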
2. Pydantic Validation During Export
Location: sdk/src/rhesis/sdk/telemetry/exporter.py:179-195
otel_span = OTELSpan(  # Pydantic validation on every span
    trace_id=trace_id,
    span_id=span_id,
    # ... validation happens here
)
Problem: Pydantic validation adds overhead during span export (background thread)
Estimated Overhead: ~100-500µs per span during export
Proposed Solutions:
Option A: Skip validation in production
# Use model_construct to skip validation
otel_span = OTELSpan.model_construct(
    trace_id=trace_id,
    span_id=span_id,
    # ... no validation overhead
)
Option B: Configurable validation
if os.getenv("RHESIS_STRICT_VALIDATION", "true") == "true":
    otel_span = OTELSpan(**data)  # Validate
else:
    otel_span = OTELSpan.model_construct(**data)  # Skip validation
Recommendation: Option B, default to validation in development
Note: Validation happens in background thread, so less critical than #1
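Option B's dispatch can live in one small helper so the flag is read in a single place. `model_construct` is the real Pydantic v2 API for bypassing validation; the `RHESIS_STRICT_VALIDATION` flag and the `build_span` helper below are proposed names, not existing SDK code:

```python
import os

def build_span(model_cls, data):
    """Validate in development; skip validation for trusted data in production.

    model_construct() (Pydantic v2) builds the model without running
    validators, trading safety for speed on the export path.
    """
    if os.getenv("RHESIS_STRICT_VALIDATION", "true").lower() == "true":
        return model_cls(**data)              # full Pydantic validation
    return model_cls.model_construct(**data)  # no validation overhead
```

Because `model_construct` trusts its input, this path should only ever receive data the SDK itself produced.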
3. Dictionary Conversions
Location: sdk/src/rhesis/sdk/telemetry/exporter.py:191-194
attributes=dict(span.attributes) if span.attributes else {},
# ... repeated for events, links, resource
Problem: Creating new dictionaries adds memory allocations
Estimated Overhead: ~5-10µs per span
Proposed Solution:
# Avoid unnecessary conversion if already dict
attributes=span.attributes if isinstance(span.attributes, dict) else dict(span.attributes)
Recommendation: Low priority - investigate if profiling shows significant impact
Recommended Optimizations (Priority Order)
Priority 1: Sampling Support
Add ability to sample traces to reduce volume in high-traffic scenarios.
Implementation: Add sampling configuration to TracerProvider
# In tracer.py
import os
import random

def trace_execution(self, function_name, func, args, kwargs, span_name=None):
    sampling_rate = float(os.getenv("RHESIS_SAMPLING_RATE", "1.0"))
    if random.random() > sampling_rate:
        return func(*args, **kwargs)  # Skip tracing entirely
    # Continue with normal tracing...
Use Case: High-volume production services (>1000 req/sec)
Expected Impact: Linear reduction in overhead (10% sampling = 90% overhead reduction)
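One caveat with the `random.random()` sketch: it decides independently per call, so nested traced functions within one request can be sampled inconsistently. A deterministic variant keyed on the trace id keeps all spans of a trace together. This helper is a hypothetical sketch, not current SDK code:

```python
import hashlib

def should_sample(trace_id: str, rate: float) -> bool:
    """Deterministic head sampling: the same trace id always yields the
    same decision, so an entire trace is kept or dropped as a unit."""
    if rate >= 1.0:
        return True
    # Hash the trace id into a uniform bucket in [0, 1)
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

This mirrors the ratio-based samplers built into OpenTelemetry SDKs, which also derive the decision from the trace id.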
Priority 2: Optional Result Serialization
Make result preview optional (see #1 above).
Use Case: Functions returning large objects
Expected Impact: 10-100µs savings per trace
Configuration:
# Development - full observability
RHESIS_TRACE_RESULT_PREVIEW=true
# Production - minimal overhead
RHESIS_TRACE_RESULT_PREVIEW=false # Default
Priority 3: Tunable BatchSpanProcessor
Make batch processor parameters configurable per environment.
Configuration Tiers:
Production (Minimal Overhead)
BatchSpanProcessor(
    exporter,
    max_queue_size=4096,            # Larger buffer
    max_export_batch_size=1000,     # Larger batches
    schedule_delay_millis=10000,    # Less frequent (10s)
    export_timeout_millis=30000,    # Allow more time per export
)
Expected overhead: <5µs per traced function
Staging (Balanced) - Current Default
BatchSpanProcessor(
    exporter,
    max_queue_size=2048,
    max_export_batch_size=512,
    schedule_delay_millis=5000,
)
Expected overhead: ~15µs per traced function
Development (Full Observability)
BatchSpanProcessor(
    exporter,
    max_queue_size=512,
    max_export_batch_size=100,
    schedule_delay_millis=1000,  # Export every 1s for faster feedback
)
Expected overhead: ~100µs per traced function
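The three tiers above can be captured as presets selected by an environment variable. The `RHESIS_ENV` variable and the helper below are proposals for illustration; the dict keys follow the opentelemetry-sdk `BatchSpanProcessor` keyword arguments:

```python
import os

# Proposed presets mirroring the production/staging/development tiers above
BATCH_PRESETS = {
    "production": dict(max_queue_size=4096, max_export_batch_size=1000,
                       schedule_delay_millis=10000, export_timeout_millis=30000),
    "staging": dict(max_queue_size=2048, max_export_batch_size=512,
                    schedule_delay_millis=5000),
    "development": dict(max_queue_size=512, max_export_batch_size=100,
                        schedule_delay_millis=1000),
}

def batch_processor_kwargs(env=None):
    """Resolve the batch-processor preset, falling back to staging."""
    tier = (env or os.getenv("RHESIS_ENV", "staging")).lower()
    return BATCH_PRESETS.get(tier, BATCH_PRESETS["staging"])
```

The resulting dict can then be splatted into the processor: `BatchSpanProcessor(exporter, **batch_processor_kwargs())`.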
Priority 4: HTTP Compression
Add gzip compression for network transfer.
Implementation: Add compression to exporter
# At module top
import gzip
import json

# In exporter.__init__
self._session.headers.update({"Content-Encoding": "gzip"})

# In the export method
json_data = batch.model_dump(mode="json")
compressed = gzip.compress(json.dumps(json_data).encode())
response = self._session.post(self.endpoint, data=compressed, ...)
Expected Impact: ~70% reduction in network payload size
Note: Backend must support gzip decompression
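The ~70% figure is easy to sanity-check locally, since span batches are highly repetitive JSON. The payload shape below is illustrative, not the real OTELSpan schema:

```python
import gzip
import json

# Illustrative span batch: repeated structure with varying ids
spans = [
    {"trace_id": f"{i:032x}", "span_id": f"{i:016x}",
     "name": "process_request", "attributes": {"http.method": "POST"}}
    for i in range(500)
]
raw = json.dumps(spans).encode()
compressed = gzip.compress(raw)
savings = 1 - len(compressed) / len(raw)  # fraction of bytes saved
```

Real payloads with longer attribute values tend to compress at least this well, since gzip exploits the repeated key names across spans.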
Priority 5: Lazy Attribute Collection
Only collect verbose attributes when debugging.
# Instead of always collecting
if logger.isEnabledFor(logging.DEBUG):
span.set_attribute("function.args_count", len(args))
span.set_attribute("function.kwargs_count", len(kwargs))
Expected Impact: Minimal (~1-2µs savings)
Performance Benchmarks (Expected)
| Configuration | Overhead per Trace | Throughput | Use Case |
| --- | --- | --- | --- |
| Minimal | <5µs | 200K spans/sec | Production high-volume |
| Balanced (current) | ~15µs | 65K spans/sec | Staging |
| Full | ~100µs | 10K spans/sec | Development |
Note: These are theoretical estimates. Real benchmarks needed.
Configuration Variables Summary
# Proposed environment variables for tuning
# Sampling (Priority 1)
RHESIS_SAMPLING_RATE=1.0 # 1.0 = 100%, 0.1 = 10%
# Result preview (Priority 2)
RHESIS_TRACE_RESULT_PREVIEW=false # true = capture result, false = skip
# Validation (Priority 2)
RHESIS_STRICT_VALIDATION=true # true = validate, false = skip in export
# Batch processor (Priority 3)
RHESIS_BATCH_QUEUE_SIZE=2048
RHESIS_BATCH_SIZE=512
RHESIS_BATCH_DELAY_MS=5000
RHESIS_BATCH_TIMEOUT_MS=30000
# Compression (Priority 4)
RHESIS_TRACE_COMPRESSION=false # true = gzip, false = plain JSON
# Debug attributes (Priority 5)
RHESIS_VERBOSE_ATTRIBUTES=false # true = collect all, false = minimal
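These variables could be loaded once at startup into a typed config object, so the hot path never re-reads the environment. The dataclass below is a sketch, not existing SDK code:

```python
import os
from dataclasses import dataclass

def _env_bool(name, default):
    """Parse a boolean env var, honoring the given default."""
    return os.getenv(name, str(default)).lower() in ("1", "true", "yes")

@dataclass(frozen=True)
class TelemetryConfig:
    sampling_rate: float
    trace_result_preview: bool
    strict_validation: bool
    batch_queue_size: int
    batch_size: int
    batch_delay_ms: int
    batch_timeout_ms: int
    trace_compression: bool
    verbose_attributes: bool

    @classmethod
    def from_env(cls):
        return cls(
            sampling_rate=float(os.getenv("RHESIS_SAMPLING_RATE", "1.0")),
            trace_result_preview=_env_bool("RHESIS_TRACE_RESULT_PREVIEW", False),
            strict_validation=_env_bool("RHESIS_STRICT_VALIDATION", True),
            batch_queue_size=int(os.getenv("RHESIS_BATCH_QUEUE_SIZE", "2048")),
            batch_size=int(os.getenv("RHESIS_BATCH_SIZE", "512")),
            batch_delay_ms=int(os.getenv("RHESIS_BATCH_DELAY_MS", "5000")),
            batch_timeout_ms=int(os.getenv("RHESIS_BATCH_TIMEOUT_MS", "30000")),
            trace_compression=_env_bool("RHESIS_TRACE_COMPRESSION", False),
            verbose_attributes=_env_bool("RHESIS_VERBOSE_ATTRIBUTES", False),
        )
```

Freezing the dataclass makes the configuration immutable after startup, which keeps behavior consistent across the exporter's background thread.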
Goals
- Minimize telemetry overhead to <5µs per traced function in production
- Support high-volume services (>1000 req/sec) without performance degradation
- Provide configurable performance tiers (dev/staging/prod)
- Maintain full observability in development while optimizing for production
Deliverables
Steps / Action Plan
- Set up benchmarking infrastructure to measure current overhead
- Implement sampling support with RHESIS_SAMPLING_RATE env var
- Make result serialization optional via RHESIS_TRACE_RESULT_PREVIEW
- Add configuration presets for BatchSpanProcessor (dev/staging/prod)
- Implement gzip compression for trace export
- Profile real applications to validate improvements
- Document performance tuning best practices
Acceptance Criteria
Dependencies
- WP3 SDK Integration completed ✅
- Access to production-like workload for testing
Risks / Considerations
- Aggressive optimization may reduce observability if not configurable
- Need to balance performance vs debugging capabilities
- Must maintain backward compatibility with existing deployments
Additional Context
Current architecture already includes good optimizations (BatchSpanProcessor, async export, singleton provider). This task focuses on additional tuning for high-volume scenarios.
Related: WP3_SDK_INTEGRATION.md