
Investigate and Implement Telemetry Overhead Optimizations #1081

@harry-rhesis

Description


Summary

Investigate and implement optimizations to minimize OpenTelemetry telemetry overhead for client applications using the Rhesis SDK.

Background / Context

Following WP3 SDK Integration implementation, we have a working OpenTelemetry integration with asynchronous batching. However, we've identified several potential optimization points that could reduce overhead for high-volume production applications.


Current Architecture Performance Profile

✅ Already Optimized

The current implementation includes several excellent performance optimizations:

  1. Asynchronous Batching via BatchSpanProcessor

    • Buffer: 2048 spans
    • Batch size: 512 spans per HTTP request
    • Export interval: Every 5 seconds
    • Impact: Application continues immediately; spans export in background
    • Overhead per span: ~1-2µs (just memory write)
  2. Non-Blocking Span Creation

    • Context manager for span lifecycle
    • Data stored in memory, not sent immediately
  3. Singleton TracerProvider

    • Single provider reused across all requests
    • No repeated initialization overhead

⚠️ Identified Overhead Points

1. Synchronous Result Serialization

Location: sdk/src/rhesis/sdk/telemetry/tracer.py:113

result_str = str(result)[:1000]  # Runs synchronously
span.set_attribute("function.result_preview", result_str)

Problem: str(result) can be expensive for large objects (Pydantic models, lists)

Estimated Overhead: 10-50µs for simple objects, 100µs+ for complex objects

Proposed Solutions:

Option A: Make it optional via environment variable

if os.getenv("RHESIS_TRACE_RESULT_PREVIEW", "false").lower() == "true":
    result_str = str(result)[:1000]
    span.set_attribute("function.result_preview", result_str)

Option B: Use type name instead

span.set_attribute("function.result_type", type(result).__name__)
span.set_attribute("function.result_size", len(str(result)[:100]))  # caveat: str(result) still runs in full; only the slice is cheap

Recommendation: Option A for flexibility


2. Pydantic Validation During Export

Location: sdk/src/rhesis/sdk/telemetry/exporter.py:179-195

otel_span = OTELSpan(  # Pydantic validation on every span
    trace_id=trace_id,
    span_id=span_id,
    # ... validation happens here
)

Problem: Pydantic validation adds overhead during span export (background thread)

Estimated Overhead: ~100-500µs per span during export

Proposed Solutions:

Option A: Skip validation in production

# Use model_construct to skip validation
otel_span = OTELSpan.model_construct(
    trace_id=trace_id,
    span_id=span_id,
    # ... no validation overhead
)

Option B: Configurable validation

if os.getenv("RHESIS_STRICT_VALIDATION", "true").lower() == "true":
    otel_span = OTELSpan(**data)  # Validate
else:
    otel_span = OTELSpan.model_construct(**data)  # Skip validation

Recommendation: Option B, default to validation in development

Note: Validation happens in background thread, so less critical than #1
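
The validation trade-off is easy to demonstrate with a toy model (Pydantic v2 assumed; `OTELSpanStub` is a stand-in for the real `OTELSpan`):

```python
from pydantic import BaseModel

class OTELSpanStub(BaseModel):  # toy stand-in for the real OTELSpan model
    trace_id: str
    span_id: str

validated = OTELSpanStub(trace_id="abc", span_id="123")             # runs validation
fast = OTELSpanStub.model_construct(trace_id="abc", span_id="123")  # skips it
assert validated.trace_id == fast.trace_id

# The trade-off: model_construct accepts anything, so bad data slips through.
bypassed = OTELSpanStub.model_construct(trace_id=42, span_id="123")
assert bypassed.trace_id == 42  # an int where a str is required
```

This is why Option B defaults to strict validation: `model_construct` will happily ship malformed spans to the backend.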


3. Dictionary Conversions

Location: sdk/src/rhesis/sdk/telemetry/exporter.py:191-194

attributes=dict(span.attributes) if span.attributes else {},
# ... repeated for events, links, resource

Problem: Creating new dictionaries adds memory allocations

Estimated Overhead: ~5-10µs per span

Proposed Solution:

# Avoid unnecessary conversion if already dict
attributes=span.attributes if isinstance(span.attributes, dict) else dict(span.attributes)

Recommendation: Low priority - investigate if profiling shows significant impact


Recommended Optimizations (Priority Order)

Priority 1: Sampling Support

Add ability to sample traces to reduce volume in high-traffic scenarios.

Implementation: Add sampling configuration to TracerProvider

# In tracer.py
import os
import random

def trace_execution(self, function_name, func, args, kwargs, span_name=None):
    sampling_rate = float(os.getenv("RHESIS_SAMPLING_RATE", "1.0"))
    if random.random() > sampling_rate:
        return func(*args, **kwargs)  # Skip tracing entirely

    # Continue with normal tracing...

Use Case: High-volume production services (>1000 req/sec)

Expected Impact: Linear reduction in overhead (10% sampling = 90% overhead reduction)
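
A self-contained sketch of the sampling fast path (the decorator name and the stubbed-out tracing branch are illustrative, not the SDK's actual API):

```python
import os
import random
from functools import wraps

def sampled(func):
    """Illustrative decorator: trace only a fraction of calls."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        rate = float(os.getenv("RHESIS_SAMPLING_RATE", "1.0"))
        if random.random() > rate:
            return func(*args, **kwargs)  # fast path: no span created at all
        # ...the normal tracing path would wrap the call in a span here...
        return func(*args, **kwargs)
    return wrapper

@sampled
def handler(x):
    return x * 2
```

Either branch returns the same result, so sampling is invisible to callers; only the tracing side effect is skipped.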


Priority 2: Optional Result Serialization

Make result preview optional (see #1 above).

Use Case: Functions returning large objects

Expected Impact: 10-100µs savings per trace

Configuration:

# Development - full observability
RHESIS_TRACE_RESULT_PREVIEW=true

# Production - minimal overhead
RHESIS_TRACE_RESULT_PREVIEW=false  # Default

Priority 3: Tunable BatchSpanProcessor

Make batch processor parameters configurable per environment.

Configuration Tiers:

Production (Minimal Overhead)

BatchSpanProcessor(
    max_queue_size=4096,              # Larger buffer
    max_export_batch_size=1000,       # Larger batches
    schedule_delay_millis=10000,      # Less frequent (10s)
    export_timeout_millis=30000,      # Allow more time (Python SDK parameter name)
)

Expected overhead: <5µs per traced function

Staging (Balanced) - Current Default

BatchSpanProcessor(
    max_queue_size=2048,
    max_export_batch_size=512,
    schedule_delay_millis=5000,
)

Expected overhead: ~15µs per traced function

Development (Full Observability)

BatchSpanProcessor(
    max_queue_size=512,
    max_export_batch_size=100,
    schedule_delay_millis=1000,  # Export every 1s for faster feedback
)

Expected overhead: ~100µs per traced function
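
One way to wire these tiers to the proposed RHESIS_BATCH_* variables is a small preset resolver (helper name and structure are hypothetical):

```python
import os

# Presets mirror the tiers above; env vars use the proposed RHESIS_BATCH_* names.
PRESETS = {
    "production":  {"max_queue_size": 4096, "max_export_batch_size": 1000,
                    "schedule_delay_millis": 10000},
    "staging":     {"max_queue_size": 2048, "max_export_batch_size": 512,
                    "schedule_delay_millis": 5000},
    "development": {"max_queue_size": 512,  "max_export_batch_size": 100,
                    "schedule_delay_millis": 1000},
}

ENV_OVERRIDES = {
    "max_queue_size": "RHESIS_BATCH_QUEUE_SIZE",
    "max_export_batch_size": "RHESIS_BATCH_SIZE",
    "schedule_delay_millis": "RHESIS_BATCH_DELAY_MS",
}

def batch_processor_kwargs(tier: str = "staging") -> dict:
    """Resolve BatchSpanProcessor kwargs: tier preset first, then env overrides."""
    kwargs = dict(PRESETS[tier])
    for key, env_var in ENV_OVERRIDES.items():
        if env_var in os.environ:
            kwargs[key] = int(os.environ[env_var])
    return kwargs
```

The resulting dict can be splatted into `BatchSpanProcessor(exporter, **batch_processor_kwargs(tier))`, keeping the tier logic out of the tracer setup itself.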


Priority 4: HTTP Compression

Add gzip compression for network transfer.

Implementation: Add compression to exporter

# In exporter.py
import gzip
import json

# In exporter.__init__ (set only when compression is enabled)
self._session.headers.update({
    "Content-Encoding": "gzip",
})

# In export method
json_data = batch.model_dump(mode="json")
compressed = gzip.compress(json.dumps(json_data).encode())
response = self._session.post(self.endpoint, data=compressed, ...)

Expected Impact: ~70% reduction in network payload size

Note: Backend must support gzip decompression
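
A quick round-trip check of the scheme, independent of the exporter (the batch payload shape is illustrative):

```python
import gzip
import json

# Illustrative span batch; field names are made up for the round trip.
batch = {"spans": [{"trace_id": "a" * 32, "name": "handler", "status": "OK"}] * 100}
raw = json.dumps(batch).encode()
compressed = gzip.compress(raw)

# Backend side: decompress, then parse as usual.
restored = json.loads(gzip.decompress(compressed))
assert restored == batch
assert len(compressed) < len(raw)  # repetitive JSON compresses heavily
```

The actual ratio depends on span contents; repetitive attribute keys and IDs are what make trace batches compress well.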


Priority 5: Lazy Attribute Collection

Only collect verbose attributes when debugging.

# Instead of always collecting
if logger.isEnabledFor(logging.DEBUG):
    span.set_attribute("function.args_count", len(args))
    span.set_attribute("function.kwargs_count", len(kwargs))

Expected Impact: Minimal (~1-2µs savings)


Performance Benchmarks (Expected)

| Configuration | Overhead per Trace | Throughput | Use Case |
| --- | --- | --- | --- |
| Minimal | <5µs | 200K spans/sec | Production high-volume |
| Balanced (current) | ~15µs | 65K spans/sec | Staging |
| Full | ~100µs | 10K spans/sec | Development |

Note: These are theoretical estimates. Real benchmarks needed.


Configuration Variables Summary

# Proposed environment variables for tuning

# Sampling (Priority 1)
RHESIS_SAMPLING_RATE=1.0          # 1.0 = 100%, 0.1 = 10%

# Result preview (Priority 2)
RHESIS_TRACE_RESULT_PREVIEW=false # true = capture result, false = skip

# Validation (Priority 2)
RHESIS_STRICT_VALIDATION=true     # true = validate, false = skip in export

# Batch processor (Priority 3)
RHESIS_BATCH_QUEUE_SIZE=2048
RHESIS_BATCH_SIZE=512
RHESIS_BATCH_DELAY_MS=5000
RHESIS_BATCH_TIMEOUT_MS=30000

# Compression (Priority 4)
RHESIS_TRACE_COMPRESSION=false    # true = gzip, false = plain JSON

# Debug attributes (Priority 5)
RHESIS_VERBOSE_ATTRIBUTES=false   # true = collect all, false = minimal

Goals

  • Minimize telemetry overhead to <5µs per traced function in production
  • Support high-volume services (>1000 req/sec) without performance degradation
  • Provide configurable performance tiers (dev/staging/prod)
  • Maintain full observability in development while optimizing for production

Deliverables

  • Benchmarking infrastructure using pytest-benchmark
  • Sampling support implementation (Priority 1)
  • Optional result serialization (Priority 2)
  • Configurable BatchSpanProcessor parameters (Priority 3)
  • HTTP compression support (Priority 4)
  • Performance tuning guide documentation
  • Real-world profiling results from chatbot and production services

Steps / Action Plan

  1. Set up benchmarking infrastructure to measure current overhead
  2. Implement sampling support with RHESIS_SAMPLING_RATE env var
  3. Make result serialization optional via RHESIS_TRACE_RESULT_PREVIEW
  4. Add configuration presets for BatchSpanProcessor (dev/staging/prod)
  5. Implement gzip compression for trace export
  6. Profile real applications to validate improvements
  7. Document performance tuning best practices

Acceptance Criteria

  • Benchmarks show <5µs overhead per trace in production config
  • Sampling reduces overhead linearly (10% sampling = 90% reduction)
  • Configuration via environment variables documented
  • Zero breaking changes to existing API
  • All existing tests pass
  • Performance guide added to documentation

Dependencies

  • WP3 SDK Integration completed ✅
  • Access to production-like workload for testing

Risks / Considerations

  • Aggressive optimization may reduce observability if not configurable
  • Need to balance performance vs debugging capabilities
  • Must maintain backward compatibility with existing deployments

Additional Context

Current architecture already includes good optimizations (BatchSpanProcessor, async export, singleton provider). This task focuses on additional tuning for high-volume scenarios.

Related: WP3_SDK_INTEGRATION.md
