Performance Analysis & Optimization Opportunities

Date: 2026-01-15
Game: template_lines (representative baseline)
Simulations: 1,000
Total Time: 4.899s
Speed: 204.1 sims/second
Per Sim: 4.899ms

Executive Summary

Based on profiling data, the top 5 hot paths account for roughly 66% of execution time (the deepcopy figure is cumulative, so it may overlap with other entries):

symbol.py:122(assign_paying_bool) - 0.895s (18.3%)
copy.py:119(deepcopy) - 0.968s cumulative (19.8%)
posix.read (multiprocessing) - 0.896s (18.3%)
set.add operations - 0.238s (4.9%)
JSON encoding - 0.225s (4.6%)
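
A listing like the one above can be reproduced with the standard-library cProfile and pstats modules. The following is a minimal sketch, not the exact procedure used for this report: run_simulations is a hypothetical stand-in for the real simulation entry point, and profile_top is an illustrative helper name.

```python
import cProfile
import io
import pstats


def run_simulations(n):
    """Hypothetical stand-in for the real simulation entry point."""
    total = 0
    for _ in range(n):
        total += sum(j * j for j in range(50))
    return total


def profile_top(func, *args, limit=5):
    """Profile func(*args) and return the top `limit` rows sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


report = profile_top(run_simulations, 1000)
print(report)
```

Sorting by "tottime" ranks functions by time spent in their own body, while "cumulative" includes callees, which is the figure quoted for deepcopy.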

Key Finding: The current 204 sims/second is a reasonable baseline, and the optimizations identified below are estimated to raise throughput by 18-32%.

Hot Path Analysis

1. Symbol.assign_paying_bool() - CRITICAL

Impact: 0.895s (18.3% of total time)
Calls: 99,811 times
Per Call: 8.96µs

Current Implementation (estimated from profiling):

def assign_paying_bool(self):
    # Likely checking paytable for each symbol
    # Called once per symbol during board creation
    pass

Problem: Called for EVERY symbol on EVERY board (99K+ times for 1K sims)

Optimization Opportunities:

Cache paytable lookups - Paytable doesn't change during simulation
Pre-compute paying symbols - Create a set of paying symbol names at initialization
Lazy evaluation - Compute the flag only when it is first read, since not every symbol's flag is consulted

Estimated Impact: 30-50% reduction (0.3-0.4s savings)
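
The "pre-compute paying symbols" idea can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the Symbol class here is a simplified stand-in, build_paying_names is a hypothetical helper, and the paytable is assumed to be keyed by (count, symbol_name) tuples.

```python
class Symbol:
    """Simplified stand-in for the real Symbol class (assumed shape)."""

    __slots__ = ("name", "paying")

    def __init__(self, name, paying_names):
        self.name = name
        # O(1) membership test against a set that was built once per config.
        self.paying = name in paying_names


def build_paying_names(paytable):
    """Collect every symbol name that appears in the paytable.

    Assumes keys are (count, symbol_name) tuples; adapt to the real layout.
    """
    return frozenset(name for (_count, name) in paytable)


paytable = {(3, "H1"): 5.0, (4, "H1"): 10.0, (3, "L1"): 0.5}
paying_names = build_paying_names(paytable)  # done once at initialization
board = [Symbol(n, paying_names) for n in ("H1", "L1", "W", "S")]
print([(s.name, s.paying) for s in board])
```

Because the set is built once, the per-symbol cost drops to a single hash lookup regardless of paytable size.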

Priority: 🔴 HIGH

2. Deep Copy Operations - CRITICAL

Impact: 0.968s cumulative (19.8% of total time)
Calls: 639,109 deepcopy calls

Current Usage:

Board state copying
Symbol copying
Configuration copying

Problem: Excessive deepcopy usage (639K+ calls for 1K sims)

Optimization Opportunities:

Reduce deepcopy usage - Use shallow copy where safe
Object pooling - Reuse symbol objects instead of copying
Immutable data structures - Eliminate need for copying
Manual copying - Write custom copy methods for specific cases
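
As an illustration of the manual-copying option, the sketch below replaces copy.deepcopy with a hand-written clone method. The Symbol class and its attributes are simplified assumptions, and clone and clone_board are hypothetical names; the approach is only safe when every copied field is immutable or is itself copied explicitly.

```python
import copy


class Symbol:
    """Simplified stand-in; the real Symbol likely has more attributes."""

    __slots__ = ("name", "paying", "multiplier")

    def __init__(self, name, paying=False, multiplier=1):
        self.name = name
        self.paying = paying
        self.multiplier = multiplier

    def clone(self):
        """Hand-written copy: all fields are immutable scalars, so no recursion is needed."""
        return Symbol(self.name, self.paying, self.multiplier)


def clone_board(board):
    """Copy a reels-by-rows board of Symbol objects without copy.deepcopy."""
    return [[sym.clone() for sym in reel] for reel in board]


board = [[Symbol("H1", True), Symbol("L1", True)], [Symbol("W"), Symbol("S")]]
fast = clone_board(board)
slow = copy.deepcopy(board)  # reference behaviour for comparison

fast[0][0].multiplier = 5  # mutating the clone must not touch the original
print(board[0][0].multiplier, fast[0][0].multiplier, slow[0][0].multiplier)
```

deepcopy has to track a memo dictionary and dispatch on type for every object it visits, which is the overhead a purpose-built clone avoids.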

Estimated Impact: 40-60% reduction (0.4-0.6s savings)

Priority: 🔴 HIGH

3. Multiprocessing Overhead - CANNOT OPTIMIZE

Impact: 0.896s (18.3% of total time)
Calls: 520 posix.read calls

Analysis: This is overhead from multiprocessing communication (Manager, Pipes). It's necessary for parallel execution and cannot be optimized without changing architecture.

Trade-off: Multiprocessing provides speedup for large simulations (threads > 1), but adds overhead for small batches.

Recommendation: Accept this overhead. For production use with 10K+ sims and multiple threads, the benefit outweighs the cost.

Priority: ⚪ ACCEPT

4. Set Operations - MEDIUM PRIORITY

Impact: 0.238s (4.9% of total time)
Calls: 2,995,341 set.add operations

Current Usage:

Tracking winning positions
Tracking special symbols
Building line/cluster sets

Optimization Opportunities:

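Use lists where order matters or collections are small - Appending to a list skips hashing, but lists keep duplicates and membership tests are O(n), so this only suits collections that are iterated rather than searched
Build sets in one step - Python sets cannot be pre-sized, so construct them with set(iterable) or a comprehension instead of repeated add calls
Batch operations - Replace loops of set.add with fewer set.update or union calls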

Estimated Impact: 10-20% reduction (0.02-0.05s savings)
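
The batching idea can be sketched as below. The function names and the (reel, row) position layout are assumptions for illustration; the point is only that one set.update call per line does the same work as many set.add calls while running the inner loop in C.

```python
def winning_positions_incremental(lines):
    """Baseline: one set.add call per position."""
    positions = set()
    for line in lines:
        for pos in line:
            positions.add(pos)
    return positions


def winning_positions_batched(lines):
    """Batched: one set.update call per line."""
    positions = set()
    for line in lines:
        positions.update(line)
    return positions


# Positions as (reel, row) tuples; the layout is an assumption for illustration.
lines = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)]]
incremental = winning_positions_incremental(lines)
batched = winning_positions_batched(lines)
print(sorted(batched))
```

Both functions return the same set, so the batched form can be swapped in without changing results.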

Priority: 🟡 MEDIUM

5. JSON Encoding - LOW PRIORITY

Impact: 0.225s (4.6% of total time)
Calls: 5,522 encoder.iterencode calls

Analysis: JSON encoding is necessary for output. Already optimized in Phase 3.1 (OutputFormatter).

Additional Opportunities:

Use orjson library - 2-3x faster than standard json
Batch writes - Write larger chunks less frequently
Binary format - Use msgpack or pickle for internal use

Estimated Impact: 30-50% reduction (0.07-0.11s savings) with orjson
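
A minimal sketch of combining an optional orjson backend with batched writes follows. BatchedJsonWriter and dumps are hypothetical helpers, not part of the existing OutputFormatter, and the record layout is invented for the example. orjson is a third-party package, so the sketch falls back to the standard json module when it is not installed.

```python
import io
import json

try:
    import orjson  # third-party; optional

    def dumps(obj):
        # orjson.dumps returns bytes, so decode for text streams.
        return orjson.dumps(obj).decode("utf-8")
except ImportError:
    def dumps(obj):
        # Compact separators avoid the padding spaces json adds by default.
        return json.dumps(obj, separators=(",", ":"))


class BatchedJsonWriter:
    """Buffer encoded records and flush them in chunks (hypothetical helper)."""

    def __init__(self, stream, batch_size=1000):
        self.stream = stream
        self.batch_size = batch_size
        self._buffer = []

    def write(self, record):
        self._buffer.append(dumps(record))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            self.stream.write("\n".join(self._buffer) + "\n")
            self._buffer.clear()


out = io.StringIO()
writer = BatchedJsonWriter(out, batch_size=2)
for sim_id in range(5):
    writer.write({"id": sim_id, "payout": sim_id * 0.5})
writer.flush()  # write the final partial batch
records = [json.loads(line) for line in out.getvalue().splitlines()]
print(len(records))
```

The explicit final flush matters: without it, any records left in a partial batch would never reach the stream.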

Priority: 🟢 LOW (already optimized in Phase 3.1)

Optimization Recommendations

Phase 5.1A: Quick Wins (1-2 hours)

1. Cache Symbol Paytable Lookups ✅

File: src/calculations/symbol.py:122

Change:

class Symbol:
    # Class-level cache shared by all instances: (name, config_hash) -> bool
    _paying_symbols_cache = {}

    def assign_paying_bool(self):
        # config_hash (assumed attribute) keeps entries from different
        # paytables apart if several configs run in one process.
        cache_key = (self.name, self.config_hash)
        try:
            # Cache hit: a single dict lookup.
            self.paying = Symbol._paying_symbols_cache[cache_key]
        except KeyError:
            # Cache miss: run the original paytable check once, then store it.
            self.paying = self._check_paytable()
            Symbol._paying_symbols_cache[cache_key] = self.paying

Impact: -0.3s to -0.4s (6-8% total speedup)

2. Reduce Unnecessary Deepcopy ✅

Files: board.py, base_game_state.py

Change: Identify where deepcopy is used unnecessarily and replace with:

Shallow copy for simple dicts/lists
Reference passing for read-only data
Custom copy methods for specific classes
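
The three replacement strategies can be contrasted in a short sketch. The config, totals, and board values are invented for illustration. Note that MappingProxyType only makes the top-level mapping read-only; nested lists or dicts reachable through it remain mutable.

```python
import copy
from types import MappingProxyType

config = {"reels": 5, "rows": 3}

# Read-only data: share one reference behind a read-only view instead of copying.
shared_config = MappingProxyType(config)

# Flat dict of scalars: a shallow copy is fully independent and much cheaper.
totals = {"wins": 0, "spins": 0}
per_sim_totals = totals.copy()
per_sim_totals["spins"] += 1

# Nested mutable data that will be modified still needs a deep copy.
board = [["H1", "L1"], ["W", "S"]]
per_sim_board = copy.deepcopy(board)
per_sim_board[0][0] = "W"

try:
    shared_config["reels"] = 6
    mutated = True
except TypeError:
    mutated = False  # the proxy rejects writes

print(mutated, totals["spins"], board[0][0])
```

Wrapping shared data in a read-only view makes an accidental write fail loudly, which lowers the risk of removing a deepcopy that was silently protecting against mutation.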

Impact: -0.4s to -0.6s (8-12% total speedup)

Combined Quick Wins: 14-20% faster (4.9s → 4.0-4.2s)

Phase 5.1B: Medium Effort (2-3 hours)

3. Object Pooling for Symbols 🔄

File: src/calculations/symbol.py

Change: Reuse Symbol objects instead of creating new ones

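class SymbolPool:
    def __init__(self, config):
        # One free list per symbol type; the lists start empty and fill
        # as symbols are returned.
        self.pool = {name: [] for name in config.symbols}

    def get_symbol(self, name):
        # Reuse a returned symbol when one is available, else create one.
        free = self.pool.setdefault(name, [])
        if free:
            return free.pop()
        return Symbol(name)

    def return_symbol(self, symbol):
        # reset() must clear all per-spin state before the object is reused.
        symbol.reset()
        self.pool.setdefault(symbol.name, []).append(symbol)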

Impact: -0.2s to -0.3s (4-6% total speedup)

4. Optimize Set Operations 🔄

Files: lines.py, cluster.py, ways.py

Change: Use lists for small collections and replace repeated set.add calls with bulk set construction or set.update (Python sets cannot be pre-allocated)

Impact: -0.02s to -0.05s (0.5-1% total speedup)

Combined Medium Effort: 4.5-7% faster (additional speedup)

Phase 5.1C: Major Refactor (4-6 hours) - NOT RECOMMENDED

5. Replace Multiprocessing with asyncio

Benefit: Eliminate 0.9s overhead
Cost: Major architecture change, potential correctness issues
Recommendation: SKIP - asyncio runs on a single thread and provides no parallelism for CPU-bound work, so multiprocessing remains the appropriate choice

6. Binary Output Format

Benefit: Faster serialization
Cost: Breaking change, RGS compatibility issues
Recommendation: DEFER - Phase 3 compression is sufficient

Target Performance Goals

Baseline (Current)

Speed: 204.1 sims/second
Time for 1K sims: 4.9s
Time for 10K sims: 49s
Time for 100K sims: 490s (8.2 minutes)

After Phase 5.1A Optimizations (Quick Wins)

Speed: 240-250 sims/second (+18-23%)
Time for 1K sims: 4.0-4.2s
Time for 10K sims: 40-42s
Time for 100K sims: 400-420s (6.7-7.0 minutes)

After Phase 5.1A+B Optimizations (All Optimizations)

Speed: 255-270 sims/second (+25-32%)
Time for 1K sims: 3.7-3.9s
Time for 10K sims: 37-39s
Time for 100K sims: 370-390s (6.2-6.5 minutes)
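
The projections above follow from simple arithmetic on the measured throughput. The sketch below reproduces that calculation; it assumes throughput stays constant as the batch size grows, which is the same assumption the tables make. The function names are illustrative.

```python
def projected_seconds(num_sims, sims_per_second):
    """Wall-clock estimate, assuming throughput is constant across batch sizes."""
    return num_sims / sims_per_second


def speedup_percent(baseline_rate, new_rate):
    """Throughput gain relative to the baseline, in percent."""
    return (new_rate / baseline_rate - 1.0) * 100.0


BASELINE = 204.1  # sims/second, measured in this report

scenarios = [("baseline", BASELINE), ("5.1A low", 240.0), ("5.1A+B high", 270.0)]
for label, rate in scenarios:
    secs = projected_seconds(100_000, rate)
    gain = speedup_percent(BASELINE, rate)
    print(f"{label}: {secs:.0f}s ({secs / 60:.1f} min), {gain:+.0f}%")
```

Note that a percentage gain in sims/second is larger than the matching percentage reduction in run time, which is why the throughput figures here exceed the per-optimization time savings quoted earlier.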

Memory Analysis

To Be Completed: Run memory profiling with tracemalloc

Expected Findings:

High memory usage from symbol objects (99K+ instances)
Board storage overhead
Event list accumulation

Optimization Opportunities:

Symbol object pooling (reuse objects)
Streaming output (write events incrementally)
Reduce intermediate data structures
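
A minimal sketch of the planned tracemalloc run follows. build_board is a hypothetical stand-in that allocates one small dict per board position; the real measurement would wrap the actual simulation loop instead.

```python
import tracemalloc


def build_board(reels=5, rows=3):
    """Hypothetical stand-in that allocates one small object per board position."""
    return [[{"name": "H1", "paying": True} for _ in range(rows)] for _ in range(reels)]


tracemalloc.start()
before = tracemalloc.take_snapshot()

boards = [build_board() for _ in range(1000)]  # keep references so memory stays live

after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Largest growth by source line between the two snapshots.
top = after.compare_to(before, "lineno")[:3]
for stat in top:
    print(stat)
print(f"current={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB")
```

Comparing two snapshots isolates allocations made by the code under test from whatever was already allocated at import time.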

Conclusions

What to Optimize Now (Phase 5.1)

✅ Symbol paytable caching - 6-8% speedup, low risk
✅ Reduce deepcopy usage - 8-12% speedup, medium risk
🔄 Symbol object pooling - 4-6% speedup, medium risk (optional)

What NOT to Optimize

❌ Multiprocessing overhead - Necessary architecture
❌ JSON encoding - Already optimized in Phase 3.1
❌ Line/cluster algorithms - Already efficient

Expected Results

Total speedup: 18-23% (Phase 5.1A) to 25-32% (Phase 5.1A+B)
10K simulations: 49s → 37-42s
100K simulations: 8.2min → 6.2-7.0min
Risk level: Low to medium
Breaking changes: None

Next Steps

✅ Complete profiling analysis (this document)
⏭️ Implement Symbol paytable caching
⏭️ Audit and reduce deepcopy usage
⏭️ Run benchmarks to measure improvements
⏭️ Update RALPH_TASKS.md with results
⏭️ Document optimizations in code comments

Status: Analysis complete, ready for implementation
Estimated Implementation Time: 3-5 hours
Expected Benefit: 18-32% faster simulations
Risk Assessment: Low to medium (non-breaking changes)
