
Commit 84a1397

Merge pull request #365 from algorithmicsuperintelligence/feat-update-circle-example
Feat update circle example
2 parents: ba5647d + 81fdc9d

3 files changed: +201 -65 lines changed

examples/k_module_problem/README.md

Lines changed: 30 additions & 63 deletions
@@ -52,51 +52,6 @@ Generation 2 (crossover):
 
 **Key insight**: Evolution discovers correct modules in different individuals and **crossover combines them**. This is the "Building Block Hypothesis" - complex solutions are assembled from simpler discovered components.
 
-## Theoretical Analysis
-
-| Method | Expected Evaluations | Why |
-|--------|---------------------|-----|
-| **Random Search** | ~312 (50% of space) | Pure luck |
-| **Pass@100 (LLM)** | ~100 calls, ~15% success | Independent samples, no learning |
-| **Iterative Refinement** | ~312+ | No gradient, random walk |
-| **Evolution (pop=20)** | ~40-60 | Parallel exploration + crossover |
-
-The gap widens exponentially with more modules:
-- K=5 modules: Iterative ~1,562, Evolution ~70
-- K=6 modules: Iterative ~7,812, Evolution ~90
-
-### Note on Pass@k with Closed Models
-
-The pass@k metric (probability of finding solution in k independent attempts) is commonly used to evaluate LLM capabilities. However:
-
-- **Open models** (local): Can generate k responses in parallel with `n=k` parameter
-- **Closed models** (API): Most don't support `n>1`, requiring k separate API calls
-
-For this comparison, we include a **random baseline** that simulates pass@k without an LLM. This establishes the "no learning" baseline.
-
-### Random Baseline Results (100 trials, 100 samples each)
-
-| Metric | Value |
-|--------|-------|
-| **Success rate (pass@100)** | 16% (16/100 trials found solution) |
-| **Avg samples to solution** | 43.3 (when found) |
-| **Min samples** | 5 (lucky guess) |
-| **Max samples** | 91 |
-
-**Pass@k breakdown:**
-
-| k | Empirical | Theoretical |
-|---|-----------|-------------|
-| 1 | 0% | 0.2% |
-| 10 | 1% | 1.6% |
-| 20 | 4% | 3.2% |
-| 50 | 9% | 7.7% |
-| 100 | 16% | 14.8% |
-
-The empirical results closely match the theoretical prediction `pass@k ≈ 1 - (624/625)^k`.
-
-Any method that beats this baseline is demonstrating actual optimization, not just random sampling.
-
 ## Running the Experiment
 
 ### Prerequisites
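A quick cross-check of the pass@k table that the removed section cites, as a minimal sketch: it assumes what the README's figures imply, namely a space of 5^4 = 625 module combinations (four modules, five options each) with exactly one correct configuration.

```python
# Cross-check of the pass@k table: one correct configuration out of 625
# (assumed: 4 modules x 5 options each), sampled uniformly at random.
import random

SPACE = 625            # 5 ** 4 candidate configurations
P_HIT = 1 / SPACE      # chance that one uniform sample is the solution
TRIALS = 20_000        # Monte Carlo trials per value of k

for k in (1, 10, 20, 50, 100):
    theoretical = 1 - (1 - P_HIT) ** k
    hits = sum(
        any(random.randrange(SPACE) == 0 for _ in range(k))
        for _ in range(TRIALS)
    )
    print(f"pass@{k:<3}  theory={theoretical:5.1%}  simulated={hits / TRIALS:5.1%}")
```

This reproduces the theoretical column (0.2%, 1.6%, 3.2%, 7.7%, 14.8%), and the README's empirical 16% at k=100 sits well within sampling noise of the 14.8% prediction over 100 trials.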
@@ -159,6 +114,17 @@ This generates:
 
 ## Experimental Results
 
+### Random Baseline (100 trials, 100 samples each)
+
+| Metric | Value |
+|--------|-------|
+| **Success rate (pass@100)** | 16% (16/100 trials found solution) |
+| **Avg samples to solution** | 43.3 (when found) |
+| **Min samples** | 5 (lucky guess) |
+| **Max samples** | 91 |
+
+This establishes the "no learning" baseline. Any method that beats this is demonstrating actual optimization, not just random sampling.
+
 ### Iterative Refinement Results (3 trials, 100 iterations max)
 
 | Trial | Iterations | Result | Best Score |
@@ -174,31 +140,31 @@ This generates:
 
 **Key observation**: The iterative agent repeatedly finds configurations with 3/4 correct modules (`csv_reader`, `quicksort`, `json`) but cannot identify that `preprocess` is the wrong module. It keeps cycling through variations without escaping this local optimum.
 
-### OpenEvolve (Evolutionary) Results
+### OpenEvolve (Evolutionary) Results (3 trials, 100 iterations max)
 
-| Trial | Iterations | Result | Best Score | Notes |
-|-------|------------|--------|------------|-------|
-| 1 | 21 | SUCCESS | 100% (4/4) | Solution found through population diversity |
+| Trial | Iterations | Result | Best Score |
+|-------|------------|--------|------------|
+| 1 | 18 | SUCCESS | 100% (4/4) |
+| 2 | 50 | SUCCESS | 100% (4/4) |
+| 3 | 89 | SUCCESS | 100% (4/4) |
 
 **Summary:**
-- **Success rate**: 100% (1/1 trial found solution)
-- **Solution found at**: Iteration 21
-- **Key observation**: OpenEvolve's population-based approach explores multiple configurations in parallel. By iteration 9, the population already had diverse configurations, and by iteration 21, the correct combination was discovered.
+- **Success rate**: 100% (3/3 trials found solution)
+- **Avg iterations to solution**: 52.3
+- **Min iterations**: 18
+- **Max iterations**: 89
 
-**Progression:**
-- Iteration 3: 25% (1/4) - Initial exploration
-- Iteration 9: 50% (2/4) - Multiple 50% configs in population
-- Iteration 21: 100% (4/4) - csv_reader, normalize, quicksort, json - PERFECT!
-
-**Key advantage**: OpenEvolve's prompt encourages systematic exploration ("try DIFFERENT options for EACH module") rather than following potentially misleading hints. Combined with higher temperature (0.9), larger population (25), and more frequent migration, this leads to faster discovery.
+**Key advantage**: OpenEvolve's population-based approach maintains diverse configurations that explore different module combinations in parallel. Even when some individuals get stuck at local optima (75% with wrong preprocessing), others explore alternatives and eventually discover the correct solution.
 
 ### Comparison Summary
 
-| Method | Success Rate | Evaluations to Solution | Key Limitation |
-|--------|-------------|------------------------|----------------|
-| **Random Baseline** | 16% | 43.3 avg (when found) | No learning |
-| **Iterative Refinement** | 33% | 13 (when found) | Gets stuck at 75%, can't escape local optima |
-| **OpenEvolve** | 100% | 21 | Population diversity + systematic exploration |
+| Method | Success Rate | Avg Iterations | Key Finding |
+|--------|-------------|----------------|-------------|
+| **Random Baseline** | 16% | 43.3 (when found) | No learning baseline |
+| **Iterative Refinement** | 33% (1/3) | 13 (when found) | Gets stuck at 75% local optimum |
+| **OpenEvolve** | **100% (3/3)** | 52.3 | Always finds solution |
+
+**Key insight**: While OpenEvolve takes more iterations on average (52.3 vs 13), it has a **100% success rate** compared to iterative refinement's 33%. The evolutionary approach's population diversity ensures it eventually escapes local optima that trap single-trajectory methods.
 
 ## Why This Matters
 
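The "building block" effect behind that population advantage is easy to show in isolation. Below is a minimal sketch, not OpenEvolve's implementation: the correct modules are the ones named in the README, while the module slots and the wrong options (`tsv_reader`, `standardize`, `bubblesort`, `xml`) are invented for illustration.

```python
# Two parents, each only 50% correct, can yield a 100% correct child through
# uniform crossover: each parent contributes the modules it got right.
import random

TARGET = {"reader": "csv_reader", "preprocess": "normalize",
          "sort": "quicksort", "output": "json"}

parent_a = {"reader": "csv_reader", "preprocess": "standardize",
            "sort": "quicksort", "output": "xml"}      # 2/4 correct
parent_b = {"reader": "tsv_reader", "preprocess": "normalize",
            "sort": "bubblesort", "output": "json"}    # the other 2/4 correct

def crossover(a, b):
    """Uniform crossover: each module is inherited from a random parent."""
    return {module: random.choice((a[module], b[module])) for module in a}

def score(cfg):
    return sum(cfg[m] == TARGET[m] for m in TARGET) / len(TARGET)

# For these complementary parents a single crossover is fully correct with
# probability (1/2) ** 4, about 6%, so a population discovers it quickly.
best = max((crossover(parent_a, parent_b) for _ in range(100)), key=score)
print(best, f"score = {score(best):.0%}")
```

A single-trajectory method mutating one module at a time sees only the aggregate score, which is how the iterative agent ends up cycling at 75%; a population can hold both halves of the solution at once and let crossover merge them.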

@@ -224,6 +190,7 @@ Real-world examples:
 | `config.yaml` | OpenEvolve configuration |
 | `iterative_agent.py` | Iterative refinement agent using OpenRouter API |
 | `run_iterative_trials.py` | Run multiple trials of iterative agent |
+| `run_openevolve_trials.py` | Run multiple trials of OpenEvolve |
 | `run_random_baseline.py` | Random search baseline with pass@k analysis |
 | `compare_results.py` | Analysis and visualization |

examples/k_module_problem/config.yaml

Lines changed: 2 additions & 2 deletions
@@ -81,7 +81,7 @@ evaluator:
 use_llm_feedback: false
 enable_artifacts: true
 
-# Early stopping - stop when we find the solution
-early_stopping_patience: 30  # Reduced - expect faster convergence
+# Early stopping - disabled to allow full exploration
+early_stopping_patience: 100  # Allow full run
 convergence_threshold: 0.001
 early_stopping_metric: "combined_score"
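For context on the two keys above: patience-based early stopping normally halts a run once the best score has failed to improve by more than `convergence_threshold` for `early_stopping_patience` consecutive iterations, so a patience equal to the 100-iteration budget can never trigger. A minimal sketch of that logic, assuming the usual semantics rather than OpenEvolve's exact implementation:

```python
# Illustrative patience-based early stopping (assumed semantics, not
# OpenEvolve's actual code): stop once `patience` iterations pass without
# the best combined_score improving by more than `threshold`.
def should_stop(history, patience, threshold=0.001):
    """history: best-so-far combined_score after each completed iteration."""
    if len(history) <= patience:
        return False
    recent_best = max(history[-patience:])
    earlier_best = max(history[:-patience])
    return recent_best - earlier_best < threshold

scores = [0.25, 0.50, 0.75] + [0.75] * 97   # run plateaus at 75%
print(should_stop(scores, patience=30))      # True:  old patience of 30 stops early
print(should_stop(scores, patience=100))     # False: patience = budget never fires
```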
examples/k_module_problem/run_openevolve_trials.py

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+#!/usr/bin/env python3
+"""Run multiple trials of OpenEvolve to get statistics."""
+
+import json
+import os
+import re
+import shutil
+import subprocess
+from pathlib import Path
+
+# Run from the example directory
+os.chdir(Path(__file__).parent)
+
+
+def run_trial(trial_num: int, max_iterations: int = 100, seed: int = None):
+    """Run a single OpenEvolve trial."""
+    output_dir = f"openevolve_output_trial_{trial_num}"
+
+    # Clean output directory
+    if os.path.exists(output_dir):
+        shutil.rmtree(output_dir)
+
+    # Update config with new seed if provided
+    if seed is not None:
+        with open("config.yaml", "r") as f:
+            config_content = f.read()
+
+        # Replace the seed, then write a temporary per-trial config
+        config_content = re.sub(r'random_seed:\s*\d+', f'random_seed: {seed}', config_content)
+        temp_config = f"config_trial_{trial_num}.yaml"
+        with open(temp_config, "w") as f:
+            f.write(config_content)
+    else:
+        temp_config = "config.yaml"
+
+    # Run OpenEvolve
+    cmd = [
+        "openevolve-run",
+        "initial_program.py",
+        "evaluator.py",
+        "--config", temp_config,
+        "--iterations", str(max_iterations),
+        "--output", output_dir,
+    ]
+
+    print(f"\n{'='*60}")
+    print(f"TRIAL {trial_num + 1}: Running OpenEvolve with seed {seed}")
+    print('='*60)
+
+    # stdout/stderr are captured; results are parsed from the log files below
+    subprocess.run(cmd, capture_output=True, text=True)
+
+    # Clean up temp config
+    if seed is not None and os.path.exists(temp_config):
+        os.remove(temp_config)
+
+    # Parse results from the run log
+    solution_found_at = None
+    best_score = 0.0
+
+    log_dir = Path(output_dir) / "logs"
+    if log_dir.exists():
+        log_files = list(log_dir.glob("*.log"))
+        if log_files:
+            with open(log_files[0], "r") as f:
+                log_content = f.read()
+
+            # Find best score
+            score_matches = re.findall(r'combined_score[=:]\s*([\d.]+)', log_content)
+            if score_matches:
+                best_score = max(float(s) for s in score_matches)
+
+            # Look for first 100% solution - find the "New best" line with 1.0000
+            new_best_matches = re.findall(r'New best solution found at iteration (\d+):', log_content)
+            perfect_matches = re.findall(r'Iteration (\d+):.*?combined_score=1\.0000', log_content)
+
+            if perfect_matches:
+                solution_found_at = int(perfect_matches[0])
+            elif best_score >= 1.0 and new_best_matches:
+                # Fallback: take the last "new best" if we reached 100%
+                solution_found_at = int(new_best_matches[-1])
+
+    return {
+        "trial": trial_num,
+        "seed": seed,
+        "solution_found_at": solution_found_at,
+        "best_score": best_score,
+        "max_iterations": max_iterations,
+    }
+
+
+def run_trials(num_trials: int = 3, max_iterations: int = 100, base_seed: int = 100):
+    """Run multiple trials and collect statistics."""
+    results = []
+    solutions_found = []
+
+    for trial in range(num_trials):
+        seed = base_seed + trial * 111  # Different seed for each trial
+        result = run_trial(trial, max_iterations, seed)
+        results.append(result)
+
+        if result["solution_found_at"] is not None:
+            solutions_found.append(result["solution_found_at"])
+            print(f"Trial {trial + 1}: SUCCESS at iteration {result['solution_found_at']}")
+        else:
+            print(f"Trial {trial + 1}: FAILED (best score: {result['best_score']:.2%})")
+
+    # Calculate statistics
+    success_rate = len(solutions_found) / num_trials
+    avg_iterations = sum(solutions_found) / len(solutions_found) if solutions_found else float('inf')
+    min_iterations = min(solutions_found) if solutions_found else None
+    max_iterations_found = max(solutions_found) if solutions_found else None
+
+    print(f"\n{'='*60}")
+    print("OPENEVOLVE TRIAL RESULTS")
+    print('='*60)
+    print(f"Trials: {num_trials}")
+    print(f"Max iterations per trial: {max_iterations}")
+    print(f"Success rate: {success_rate:.0%} ({len(solutions_found)}/{num_trials})")
+    if solutions_found:
+        print(f"Avg iterations to solution: {avg_iterations:.1f}")
+        print(f"Min iterations: {min_iterations}")
+        print(f"Max iterations: {max_iterations_found}")
+    print('='*60)
+
+    # Save summary
+    summary = {
+        "config": {
+            "num_trials": num_trials,
+            "max_iterations": max_iterations,
+        },
+        "summary": {
+            "success_rate": success_rate,
+            "avg_iterations_to_solution": avg_iterations if solutions_found else None,
+            "min_iterations": min_iterations,
+            "max_iterations": max_iterations_found,
+            "solutions_found": len(solutions_found),
+        },
+        "trials": results,
+    }
+
+    with open("openevolve_trials_results.json", "w") as f:
+        json.dump(summary, f, indent=2)
+
+    print("\nResults saved to: openevolve_trials_results.json")
+
+    # Clean up trial output directories
+    for trial in range(num_trials):
+        output_dir = f"openevolve_output_trial_{trial}"
+        if os.path.exists(output_dir):
+            shutil.rmtree(output_dir)
+
+    return summary
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--trials", type=int, default=3, help="Number of trials")
+    parser.add_argument("--iterations", type=int, default=100, help="Max iterations per trial")
+    parser.add_argument("--seed", type=int, default=100, help="Base random seed")
+    args = parser.parse_args()
+
+    run_trials(num_trials=args.trials, max_iterations=args.iterations, base_seed=args.seed)
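Reproducing the three-trial statistics in the README amounts to running the script with its defaults: `python run_openevolve_trials.py --trials 3 --iterations 100 --seed 100`. The script changes into its own directory before running, expects `openevolve-run` to be on the PATH, and writes per-trial results plus summary statistics to `openevolve_trials_results.json`.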
