Predix/predix.py at master · TPTBusiness/Predix · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
#!/usr/bin/env python
"""
Predix CLI - Wrapper for rdagent with LLM model selection.

Usage:
    predix quant                    # Local llama.cpp (default)
    predix quant --model local      # Explicit local
    predix quant --model openrouter # OpenRouter cloud model
    predix quant -d                 # With web dashboard
"""
import os
import sys
from pathlib import Path

from dotenv import load_dotenv
load_dotenv(Path(__file__).parent / ".env")

import typer
from rich.console import Console

app = typer.Typer(help="Predix - AI Quantitative Trading Agent")
console = Console()


@app.command()
def quant(
    model: str = typer.Option(
        "local",
        "--model", "-m",
        help="LLM backend: 'local' (llama.cpp) or 'openrouter' (cloud)",
    ),
    dashboard: bool = typer.Option(
        False,
        "--dashboard/-d",
        help="Start web dashboard",
    ),
    cli_dashboard: bool = typer.Option(
        False,
        "--cli-dashboard/-c",
        help="Start CLI dashboard",
    ),
    log_file: str = typer.Option(
        None,  # None means auto-detect based on run_id
        "--log-file",
        help="Log file path (default: auto-detected). Use 'none' to disable.",
    ),
    step_n: int = typer.Option(None, help="Number of steps to run"),
    loop_n: int = typer.Option(None, help="Number of loops to run"),
    run_id: int = typer.Option(
        0,
        "--run-id",
        help="Parallel run ID (for isolated results). 0 = single run mode.",
    ),
):
    """
    Start EUR/USD quantitative trading loop with LLM-powered factor generation.

    Executes the RD-Agent quantitative trading loop that uses large language models
    to generate, test, and iterate on alpha factors for EUR/USD trading. Supports
    both local llama.cpp inference and cloud-based OpenRouter models. Results are
    automatically logged and stored in the results directory.

    Args:
        model: LLM backend to use. 'local' for llama.cpp (requires local server
            running on OPENAI_API_BASE), 'openrouter' for cloud API. (default: "local")
        dashboard: If True, starts the Flask-based web dashboard on port 5000
            for real-time monitoring of the trading loop. (default: False)
        cli_dashboard: If True, starts the Rich-based CLI dashboard with a 3-second
            refresh interval for terminal-based monitoring. (default: False)
        log_file: Path for the log file. If None, auto-detects based on run_id
            (e.g., 'fin_quant.log' or 'fin_quant_run1.log'). Use 'none' to disable.
        step_n: Number of individual steps to execute within the loop. None means
            use the default from configuration.
        loop_n: Number of complete loops to run. Each loop generates and evaluates
            new alpha factors. None means use the default from configuration.
        run_id: Parallel run identifier for isolated execution. When > 0, creates
            separate log files, results directories, and workspace directories.
            0 = single run mode (default: 0)

    Examples:
        $ predix quant                          # Local llama.cpp, single run
        $ predix quant -m openrouter            # OpenRouter cloud model
        $ predix quant -d                       # With web dashboard on :5000
        $ predix quant -m openrouter -d         # Cloud model + web dashboard
        $ predix quant --run-id 1               # Parallel run #1 (isolated)
        $ predix quant --run-id 2 --loop-n 50   # Parallel run #2, 50 loops
        $ predix quant --log-file custom.log    # Custom log file path

    Expected Output:
        - Generated alpha factors saved to results/factors/ as JSON files
        - Backtest results stored in results/db/backtest_results.db
        - Log file created in project root (e.g., fin_quant.log)
        - Optional: Web dashboard at http://localhost:5000

    Estimated Time:
        ~5-15 minutes per loop depending on model and data size.
        Local models are faster but may have lower quality than cloud models.

    See Also:
        predix evaluate - Evaluate existing factors with full 1min data
        predix top - Show top-performing factors by IC or Sharpe
        predix health - Check system health and configuration
    """
    import subprocess
    import threading
    import time
    import sys

    # ---- Parallel Run Isolation ----
    # When run_id > 0, isolate all outputs (logs, results, workspace)
    if run_id > 0:
        os.environ["PARALLEL_RUN_ID"] = str(run_id)
        console.print(f"\n[bold yellow]🔀 Parallel Run Mode:[/bold yellow] [cyan]ID={run_id}[/cyan]")

        # Auto-detect log file for parallel run
        if log_file is None:
            log_file = f"fin_quant_run{run_id}.log"

        # Isolate results directories
        results_base = Path(__file__).parent / "results" / "runs" / f"run{run_id}"
        results_base.mkdir(parents=True, exist_ok=True)

        # Isolate workspace directory
        workspace_dir = Path(__file__).parent / f"RD-Agent_workspace_run{run_id}"
        os.environ["RD_AGENT_WORKSPACE"] = str(workspace_dir)

        console.print(f"   [dim]Log: {log_file}[/dim]")
        console.print(f"   [dim]Results: results/runs/run{run_id}/[/dim]")
        console.print(f"   [dim]Workspace: {workspace_dir.name}/[/dim]")
    else:
        # Single run mode: default log file
        if log_file is None:
            log_file = "fin_quant.log"

    # ---- Log File Setup ----
    if log_file.lower() != "none":
        log_path = Path(__file__).parent / log_file
        log_path.parent.mkdir(parents=True, exist_ok=True)

        # Open log file for appending
        log_f = open(log_path, "a", encoding="utf-8")

        # Redirect stdout and stderr to both console and log file
        class TeeWriter:
            def __init__(self, *streams):
                self._streams = streams

            def write(self, data):
                for s in self._streams:
                    try:
                        s.write(data)
                        s.flush()
                    except:
                        pass

            def flush(self):
                for s in self._streams:
                    try:
                        s.flush()
                    except:
                        pass

        sys.stdout = TeeWriter(sys.__stdout__, log_f)
        sys.stderr = TeeWriter(sys.__stderr__, log_f)

        console.print(f"\n[dim]📝 Logging to: {log_path}[/dim]")
    else:
        console.print("\n[dim]⚠️  Logging disabled (console only)[/dim]")

    # ---- LLM Model Selection ----
    if model == "openrouter":
        api_key = os.getenv("OPENROUTER_API_KEY", "")
        api_key_2 = os.getenv("OPENROUTER_API_KEY_2", "")
        if not api_key:
            console.print("\n[bold red]❌ OPENROUTER_API_KEY not set in .env[/bold red]")
            console.print("[yellow]Add your API key to .env:[/yellow]")
            console.print('  OPENROUTER_API_KEY=sk-or-your-key-here')
            raise typer.Exit(code=1)

        # Setup both API keys for load balancing
        os.environ["OPENAI_API_BASE"] = "https://openrouter.ai/api/v1"
        os.environ["CHAT_MODEL"] = os.getenv("OPENROUTER_MODEL", "openrouter/google/gemma-4-26b-a4b-it:free")

        # If second key exists, configure LiteLLM for load balancing
        if api_key_2:
            os.environ["OPENAI_API_KEY"] = f"{api_key},{api_key_2}"
            os.environ["LITELLM_PARALLEL_CALLS"] = "2"
            console.print(f"\n[bold blue]🌐 Using OpenRouter (2 API Keys):[/bold blue] [cyan]{os.environ['CHAT_MODEL']}[/cyan]")
            console.print(f"   [dim]Keys: {api_key[:15]}*** + {api_key_2[:15]}***[/dim]")
            console.print(f"   [dim]Parallel: 2 concurrent requests[/dim]")
        else:
            os.environ["OPENAI_API_KEY"] = api_key
            console.print(f"\n[bold blue]🌐 Using OpenRouter:[/bold blue] [cyan]{os.environ['CHAT_MODEL']}[/cyan]")
            console.print(f"   [dim]Key: {api_key[:15]}***[/dim]")
    elif model == "local":
        os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "local")
        os.environ["OPENAI_API_BASE"] = os.getenv("OPENAI_API_BASE", "http://localhost:8081/v1")
        os.environ["CHAT_MODEL"] = os.getenv("CHAT_MODEL", "openai/qwen3.5-35b")

        console.print(f"\n[bold green]🏠 Using local LLM:[/bold green] [cyan]{os.environ['CHAT_MODEL']}[/cyan]")
        console.print(f"   [dim]Base: {os.environ['OPENAI_API_BASE']}[/dim]")
    else:
        console.print(f"\n[yellow]⚠️  Unknown model: '{model}'. Using .env settings.[/yellow]")

    # ---- Dashboards ----
    if dashboard:
        def start_web_dashboard():
            console.print(f"\n[bold green]🚀 Web Dashboard: http://localhost:5000[/bold green]")
            subprocess.run(
                ["python", "web/dashboard_api.py"],
                cwd=str(Path(__file__).parent),
                env={**os.environ, "FLASK_ENV": "development"},
            )

        threading.Thread(target=start_web_dashboard, daemon=True).start()
        time.sleep(2)

    if cli_dashboard:
        def start_cli_dash():
            from rdagent.log.ui.predix_dashboard import run_dashboard
            run_dashboard(log_path="fin_quant.log", refresh_interval=3)

        threading.Thread(target=start_cli_dash, daemon=True).start()
        time.sleep(1)

    # ---- Start fin_quant ----
    from rdagent.app.qlib_rd_loop.quant import main as fin_quant

    console.print(f"\n[bold cyan]📊 Starting EURUSD Trading Loop...[/bold cyan]\n")

    fin_quant(
        step_n=step_n,
        loop_n=loop_n,
    )


@app.command()
def evaluate(
    top: int = typer.Option(
        100,
        "--top", "-n",
        help="Number of factors to evaluate (default: 100)",
    ),
    all_factors: bool = typer.Option(
        False,
        "--all", "-a",
        help="Evaluate all undiscovered factors",
    ),
    parallel: int = typer.Option(
        4,
        "--parallel", "-p",
        help="Number of parallel workers (default: 4)",
    ),
    force: bool = typer.Option(
        False,
        "--force", "-f",
        help="Force re-evaluation of ALL factors (even already evaluated)",
    ),
):
    """
    Evaluate existing alpha factors with full 1-minute intraday data (2020-2026).

    Computes comprehensive performance metrics including Information Coefficient (IC),
    Sharpe Ratio, Maximum Drawdown, and Win Rate for each factor. Factors are loaded
    from JSON files in results/factors/ and executed against historical data to produce
    out-of-sample performance estimates. Already evaluated factors are automatically
    skipped unless --force is specified.

    Args:
        top: Number of unevaluated factors to process. Only applies when --all is
            not set. Higher values increase total runtime linearly. (default: 100)
        all_factors: If True, evaluates ALL unevaluated factors in the factors
            directory, ignoring the --top parameter. Use with caution as this
            may take hours for large factor sets. (default: False)
        parallel: Number of parallel worker processes for factor evaluation.
            Higher values speed up evaluation but increase memory usage.
            Recommended: 4-8 for most systems. (default: 4)
        force: If True, re-evaluates ALL factors including those that already
            have valid results. Useful when underlying data has changed or
            when recalculating with updated methodology. (default: False)

    Examples:
        $ predix evaluate                   # Evaluate 100 NEW factors
        $ predix evaluate --top 500         # Evaluate 500 NEW factors
        $ predix evaluate --all             # Evaluate all remaining factors
        $ predix evaluate --force --top 50  # Re-evaluate 50 factors
        $ predix evaluate -p 8              # Use 8 parallel workers

    Expected Output:
        - Updated JSON files in results/factors/ with IC, Sharpe, Max DD, Win Rate
        - Summary statistics printed to console
        - Factors with errors are logged and skipped gracefully

    Estimated Time:
        ~2-10 minutes per factor depending on complexity and data size.
        With --parallel 4, expect ~30-60 seconds per factor wall-clock time.

    See Also:
        predix top - Show top-performing factors by IC or Sharpe
        predix portfolio - Select a diversified portfolio of uncorrelated factors
        predix quant - Generate new factors via LLM trading loop
    """
    from rich.panel import Panel

    console.print(Panel(
        "[bold cyan]📊 Predix Factor Evaluator[/bold cyan]\n"
        "Evaluating factors with FULL 1min data (2020-2026)\n"
        "Skips already evaluated factors automatically",
        border_style="cyan",
    ))

    # Import and run the evaluator
    from predix_full_eval import main as eval_main

    try:
        eval_main(
            top=top,
            all_factors=all_factors,
            parallel=parallel,
            force=force,
        )
    except KeyboardInterrupt:
        console.print("\n[yellow]Evaluation interrupted by user[/yellow]")
    except Exception as e:
        console.print(f"\n[bold red]Evaluation failed: {e}[/bold red]")
        import traceback
        console.print(traceback.format_exc())


@app.command()
def top(
    n: int = typer.Option(
        20,
        "--num", "-n",
        help="Number of top factors to show (default: 20)",
    ),
    metric: str = typer.Option(
        "ic",
        "--metric", "-m",
        help="Sort by metric: 'ic' or 'sharpe'",
    ),
):
    """
    Display top-performing alpha factors ranked by IC or Sharpe ratio.

    Loads all evaluated factor results from results/factors/ and presents them
    in a formatted table sorted by the chosen metric. Only factors with valid
    IC values (status='success') are included. This is useful for quickly
    identifying the most promising factors before building portfolios or strategies.

    Args:
        n: Number of top factors to display. Shows fewer if fewer exist in
            the results directory. (default: 20)
        metric: Sorting metric for ranking factors. 'ic' sorts by absolute
            Information Coefficient, 'sharpe' sorts by absolute Sharpe Ratio.
            IC measures predictive power, Sharpe measures risk-adjusted returns.
            (default: "ic")

    Examples:
        $ predix top                      # Top 20 factors by absolute IC
        $ predix top -n 50                # Top 50 factors by absolute IC
        $ predix top -m sharpe            # Top 20 factors by absolute Sharpe
        $ predix top -n 100 -m sharpe     # Top 100 factors by Sharpe

    Expected Output:
        - Formatted table showing Factor name, IC, Sharpe, Annualized Return,
          Max Drawdown, and Win Rate for each factor
        - Summary panel with average and best IC/Sharpe across all factors

    Estimated Time:
        Nearly instantaneous (< 1 second) for typical factor counts.
        May take a few seconds with thousands of factor files.

    See Also:
        predix evaluate - Evaluate factors to generate performance metrics
        predix portfolio - Select diversified portfolio from top factors
        predix build-strategies - Combine factors into trading strategies
    """
    import json
    import glob as glob_module
    import numpy as np
    from rich.table import Table
    from rich.panel import Panel

    factors_dir = Path(__file__).parent / "results" / "factors"
    if not factors_dir.exists():
        console.print("[red]No results found in results/factors/[/red]")
        return

    # Load all factor JSON files
    results = []
    for f in glob_module.glob(str(factors_dir / "*.json")):
        try:
            with open(f) as fh:
                data = json.load(fh)
            # Only include factors with valid IC
            if data.get("status") == "success" and data.get("ic") is not None:
                results.append(data)
        except Exception:
            continue

    if not results:
        console.print("[yellow]No evaluated factors found with valid IC[/yellow]")
        return

    # Sort by metric
    if metric == "sharpe":
        results.sort(key=lambda x: abs(x.get("sharpe", 0) or 0), reverse=True)
        sort_label = "Sharpe"
    else:
        results.sort(key=lambda x: abs(x.get("ic", 0) or 0), reverse=True)
        sort_label = "IC"

    # Display as table
    table = Table(
        title=f"Top {min(n, len(results))} Factors by {sort_label}",
        show_header=True,
        header_style="bold cyan",
    )
    table.add_column("#", justify="center", width=4)
    table.add_column("Factor", width=40)
    table.add_column("IC", justify="right", width=10)
    table.add_column("Sharpe", justify="right", width=10)
    table.add_column("Ann. Return %", justify="right", width=12)
    table.add_column("Max DD", justify="right", width=10)
    table.add_column("Win Rate", justify="right", width=10)

    for i, r in enumerate(results[:n], 1):
        ic = r.get("ic")
        sharpe = r.get("sharpe")
        ann_ret = r.get("annualized_return")
        max_dd = r.get("max_drawdown")
        win_rate = r.get("win_rate")

        table.add_row(
            str(i),
            r["factor_name"][:38],
            f"{ic:.6f}" if ic is not None else "N/A",
            f"{sharpe:.4f}" if sharpe is not None else "N/A",
            f"{ann_ret:.4f}" if ann_ret is not None else "N/A",
            f"{max_dd:.4f}" if max_dd is not None else "N/A",
            f"{win_rate:.2%}" if win_rate is not None else "N/A",
        )

    console.print(table)

    # Summary
    valid_ic = [r.get("ic") for r in results if r.get("ic") is not None]
    valid_sharpe = [r.get("sharpe") for r in results if r.get("sharpe") is not None]
    # Filter extreme outliers for average
    valid_sharpe_filtered = [s for s in valid_sharpe if abs(s or 0) < 1e6]

    console.print(Panel(
        f"[bold]Summary[/bold]\n"
        f"Total evaluated: {len(results)}\n"
        f"Avg IC: {np.mean(valid_ic):.6f} (n={len(valid_ic)})\n"
        f"Best IC: {max(valid_ic, key=abs, default=0):.6f}\n"
        f"Avg Sharpe: {np.mean(valid_sharpe_filtered):.4f} (n={len(valid_sharpe_filtered)})\n"
        f"Best Sharpe: {max(valid_sharpe, key=abs, default=0):.4f}",
        border_style="green",
    ))


@app.command()
def portfolio(
    top: int = typer.Option(
        50,
        "--top", "-n",
        help="Number of candidate factors to consider (default: 50)",
    ),
    target: int = typer.Option(
        10,
        "--target", "-t",
        help="Number of factors to select (default: 10)",
    ),
    max_corr: float = typer.Option(
        0.3,
        "--max-corr", "-c",
        help="Maximum allowed correlation between factors (default: 0.3)",
    ),
):
    """
    Select a diversified portfolio of uncorrelated alpha factors.

    Analyzes the top factors by IC and selects a subset that minimizes redundancy
    by calculating the correlation matrix of factor values. Uses a greedy selection
    algorithm that prioritizes high-IC factors while ensuring pairwise correlations
    stay below the specified threshold. This reduces overfitting risk and creates
    more robust composite signals.

    Args:
        top: Number of candidate factors to consider for portfolio construction.
            Factors are pre-selected by absolute IC before correlation analysis.
            Higher values provide more diversity but increase computation time.
            (default: 50)
        target: Number of factors to include in the final portfolio. The algorithm
            will attempt to select this many uncorrelated factors from the candidate
            pool. May return fewer if insufficient uncorrelated factors exist.
            (default: 10)
        max_corr: Maximum allowed absolute correlation between any two selected
            factors. Lower values produce more diverse portfolios but may exclude
            high-IC factors. Typical range: 0.2-0.5. (default: 0.3)

    Examples:
        $ predix portfolio                   # Select top 10 from top 50 candidates
        $ predix portfolio -n 100 -t 20      # Select top 20 from top 100
        $ predix portfolio -c 0.5            # Allow higher correlation (0.5)
        $ predix portfolio -n 200 -t 15 -c 0.2  # Strict diversification

    Expected Output:
        - Formatted table showing selected factors with IC, Sharpe, and max correlation
        - Portfolio saved to results/portfolio/selected_factors.json
        - Summary of skipped factors and errors (if any)

    Estimated Time:
        ~2-10 minutes depending on candidate count.
        Each factor must be re-evaluated to compute time-series values for correlation.

    See Also:
        predix portfolio-simple - Faster category-based diversification
        predix top - View top factors before portfolio selection
        predix build-strategies - Build strategies from selected factors
    """
    import json
    import glob as glob_module
    import subprocess
    import tempfile
    import shutil
    import numpy as np
    import pandas as pd
    from rich.table import Table
    from rich.panel import Panel
    from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn

    factors_dir = Path(__file__).parent / "results" / "factors"
    if not factors_dir.exists():
        console.print("[red]No results found in results/factors/[/red]")
        return

    # 1. Load top factors by IC
    results = []
    for f in glob_module.glob(str(factors_dir / "*.json")):
        try:
            with open(f) as fh:
                data = json.load(fh)
            if data.get("status") == "success" and data.get("ic") is not None:
                results.append(data)
        except Exception:
            continue

    if not results:
        console.print("[red]No evaluated factors found with valid IC[/red]")
        return

    # Sort and select candidates
    results.sort(key=lambda x: abs(x.get("ic", 0) or 0), reverse=True)
    candidates = results[:top]

    console.print(f"Loaded {len(results)} factors. Selecting top {top} candidates...")

    # 2. Evaluate candidates to get time-series values for correlation
    # We need to run the factor code to get the series of values.
    # We do this sequentially to avoid OOM.

    # Locate data file
    data_file = Path(__file__).parent / "git_ignore_folder" / "factor_implementation_source_data" / "intraday_pv.h5"
    if not data_file.exists():
        data_file = Path(__file__).parent / "git_ignore_folder" / "factor_implementation_source_data_debug" / "intraday_pv.h5"

    if not data_file.exists():
        console.print("[red]Source data file (intraday_pv.h5) not found.[/red]")
        return

    factor_series = {} # name -> pd.Series
    errors = []

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        BarColumn(),
        TaskProgressColumn(),
        TimeElapsedColumn(),
        console=console,
    ) as progress:
        task = progress.add_task(f"Computing values for {len(candidates)} factors...", total=len(candidates))

        for cand in candidates:
            fname = cand.get("factor_name", "unknown")
            fcode = cand.get("factor_code", "")

            if not fcode:
                errors.append((fname, "No code in JSON"))
                progress.advance(task)
                continue

            # Create temp workspace
            with tempfile.TemporaryDirectory() as tmpdir:
                tmp_path = Path(tmpdir)
                # Symlink data
                try:
                    os.symlink(str(data_file), str(tmp_path / "intraday_pv.h5"))
                except OSError:
                    # If symlink fails, copy the file
                    import shutil
                    shutil.copy(str(data_file), str(tmp_path / "intraday_pv.h5"))

                # Write code
                (tmp_path / "factor.py").write_text(fcode)

                try:
                    # Run factor
                    result = subprocess.run(
                        [sys.executable, "factor.py"],
                        cwd=tmp_path,
                        capture_output=True,
                        text=True,
                        timeout=120 # 2 min timeout per factor
                    )

                    # Read result
                    res_file = tmp_path / "result.h5"
                    if res_file.exists():
                        df = pd.read_hdf(str(res_file), key="data")
                        # Get the series (first column)
                        series = df.iloc[:, 0]

                        # Count non-NaN values
                        non_nan = series.count()
                        if non_nan < 1000:
                            errors.append((fname, f"Only {non_nan} valid values"))
                            progress.update(task, description=f"{fname}: {non_nan} values ⚠️")
                        else:
                            factor_series[fname] = series
                            progress.update(task, description=f"Computed {fname} ✅ ({non_nan} values)")
                    else:
                        # Check stderr for errors
                        stderr = result.stderr[:200] if result.stderr else "Unknown"
                        errors.append((fname, f"No result.h5. Error: {stderr}"))
                        progress.update(task, description=f"{fname} ❌ (No result)")
                except subprocess.TimeoutExpired:
                    errors.append((fname, "Timeout (2 min)"))
                    progress.update(task, description=f"{fname} ⏱️ (Timeout)")
                except Exception as e:
                    errors.append((fname, str(e)[:100]))
                    progress.update(task, description=f"{fname} ❌ (Error)")

            progress.advance(task)

    # Show summary of errors
    if errors:
        console.print(f"\n[yellow]Skipped {len(errors)} factors:[/yellow]")
        for fname, reason in errors[:5]:
            console.print(f"  • {fname}: {reason}")
        if len(errors) > 5:
            console.print(f"  ... and {len(errors)-5} more")

    if len(factor_series) < 3:
        console.print("[red]Not enough valid factor series to build portfolio (need at least 3).[/red]")
        console.print("[yellow]Tip: Factors might be producing mostly NaN values or failing execution.[/yellow]")

        # Fallback: Show top factors by IC without diversification
        console.print("\n[dim]Showing top factors by IC instead:[/dim]")
        table = Table(
            title=f"Top {min(20, len(candidates))} Factors by IC (No Diversification)",
            show_header=True,
            header_style="bold cyan",
        )
        table.add_column("#", justify="center", width=4)
        table.add_column("Factor", width=40)
        table.add_column("IC", justify="right", width=10)
        table.add_column("Sharpe", justify="right", width=10)

        for i, cand in enumerate(candidates[:20], 1):
            table.add_row(
                str(i),
                cand.get("factor_name", "unknown")[:38],
                f"{cand.get('ic', 0):.6f}",
                f"{cand.get('sharpe', 0):.4f}" if cand.get('sharpe') else "N/A",
            )

        console.print(table)
        return

    # 3. Build Correlation Matrix
    console.print(f"\n[dim]Building correlation matrix from {len(factor_series)} factors...[/dim]")

    # Align indices and drop NaN
    combined = pd.DataFrame(factor_series).dropna()

    if combined.empty or len(combined) < 100:
        console.print("[red]Not enough valid overlapping data to compute correlation.[/red]")
        console.print("[dim]This means the factors produce values at different times or have too many NaN values.[/dim]")
        return

    corr_matrix = combined.corr().fillna(0)
    ic_map = {cand['factor_name']: cand.get('ic', 0) for cand in candidates}

    # 4. Greedy Selection
    selected = []
    remaining = list(corr_matrix.columns)

    # Sort remaining by IC to prioritize high IC factors
    remaining.sort(key=lambda x: abs(ic_map.get(x, 0)), reverse=True)

    for factor in remaining:
        if len(selected) >= target:
            break

        # If it's the first one, just take it
        if not selected:
            selected.append(factor)
            continue

        # Check correlation with already selected
        # We want max(|corr|) < max_corr
        max_c = 0
        for sel in selected:
            c = abs(corr_matrix.loc[factor, sel])
            if c > max_c:
                max_c = c

        if max_c < max_corr:
            selected.append(factor)

    # 5. Display Results
    table = Table(
        title=f"Selected Diversified Portfolio (Top {len(selected)})",
        show_header=True,
        header_style="bold cyan",
    )
    table.add_column("#", justify="center", width=4)
    table.add_column("Factor", width=40)
    table.add_column("IC", justify="right", width=10)
    table.add_column("Sharpe", justify="right", width=10)
    table.add_column("Max Corr", justify="right", width=10)

    for i, fname in enumerate(selected, 1):
        # Find original data for display
        data = next((c for c in candidates if c['factor_name'] == fname), {})
        ic = data.get('ic')
        sharpe = data.get('sharpe')

        # Calculate max corr with other selected factors
        max_c_val = 0
        for s in selected:
            if s != fname:
                val = abs(corr_matrix.loc[fname, s])
                if val > max_c_val: max_c_val = val

        table.add_row(
            str(i),
            fname[:38],
            f"{ic:.6f}" if ic is not None else "N/A",
            f"{sharpe:.4f}" if sharpe is not None else "N/A",
            f"{max_c_val:.4f}" if max_c_val > 0 else "-"
        )

    console.print(table)

    # 6. Save Result
    portfolio_data = {
        "selected_factors": selected,
        "max_correlation": max_corr,
        "pool_size": top,
        "timestamp": pd.Timestamp.now().isoformat()
    }

    out_dir = Path(__file__).parent / "results" / "portfolio"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "selected_factors.json"

    with open(out_file, "w") as f:
        json.dump(portfolio_data, f, indent=2)

    console.print(Panel(
        f"[bold]Portfolio saved to results/portfolio/selected_factors.json[/bold]\n"
        f"Selected {len(selected)} unique factors from {top} candidates.",
        border_style="green"
    ))


@app.command()
def portfolio_simple(
    top: int = typer.Option(
        100,
        "--top", "-n",
        help="Number of candidate factors to consider (default: 100)",
    ),
):
    """
    Select a diversified portfolio using keyword-based category grouping (fast method).

    Instead of computing expensive correlation matrices, this method groups factors
    by their names into categories (momentum, volatility, mean_reversion, session,
    volume, pattern) and selects the highest-IC factor from each category. This
    provides a quick approximation of diversification without re-evaluating factors.
    Falls back to 'other' category for factors that don't match any keywords.

    Args:
        top: Number of candidate factors to consider before categorization.
            Factors are pre-selected by absolute IC. Higher values increase
            the chance of finding factors in all categories. (default: 100)

    Examples:
        $ predix portfolio-simple              # Top factors from different categories
        $ predix portfolio-simple -n 200       # Consider top 200 factors
        $ predix portfolio-simple -n 50        # Quick selection from top 50

    Expected Output:
        - Formatted table showing selected factors with their category, IC, and Sharpe
        - Portfolio saved to results/portfolio/portfolio_simple.json
        - Categories include: Momentum, Volatility, Mean Reversion, Session,
          Volume, Pattern, and Other

    Estimated Time:
        Nearly instantaneous (< 1 second). No factor re-evaluation required.
        Only loads existing JSON results and performs keyword matching.

    See Also:
        predix portfolio - Correlation-based diversification (more accurate but slower)
        predix top - View top factors before portfolio selection
        predix build-strategies - Build strategies from selected factors
    """
    import json
    import glob as glob_module
    import re
    import numpy as np
    import pandas as pd
    from rich.table import Table
    from rich.panel import Panel

    factors_dir = Path(__file__).parent / "results" / "factors"
    if not factors_dir.exists():
        console.print("[red]No results found in results/factors/[/red]")
        return

    # 1. Load top factors by IC
    results = []
    for f in glob_module.glob(str(factors_dir / "*.json")):
        try:
            with open(f) as fh:
                data = json.load(fh)
            if data.get("status") == "success" and data.get("ic") is not None:
                results.append(data)
        except Exception:
            continue

    if not results:
        console.print("[red]No evaluated factors found with valid IC[/red]")
        return

    # Sort by absolute IC
    results.sort(key=lambda x: abs(x.get("ic", 0) or 0), reverse=True)
    candidates = results[:top]

    # 2. Define categories based on keywords in factor names
    categories = {
        "momentum": ["mom", "return", "ret", "trend", "directional", "drift", "slope", "roc"],
        "volatility": ["vol", "std", "range", "dev", "risk", "variance"],
        "mean_reversion": ["ridge", "mean", "reversion", "revert", "resid", "resi", "norm"],
        "session": ["session", "london", "ny", "overlap", "asian", "intraday"],
        "volume": ["vol_", "volume", "flow", "pressure", "toxicity", "imbalance"],
        "pattern": ["pattern", "shape", "structure", "fractal"],
    }

    # 3. Assign each factor to a category
    categorized = {cat: [] for cat in categories}
    categorized["other"] = []

    for cand in candidates:
        fname = cand.get("factor_name", "").lower()
        assigned = False

        # Check each category's keywords
        for cat, keywords in categories.items():
            if any(kw in fname for kw in keywords):
                categorized[cat].append(cand)
                assigned = True
                break

        if not assigned:
            categorized["other"].append(cand)

    # 4. Select best factor from each category
    selected = []
    for cat in list(categories.keys()) + ["other"]:
        if categorized[cat]:
            best = categorized[cat][0]  # Already sorted by IC
            selected.append({
                "factor": best,
                "category": cat.capitalize() if cat != "other" else "Other"
            })

    # 5. Display Results
    table = Table(
        title=f"Simple Diversified Portfolio (Selected {len(selected)} factors)",
        show_header=True,
        header_style="bold cyan",
    )
    table.add_column("#", justify="center", width=4)
    table.add_column("Factor", width=40)
    table.add_column("Category", width=15)
    table.add_column("IC", justify="right", width=10)
    table.add_column("Sharpe", justify="right", width=10)

    for i, item in enumerate(selected, 1):
        cand = item["factor"]
        cat = item["category"]
        table.add_row(
            str(i),
            cand.get("factor_name", "unknown")[:38],
            cat,
            f"{cand.get('ic', 0):.6f}",
            f"{cand.get('sharpe', 0):.4f}" if cand.get('sharpe') else "N/A",
        )

    console.print(table)

    # 6. Save Result
    portfolio_data = {
        "selected_factors": [item["factor"]["factor_name"] for item in selected],
        "categories": {item["category"]: item["factor"]["factor_name"] for item in selected},
        "method": "simple_keyword_categorization",
        "timestamp": str(pd.Timestamp.now().isoformat())
    }

    out_dir = Path(__file__).parent / "results" / "portfolio"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "portfolio_simple.json"

    with open(out_file, "w") as f:
        json.dump(portfolio_data, f, indent=2)

    console.print(Panel(
        f"[bold]Simple Portfolio saved to results/portfolio/portfolio_simple.json[/bold]\n"
        f"Selected {len(selected)} factors across {len([c for c in categorized if categorized[c]])} categories.",
        border_style="green"
    ))


@app.command()
def build_strategies(
    top: int = typer.Option(
        50,
        "--top", "-n",
        help="Number of top factors to consider (default: 50)",
    ),
    max_combo: int = typer.Option(
        2,
        "--max-combo", "-c",
        help="Maximum combination size: 2=pairs, 3=triplets (default: 2)",
    ),
    diversified: bool = typer.Option(
        False,
        "--diversified/-d",
        help="Only generate cross-category combinations",
    ),
):
    """
    Build trading strategies by systematically combining alpha factors.

    This command loads top evaluated factors, generates systematic combinations
    (pairs, triplets, etc.), and evaluates each combination using walk-forward
    validation. Results are ranked by Sharpe ratio and the best strategies are
    saved for later use. This is ideal for discovering synergies between factors
    that individually may have modest performance but work well together.

    Args:
        top: Number of top factors (by IC) to use as building blocks for
            strategy combinations. Higher values increase the number of
            combinations exponentially. (default: 50)
        max_combo: Maximum number of factors per combination. 2 creates only
            pairs, 3 creates pairs and triplets, etc. Higher values dramatically
            increase the combination count (n choose k). (default: 2)
        diversified: If True, only generates cross-category combinations,
            ensuring factors come from different groups (momentum, volatility,
            etc.). This reduces redundancy but may miss strong single-category
            strategies. (default: False)

    Examples:
        $ predix build-strategies                   # Build from top 50, pairs only
        $ predix build-strategies -n 100 -c 3       # Top 100, up to triplets
        $ predix build-strategies -d                # Diversified (cross-category) only
        $ predix build-strategies -n 30 -c 2 -d     # Top 30, diversified pairs

    Expected Output:
        - Formatted table of top strategies ranked by Sharpe ratio
        - Strategy files saved to results/strategies/
        - Summary with total combinations, success rate, avg/best Sharpe

    Estimated Time:
        ~1-5 minutes for pairs, ~10-30 minutes for triplets.
        Scales with O(n^k) where n=factors, k=max_combo_size.

    See Also:
        predix build-strategies-ai - AI-powered strategy generation via LLM
        predix portfolio - Select diversified factors before combining
        predix top - View top factors before building strategies
    """
    import pandas as pd