A production-ready paper trading system using PPO (Proximal Policy Optimization) with automated fine-tuning every 2 hours and integrated SHAP + LIME explainability.
Core Algorithm: PPO-based Deep Reinforcement Learning agent trained on DOW 30 stocks
Explainability: SHAP and LIME provide feature importance and trade explanations (optional)
Auto Fine-Tuning: Model retrains every 2 hours using recent market data with validation-based rollback
Trading: Real-time 1-minute interval trading via Alpaca API with turbulence-based risk management
Portfolio Performance:
- Initial Capital: $1,000,000
- Final Value: $1,223,734
- Total Return: 22.5%
- Sharpe Ratio: 2.318
- Peak Portfolio Value: $1.26M
- Maximum Drawdown: -6.05%
Risk Metrics:
- Volatility: 8.20%
- Best Day: +2.48%
- Worst Day: -2.43%
Fine-Tuning Performance:
- Total Cycles: 271
- Accepted Models: 141 (52.0%)
- Rejected Models: 130 (48.0%)
- Average Improvement: 1.53% per accepted fine-tune
9-Panel Performance Dashboard:
- Portfolio Value: Grew from $1M to $1.26M peak, ended at $1.22M
- Cumulative Returns: Peaked at 27%, closed at 22.5%
- Drawdown: Controlled risk with max -6% drawdown
- Daily Returns: Normal distribution centered at 0
- Cash vs Holdings: Maintained ~$1.2M in positions
- Fine-tuning Decisions: 52% acceptance rate shows validation working
- Score Improvement: Linear correlation between original and fine-tuned
- Acceptance Rate: Rolling average around 52% over time
PPO vs Other RL Algorithms - no fine-tuning (3-month backtest):
- PPO (this system): $1.150M (+15.1%) - Best performer
- TD3: $1.092M (+9.3%)
- DJI Index: $1.086M (+8.6%)
- MVO: $1.076M (+7.6%)
Portfolio Value with Fine-tuning Decisions:
- Green dots: Model accepted (141 times)
- Red dots: Model rejected (130 times)
- Shows validation system preventing bad model updates
- Accepted models correlate with portfolio growth periods
- Rejected models protect against degradation
PPO Agent
- State space: 301 features (cash + prices + holdings + 8 technical indicators × 30 stocks)
- Action space: 30 continuous actions (one per stock, range [-1, +1])
- Pre-trained model, continuously updated by the 2-hour fine-tuning loop
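Under the stated [-1, +1] action space and 100-share position cap, the mapping from a policy action to an order size might look like this minimal sketch (the scaling rule and the function name are assumptions, not the repo's exact code):

```python
import numpy as np

MAX_STOCK = 100  # max shares per position (see config)

def actions_to_orders(actions: np.ndarray) -> np.ndarray:
    """Map continuous actions in [-1, +1] to signed share counts.

    Positive values buy, negative values sell; magnitude sets the size.
    """
    actions = np.clip(actions, -1.0, 1.0)
    return (actions * MAX_STOCK).astype(int)

# 0.5 -> buy 50 shares; -1.2 is clipped to -1.0 -> sell 100; 0.0 -> hold
print(actions_to_orders(np.array([0.5, -1.2, 0.0])))
```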
Auto Fine-Tuning (Every 2 Hours)
- Loads last 48 hours of trading data
- Retrains PPO model with recent market conditions
- Validates on held-out data (20% split)
- Rolls back if performance degrades below 95% threshold
- Real Performance: 52% acceptance rate, 1.53% avg improvement per accepted model
SHAP + LIME Explainability (Optional)
- SHAP: Global feature importance (which indicators matter most)
- LIME: Local explanations (why this specific trade was made)
- Toggle on/off with the --no-explain flag
Risk Management
- Turbulence detection: Liquidates positions when turbulence > 500
- Position limits: Max 100 shares per stock
- Transaction costs: 0.1% per trade
- Auto square-off: Closes all positions 15 min before market close
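The turbulence cutoff and flat transaction-cost model described above reduce to a few lines; this is a sketch with illustrative function names, not the repo's actual API:

```python
TURBULENCE_THRESHOLD = 500    # risk cutoff (see config)
TRANSACTION_COST_PCT = 0.001  # 0.1% per trade

def risk_adjusted_orders(turbulence: float, holdings: list, model_orders: list) -> list:
    """Override the model and liquidate every position when turbulence spikes."""
    if turbulence > TURBULENCE_THRESHOLD:
        return [-h for h in holdings]  # sell everything
    return model_orders

def trade_cost(price: float, shares: int) -> float:
    """Transaction cost charged as 0.1% of traded notional."""
    return abs(shares) * price * TRANSACTION_COST_PCT

print(risk_adjusted_orders(620.0, [10, 5], [1, -2]))  # [-10, -5]
print(trade_cost(100.0, 10))                          # ~1.0 (0.1% of $1,000 notional)
```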
Data Pipeline
- Real-time 1-minute OHLCV data from Alpaca
- 8 technical indicators per stock (MACD, Bollinger bands, RSI, CCI, DX, 30/60-period SMAs)
- Continuous CSV logging for model retraining
# Clone repository
git clone https://github.com/ayushraj09/TradingAgent.git
cd TradingAgent
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

Create a .env file with your Alpaca credentials:
ALPACA_API_KEY=your_key
ALPACA_API_SECRET=your_secret
ALPACA_API_BASE_URL=https://paper-api.alpaca.markets

Ensure you have a trained PPO model at trained_models/agent_ppo.zip
With SHAP + LIME explanations:
python main.py

Without explanations (faster):
python main.py --no-explain

Custom configuration:
python main.py --interval 5 --finetune-interval 4 --output my_results/

Each trading cycle:
- Fetch State: Get current prices, holdings, technical indicators (301 features)
- PPO Prediction: Model outputs 30 actions (buy/sell signals)
- Explainability (if enabled): SHAP/LIME analyze decision
- Execute Trades: Submit orders to Alpaca API (threaded)
- Log Results: Save portfolio value, trades, explanations
- Check Fine-Tuning: If 2 hours elapsed, trigger retraining
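The per-cycle steps above can be sketched as a single function; `fetch_state`, `predict`, `execute`, and `log` are hypothetical stand-ins for the Alpaca/PPO calls in paper_trader.py:

```python
def run_cycle(fetch_state, predict, execute, log):
    """One trading cycle: observe -> act -> trade -> record."""
    state = fetch_state()      # 301-feature observation
    actions = predict(state)   # 30 continuous actions in [-1, +1]
    trades = execute(actions)  # submit orders (threaded in the real system)
    log(state, actions, trades)
    return trades

# Dummy wiring to show the flow (one action per stock -> 30 orders):
trades = run_cycle(
    fetch_state=lambda: [0.0] * 301,
    predict=lambda s: [0.1] * 30,
    execute=lambda a: len(a),
    log=lambda *args: None,
)
print(trades)  # 30
```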
Fine-tuning procedure:
- Load last 48 hours of trading data from CSV
- Split into 80% train, 20% validation
- Evaluate current model on validation set
- Fine-tune PPO with 2000 steps (LR: 1e-5)
- Evaluate fine-tuned model on same validation set
- Accept if new model ≥ 95% of original performance
- Reject and rollback if performance degrades
- Retrain SHAP/LIME explainers on updated model
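The accept/reject decision above reduces to a single comparison against the 95% rollback threshold. This sketch assumes higher validation scores are better (and sidesteps negative-score edge cases):

```python
ROLLBACK_THRESHOLD = 0.95  # min fraction of original performance to accept

def accept_finetuned(original_score: float, finetuned_score: float,
                     threshold: float = ROLLBACK_THRESHOLD) -> bool:
    """Accept the fine-tuned model only if it keeps >= 95% of the original score."""
    return finetuned_score >= threshold * original_score

print(accept_finetuned(245.67, 257.89))  # True  (improvement)
print(accept_finetuned(257.89, 249.23))  # True  (-3.36%, but still >= 95% of original)
print(accept_finetuned(257.89, 230.00))  # False -> rollback
```

Note this rule can accept a slightly worse model (as in the second example); the threshold trades a small tolerated regression for the ability to keep adapting.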
SHAP - Shows global importance across all decisions:
Top 5 Features:
1. AAPL_macd: 0.34 (momentum indicator)
2. MSFT_rsi_30: 0.28 (overbought/oversold)
3. JPM_cci_30: -0.21 (cyclical indicator)
...
LIME - Explains individual trades:
BUY 45 AAPL @ $175.23
Reasons:
- AAPL_macd: +0.15 (positive momentum)
- AAPL_close_30_sma: +0.12 (uptrend)
- market_turbulence: -0.05 (low volatility)
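To illustrate the idea behind such a LIME explanation: perturb the current state, fit a linear surrogate to the policy output in that neighborhood, and read feature contributions off the coefficients. Everything here (the toy `black_box` policy, the sampling scale) is illustrative, not the repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    """Stand-in for the PPO policy's action on one stock."""
    return 0.15 * x[0] + 0.12 * x[1] - 0.05 * x[2]

def lime_explain(f, x, n_samples=500, scale=0.1):
    """Fit a local linear surrogate around x; coefficients ~ feature importance."""
    X = x + rng.normal(0, scale, size=(n_samples, len(x)))
    y = np.array([f(row) for row in X])
    # least-squares fit of y = X @ w + b
    coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(n_samples)], y, rcond=None)
    return coef[:-1]  # drop the intercept

w = lime_explain(black_box, np.array([1.0, 1.0, 1.0]))
print(np.round(w, 2))  # ≈ [0.15, 0.12, -0.05], recovering the toy policy's weights
```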
python main.py [OPTIONS]

| Option | Default | Description |
|---|---|---|
| --no-explain | False | Disable SHAP/LIME (~15% faster) |
| --model PATH | trained_models/agent_ppo.zip | Path to PPO model |
| --output DIR | production_paper_trading_results | Output directory |
| --interval N | 1 | Trading interval (minutes) |
| --finetune-interval N | 2 | Fine-tuning interval (hours) |
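The options above could be wired up with argparse roughly as follows (an assumed layout; the actual parser lives in main.py):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="PPO paper trading")
    p.add_argument("--no-explain", action="store_true",
                   help="disable SHAP/LIME explainability")
    p.add_argument("--model", default="trained_models/agent_ppo.zip",
                   help="path to PPO model")
    p.add_argument("--output", default="production_paper_trading_results",
                   help="output directory")
    p.add_argument("--interval", type=int, default=1,
                   help="trading interval (minutes)")
    p.add_argument("--finetune-interval", type=int, default=2,
                   help="fine-tuning interval (hours)")
    return p

args = build_parser().parse_args(["--interval", "5"])
print(args.interval)  # 5
```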
Examples:
# Standard run with explainability
python main.py
# Fast mode (no SHAP/LIME)
python main.py --no-explain
# Trade every 5 minutes, fine-tune every 4 hours
python main.py --interval 5 --finetune-interval 4
# Use custom model and output directory
python main.py --model my_model.zip --output results_jan11/

Technical indicators (8 per stock):
- macd - Moving Average Convergence Divergence
- boll_ub - Bollinger Upper Band
- boll_lb - Bollinger Lower Band
- rsi_30 - 30-period Relative Strength Index
- cci_30 - 30-period Commodity Channel Index
- dx_30 - 30-period Directional Movement Index
- close_30_sma - 30-period Simple Moving Average
- close_60_sma - 60-period Simple Moving Average
Total: 8 indicators × 30 stocks = 240 technical indicator features
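As one concrete example of these indicators, close_30_sma is just the mean of the last 30 closes; a minimal sketch:

```python
def sma(closes, window=30):
    """Simple moving average over the last `window` closing prices."""
    recent = closes[-window:]
    return sum(recent) / len(recent)

# With fewer closes than the window, this averages whatever is available.
print(sma([1.0, 2.0, 3.0, 4.0], window=2))  # 3.5
```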
State vector layout (301 features):
[0] : Cash balance
[1-30] : Current stock prices
[31-60] : Stock holdings (shares owned)
[61-300] : Technical indicators (240)
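The layout above implies the observation is a flat concatenation of these four pieces; a sketch (the helper name and argument shapes are assumptions):

```python
import numpy as np

N_STOCKS, N_INDICATORS = 30, 8

def build_state(cash: float, prices: np.ndarray, holdings: np.ndarray,
                indicators: np.ndarray) -> np.ndarray:
    """Concatenate [cash | 30 prices | 30 holdings | 240 indicators] -> 301 features."""
    state = np.concatenate([[cash], prices, holdings, indicators.ravel()])
    assert state.shape == (1 + 2 * N_STOCKS + N_STOCKS * N_INDICATORS,)  # (301,)
    return state

s = build_state(1_000_000.0, np.zeros(N_STOCKS), np.zeros(N_STOCKS),
                np.zeros((N_STOCKS, N_INDICATORS)))
print(s.shape)  # (301,)
```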
INITIAL_CASH = 1_000_000
MAX_STOCK = 100 # Max shares per position
TRANSACTION_COST_PCT = 0.001 # 0.1% per trade
TURBULENCE_THRESHOLD = 500 # Risk cutoff
TIME_INTERVAL_MIN = 1 # Trade frequency
FINETUNE_INTERVAL_HOURS = 2 # Auto-retrain frequency
FINETUNE_LOOKBACK_HOURS = 48 # Training data window
FINETUNE_LR = 1e-5 # Fine-tuning learning rate
FINETUNE_STEPS = 2000 # Training steps
ROLLBACK_THRESHOLD = 0.95 # Min performance to accept

Training Period: July 1, 2024 - July 30, 2025 (13 months)
- Used for initial PPO model training
- Covers diverse market conditions across multiple seasons
- Provides robust foundation for the trading agent
Testing/Evaluation Period: August 1, 2025 - November 5, 2025 (3+ months)
- Live paper trading evaluation period
- Real-time performance validation
- Auto fine-tuning active during this period
- Results: 22.5% total return with 2.318 Sharpe ratio
TradingAgent/
├── main.py # CLI entry point
├── config.py # Configuration & credentials
├── data_manager.py # Data fetching & CSV storage
├── trading_utils.py # Trading helpers & fine-tuning
├── paper_trader.py # Main trading class (PPO + SHAP/LIME)
├── test_setup.py # Setup verification
├── run_paper_trading.sh # Quick start script
│
├── .env # Alpaca API credentials
├── trained_models/ # PPO models
│ └── agent_ppo.zip
│
├── production_paper_trading_results/ # Output
│ ├── trading_history.csv # Portfolio value, trades
│ ├── finetune_history.csv # Fine-tuning metrics
│ └── model_cycle_*.zip # Fine-tuned models
│
├── figs/ # Result visualizations
└── notebooks/ # Development notebooks
trading_history.csv - Trade log
timestamp,cycle,portfolio_value,cash,turbulence,num_trades
2026-01-11 09:45:00,1,1000000.00,850000.00,124.5,15
2026-01-11 09:46:00,2,1001250.50,845000.00,118.2,8

finetune_history.csv - Fine-tuning log
timestamp,original_score,finetuned_score,accepted,improvement_pct
2026-01-11 10:30:15,245.67,257.89,True,4.97
2026-01-11 12:30:22,257.89,249.23,True,-3.36

production_paper_trading_data.csv - Complete market data
- 1-minute OHLCV for all 30 stocks
- All 240 technical indicators
- Used for fine-tuning
Real-time portfolio:
watch -n 5 'tail -20 production_paper_trading_results/trading_history.csv'

Fine-tuning status:
tail -f production_paper_trading_results/finetune_history.csv

Current portfolio value:
watch -n 10 "tail -1 production_paper_trading_results/trading_history.csv | cut -d',' -f3"

Model not found:
# Train a PPO model first by going through the notebooks or specify path
python main.py --model path/to/your/model.zip

Missing .env:
# Create .env with Alpaca credentials
cp .env.example .env
# Edit .env with your API keys

Import errors:
# Install dependencies
pip install -r requirements.txt

Test setup:
python test_setup.py

Requirements:
- Python 3.8+
- Alpaca Paper Trading Account (free)
- ~500MB RAM (without explainability)
- ~800MB RAM (with SHAP/LIME)
Dependencies:
- stable-baselines3 - PPO implementation
- finrl - Financial RL framework
- alpaca-trade-api - Market data & execution
- shap - SHAP explainability
- lime - LIME explainability
- pandas, numpy - Data handling
Proximal Policy Optimization (PPO) is an on-policy RL algorithm that:
- Observes market state (301 features)
- Predicts actions (30 buy/sell signals)
- Executes trades and observes results
- Updates policy to maximize portfolio value
Why PPO?
- Stable training (doesn't diverge)
- Sample efficient (learns from limited data)
- Works well with continuous action spaces
- Proven performance: 22.5% return with 2.318 Sharpe ratio
Fine-Tuning Every 2 Hours:
- Adapts to recent market regime changes
- Uses actual trading experience as training data
- Validation prevents overfitting to noise
- Rollback ensures we don't deploy worse models
- Measured Impact: 52% of fine-tunes accepted, avg +1.53% improvement
Performance vs Other Algorithms (3-month backtest, no fine-tuning):
- Outperforms TD3 by 5.8 percentage points (+15.1% vs +9.3%)
- Outperforms MVO by 7.5 percentage points (+15.1% vs +7.6%)
- Beats the DJI index by 6.5 percentage points (+15.1% vs +8.6%)
- PPO: Schulman et al., "Proximal Policy Optimization Algorithms"
- SHAP: Lundberg & Lee, "A Unified Approach to Interpreting Model Predictions"
- FinRL: "FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance" (NeurIPS Workshop), FinRL Library
- Alpaca: Alpaca API Docs
MIT License - See LICENSE file for details
For paper trading and educational purposes only. Not financial advice.
Use at your own risk. Past performance does not guarantee future results.


