Version 2.0 | Last Updated: March 01, 2026
This project implements an advanced deep learning model for predicting stock prices using historical market data. The model leverages a hybrid architecture that combines causal convolutional layers, multi‑head self‑attention, residual connections, and layer normalization to effectively capture both short‑term patterns and long‑term dependencies in financial time series. The model is trained on multiple stock tickers and predicts the future closing price based on a window of past observations.
The codebase is written in Python, using TensorFlow/Keras for model construction, scikit‑learn for preprocessing, and pandas for data manipulation. The pipeline includes robust data scaling, sequence generation, training with advanced callbacks, and thorough evaluation (MAE, MSE, R², and optionally MAPE).
- Data Preprocessing: Loads and cleans stock data for multiple tickers; applies per‑ticker MinMax scaling; handles missing values via forward/backward fill.
- Sequence Creation: Builds time‑series sequences with a configurable window size (default 60 days); see the sketch after this list.
- State‑of‑the‑Art Architecture:
- Causal Conv1D layers with residual connections and layer normalization.
- Multi‑Head Self‑Attention for capturing global dependencies.
- Feed‑forward networks with dropout and L2 regularization.
- Global average pooling followed by dense heads.
- Training Pipeline: Automatic device selection (CPU/GPU); callbacks for early stopping, model checkpointing, learning rate reduction, and TensorBoard logging.
- Evaluation: Computes MAE, MSE, and R² on the test set; plots training curves.
- Model Persistence: Saves the final model and per‑ticker scalers for later inference.
- Visualization: Generates training/validation loss and metric plots.
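
To make the preprocessing and sequence-creation steps concrete, here is a minimal sketch of per‑ticker scaling and sliding‑window generation. This is not the project's exact code; the helper name `make_sequences` and its layout are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

FEATURES = ["Open", "High", "Low", "Close", "Volume", "Dividends", "Stock Splits"]
WINDOW_SIZE = 60  # default window size

def make_sequences(df: pd.DataFrame, window: int = WINDOW_SIZE):
    """Scale each ticker independently, then build sliding windows.

    Returns X of shape (num_samples, window, num_features), y holding the
    scaled 'Close' one step after each window, and the per-ticker scalers.
    """
    X, y, scalers = [], [], {}
    close_idx = FEATURES.index("Close")
    for ticker, group in df.groupby("Ticker"):
        # Sort chronologically, keep numeric features, fill gaps forward then backward
        values = group.sort_values("Date")[FEATURES].ffill().bfill()
        scaler = MinMaxScaler()
        scaled = scaler.fit_transform(values)
        scalers[ticker] = scaler
        for i in range(len(scaled) - window):
            X.append(scaled[i : i + window])
            y.append(scaled[i + window, close_idx])
    return np.array(X), np.array(y), scalers
```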
Install the required packages:

```bash
pip install pandas numpy tensorflow scikit-learn joblib matplotlib
```

Or use the `requirements.txt` in the InvestingAssistant repo:

```text
pandas>=2.0.0
numpy>=1.24.0
tensorflow>=2.12.0
scikit-learn>=1.2.0
joblib>=1.2.0
matplotlib>=3.7.0
```
```text
price/
├── combined_stock_data.csv   # Input dataset (user-provided)
├── stock_model.keras         # Final trained model
├── best_model.keras          # Best checkpoint (by val_loss)
├── stock_scaler.save         # Saved MinMaxScaler per ticker
├── training_log.csv          # Epoch-wise training metrics
├── training_metrics.png      # Plot of loss & metrics
├── logs/                     # TensorBoard logs
└── train_model.py            # Main training script
```
The input CSV must contain the following columns:
| Column | Description | Type |
|---|---|---|
| Date | Date of the observation | datetime |
| Ticker | Stock ticker symbol | string |
| Open | Opening price | float |
| High | Highest price of the day | float |
| Low | Lowest price of the day | float |
| Close | Closing price (prediction target) | float |
| Volume | Trading volume | float |
| Dividends | Dividends paid | float |
| Stock Splits | Stock split ratio | float |
Example:

```csv
Date,Ticker,Open,High,Low,Close,Volume,Dividends,Stock Splits
2023-01-01,AAPL,130.28,132.67,129.61,131.86,123456789,0.0,0.0
2023-01-01,MSFT,240.22,243.15,238.75,241.01,987654321,0.0,0.0
...
```
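
Before training, a quick schema check can catch missing columns early. This snippet is not part of the original script; it is shown as a convenience:

```python
import pandas as pd

REQUIRED_COLUMNS = {"Date", "Ticker", "Open", "High", "Low", "Close",
                    "Volume", "Dividends", "Stock Splits"}

df = pd.read_csv("combined_stock_data.csv", parse_dates=["Date"])
missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"Dataset is missing required columns: {sorted(missing)}")
```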
The main script defines several constants at the top of `train_model.py`:

| Parameter | Description | Default Value |
|---|---|---|
| `WINDOW_SIZE` | Number of past days used for prediction | 60 |
| `EPOCHS` | Maximum number of training epochs | 1000 |
| `BATCH_SIZE` | Batch size for training | 128 |
These can be adjusted directly in the source file.
1. Place your dataset as `combined_stock_data.csv` in the project directory.
2. Run the script:

   ```bash
   python train_model.py
   ```

   You will be prompted to choose the device:

   ```text
   Choose device for training (cpu/gpu):
   ```

3. Outputs:
   - `stock_model.keras` – the final trained model.
   - `best_model.keras` – the best model based on validation loss.
   - `stock_scaler.save` – a dictionary of `MinMaxScaler` objects for each ticker.
   - `training_log.csv` – CSV with per‑epoch metrics.
   - `training_metrics.png` – plot of loss, MAE, and MSE.
   - `logs/` – TensorBoard logs.
4. Monitor with TensorBoard:

   ```bash
   tensorboard --logdir logs/
   ```

   Then open `http://localhost:6006` in your browser.
The model is a custom deep architecture designed for time‑series forecasting. Below is a layer‑by‑layer description:
- Input shape: `(WINDOW_SIZE, num_features)`, e.g. `(60, 7)`.
- GaussianNoise(0.01) – adds small noise to inputs for better generalisation.
Three convolutional blocks, each consisting of:
- Causal Conv1D (filters: 64, 128, 256; kernel sizes: 7, 5, 5; padding='causal').
- LayerNormalization – normalises across the feature dimension (preferred for sequences).
- Dropout(0.2) for regularisation.
- Residual addition: if the number of filters changes, a 1x1 convolution projects the skip connection.
- Activation: ReLU.
- MaxPooling1D(pool_size=2) – reduces temporal dimension after convolutions.
- MultiHeadAttention(num_heads=4, key_dim=128) – attends to the sequence to capture global dependencies.
- Residual connection around the attention layer.
- LayerNormalization after the addition.
- Feed‑forward network: Dense(ff_dim*2) → Dense(original_dim) with ReLU and Dropout(0.3).
- Another residual connection + layer norm.
- GlobalAveragePooling1D – aggregates the sequence into a fixed‑length vector.
- Dense(256, activation='relu', L2=0.001) → BatchNormalization → Dropout(0.4)
- Dense(128, activation='relu', L2=0.001) → BatchNormalization → Dropout(0.3)
- Output Dense(1) – predicts the scaled closing price.
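
The following is a condensed Keras sketch of this architecture. It follows the description above, but the exact hyperparameters and layer ordering in `train_model.py` may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def conv_block(x, filters, kernel_size):
    """Causal Conv1D block with layer norm, dropout, and a residual skip."""
    skip = x
    x = layers.Conv1D(filters, kernel_size, padding="causal")(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.2)(x)
    if skip.shape[-1] != filters:
        # Project the skip connection with a 1x1 conv when channels change
        skip = layers.Conv1D(filters, 1, padding="same")(skip)
    x = layers.Add()([x, skip])
    return layers.Activation("relu")(x)

def attention_block(x, num_heads=4, key_dim=128, ff_dim=128):
    """Self-attention plus feed-forward sub-block, each with residual + norm."""
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, attn]))
    ff = layers.Dense(ff_dim * 2, activation="relu")(x)
    ff = layers.Dropout(0.3)(ff)
    ff = layers.Dense(x.shape[-1])(ff)  # project back to the original dim
    return layers.LayerNormalization()(layers.Add()([x, ff]))

def build_model(window_size=60, num_features=7):
    inputs = layers.Input(shape=(window_size, num_features))
    x = layers.GaussianNoise(0.01)(inputs)
    for filters, k in [(64, 7), (128, 5), (256, 5)]:
        x = conv_block(x, filters, k)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = attention_block(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(128, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1)(x)  # scaled closing price
    return tf.keras.Model(inputs, outputs)
```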
- Optimizer: AdamW (learning rate = 1e-3, weight decay = 1e-4)
- Loss: Mean Squared Error (MSE)
- Metrics: MAE, MSE, and MAPE (note: MAPE can be extremely high on scaled data; interpret with caution).
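
Continuing the sketch above, compilation might look like the following (assuming TensorFlow ≥ 2.12, where `AdamW` is available in `tf.keras.optimizers`):

```python
model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-4),
    loss="mse",
    metrics=["mae", "mse", "mape"],
)
```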
- `ModelCheckpoint` – saves the best model (`best_model.keras`) based on `val_loss`.
- `EarlyStopping` – stops after 15 epochs without improvement and restores the best weights.
- `ReduceLROnPlateau` – reduces the learning rate by factor 0.5 if `val_loss` plateaus for 5 epochs.
- `TensorBoard` – logs to `./logs/`.
- `CSVLogger` – writes epoch metrics to `training_log.csv`.
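
A sketch of how these callbacks might be instantiated, with the patience and factor values taken from the list above:

```python
from tensorflow.keras import callbacks

training_callbacks = [
    callbacks.ModelCheckpoint("best_model.keras", monitor="val_loss",
                              save_best_only=True),
    callbacks.EarlyStopping(monitor="val_loss", patience=15,
                            restore_best_weights=True),
    callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    callbacks.TensorBoard(log_dir="./logs"),
    callbacks.CSVLogger("training_log.csv"),
]
# Passed to model.fit(..., callbacks=training_callbacks)
```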
After training, the script reports:
- Test Loss (MSE)
- Test MAE
- Test MSE (redundant, kept for clarity)
- R² Score (coefficient of determination)
Typical results (example from a recent run):

```text
Test Loss: 0.0023
Test MAE: 0.0394
Test MSE: 0.0020
Test Accuracy (R² score): 96.74%
```

Note: despite the "accuracy" label, this figure is the R² score (0.9674), not a classification accuracy; it indicates a strong fit on the test data.
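
These metrics can be reproduced with scikit‑learn on the held‑out set. This is a sketch; `X_test` and `y_test` are assumed to come from the pipeline's train/test split:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test).ravel()
print(f"Test MAE: {mean_absolute_error(y_test, y_pred):.4f}")
print(f"Test MSE: {mean_squared_error(y_test, y_pred):.4f}")
print(f"R² score: {r2_score(y_test, y_pred):.4f}")
```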
After training, you can load the model and scalers to predict future prices for a specific ticker:
```python
import pandas as pd
import numpy as np
import joblib
from tensorflow.keras.models import load_model

WINDOW_SIZE = 60  # must match the value used during training

# Load model and scalers
model = load_model("stock_model.keras")
scalers = joblib.load("stock_scaler.save")

# Prepare data for a single ticker (e.g., "AAPL")
ticker = "AAPL"
df = pd.read_csv("combined_stock_data.csv")
company_df = df[df["Ticker"] == ticker].copy()

# Keep only the numeric feature columns; fill gaps forward then backward
numeric_cols = ["Open", "High", "Low", "Close", "Volume", "Dividends", "Stock Splits"]
company_df = company_df[numeric_cols].ffill().bfill()

# Scale the data using the ticker's scaler
scaler = scalers[ticker]
scaled_data = scaler.transform(company_df[numeric_cols])
scaled_df = pd.DataFrame(scaled_data, columns=numeric_cols)

# Create the last window of length WINDOW_SIZE
if len(scaled_df) < WINDOW_SIZE:
    raise ValueError("Not enough data to form a sequence")
sequence = scaled_df.iloc[-WINDOW_SIZE:].values  # shape: (WINDOW_SIZE, num_features)

# Add batch dimension
sequence = np.expand_dims(sequence, axis=0)  # shape: (1, WINDOW_SIZE, num_features)

# Predict
pred_scaled = model.predict(sequence)[0, 0]

# Inverse transform to get the actual price: build a dummy row so that
# only the 'Close' column is inverted
dummy = np.zeros((1, len(numeric_cols)))
dummy[0, numeric_cols.index("Close")] = pred_scaled
pred_actual = scaler.inverse_transform(dummy)[0, numeric_cols.index("Close")]

print(f"Predicted closing price for {ticker}: ${pred_actual:.2f}")
```

- MAPE on scaled data: The Mean Absolute Percentage Error can become astronomically large when the true value is close to zero (because of the division by a small number). Ignore MAPE during training if you normalise your targets to [0, 1]. For business interpretation, compute MAPE after inverse‑transforming predictions (see the sketch after this list).
- Data quality: The model assumes clean, complete data. Forward/backward fill is used, but extreme outliers may still affect performance.
- Window size: The default 60 days is a reasonable starting point; you may experiment with 30, 90, or 120 days.
- Feature engineering: Consider adding technical indicators (moving averages, RSI, MACD) to improve predictive power.
- Overfitting: Despite heavy regularisation, financial time series are notoriously noisy. Always validate on out‑of‑time data.
- GPU memory: If you encounter out‑of‑memory errors, reduce `BATCH_SIZE` or the number of filters in the convolutional layers.
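
A sketch of computing MAPE on inverse‑transformed prices, assuming the seven‑column feature layout used above (`Close` at index 3) and arrays of scaled targets and predictions:

```python
import numpy as np

def mape_on_prices(y_true_scaled, y_pred_scaled, scaler, close_idx=3, n_features=7):
    """Invert MinMax scaling for the 'Close' column, then compute MAPE in price units."""
    def invert(values):
        # Dummy rows let us invert just the 'Close' column of the scaler
        dummy = np.zeros((len(values), n_features))
        dummy[:, close_idx] = values
        return scaler.inverse_transform(dummy)[:, close_idx]

    y_true = invert(np.asarray(y_true_scaled))
    y_pred = invert(np.asarray(y_pred_scaled))
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```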
- TensorFlow/Keras: https://www.tensorflow.org/
- scikit‑learn: https://scikit-learn.org/
- Pandas: https://pandas.pydata.org/
- Matplotlib: https://matplotlib.org/
For questions or contributions, please open an issue in the project repository.