This repository provides a modular, educational implementation of a Generative Pre-trained Transformer (GPT) model using PyTorch. Designed for clarity and extensibility, it serves as a practical resource for understanding the internal mechanics of Large Language Models (LLMs), including Self-Attention mechanisms, Transformer blocks, and autoregressive text generation.
The codebase is structured to facilitate step-by-step learning, isolating critical components such as data processing, model architecture, and training logic into distinct modules.
The project follows a modular design pattern to separate concerns and improve maintainability.
```
.
├── hyperparameters.py     # Configuration center for model and training parameters
├── data_handling.py       # Data ingestion, tokenization, and batch generation pipeline
├── model_architecture.py  # Core GPT model definition (Transformer, Attention, FeedForward)
├── train_model.py         # Training orchestration script
├── generate_text.py       # Inference script for autoregressive text generation
├── input.txt              # (Optional) Raw text corpus for training
└── README.md              # Project documentation
```
To gain a comprehensive understanding of the system, we recommend reviewing the modules in the following logical sequence:
### 1. `hyperparameters.py`

Defines the global constants and hyperparameters that control the model's capacity and training dynamics.

- Key Parameters: `batch_size`, `block_size` (context window), `n_embd` (embedding dimension), `n_head`, `n_layer`.
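As a rough sketch, such a configuration module is typically just a flat set of constants imported by the other scripts. The specific values below are illustrative assumptions, not the repository's actual settings:

```python
# Illustrative hyperparameter values -- the real settings live in hyperparameters.py.
batch_size = 32      # independent sequences processed in parallel per step
block_size = 128     # context window: maximum tokens the model attends over
n_embd = 192         # embedding dimension
n_head = 6           # attention heads per block (n_embd must divide evenly by n_head)
n_layer = 4          # number of stacked Transformer blocks
learning_rate = 3e-4

# A common sanity check: each head gets an integer-sized slice of the embedding.
assert n_embd % n_head == 0
```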
### 2. `data_handling.py`

Implements the ETL (Extract, Transform, Load) logic for textual data.

- Tokenization: Character-level mapping (converting characters to integer indices).
- Batching: Generates `(input, target)` pairs for supervised learning, ensuring efficient GPU utilization.
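The two steps above can be sketched as follows. This is a minimal, self-contained version (the function and variable names are assumptions, not necessarily those used in `data_handling.py`); targets are simply the inputs shifted one token to the right:

```python
import torch

text = "hello world"  # stand-in for the contents of input.txt

# Character-level tokenization: build a vocabulary of unique characters.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char  -> index
itos = {i: ch for ch, i in stoi.items()}      # index -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)

def get_batch(data, block_size=4, batch_size=2):
    # Random starting offsets; each target sequence is the input shifted by one.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    return x, y
```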
### 3. `model_architecture.py`

Encapsulates the mathematical definition of the GPT model.

- `TinyGPT`: The main container class.
- `Block`: A single Transformer block containing LayerNorm, Multi-Head Attention, and Feed-Forward networks.
- `MultiHeadAttention`: The mechanism allowing the model to attend to different parts of the sequence simultaneously.
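The shape of these two components can be sketched as below. This is an assumed, minimal pre-norm implementation for illustration; the classes in `model_architecture.py` may differ in layout and naming:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Causal self-attention: each position attends only to itself and earlier positions."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask enforces the autoregressive property.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Split the embedding across heads: (B, T, C) -> (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (C // self.n_head) ** 0.5
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

class Block(nn.Module):
    """Pre-norm Transformer block: LayerNorm -> attention -> LayerNorm -> feed-forward."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = MultiHeadAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual connection around attention
        x = x + self.mlp(self.ln2(x))   # residual connection around feed-forward
        return x
```

`TinyGPT` would then stack `n_layer` such `Block`s between a token/position embedding and a final language-model head.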
### 4. `train_model.py`

Orchestrates the optimization process.

- Optimization: Uses the `AdamW` optimizer.
- Loop: Performs the forward pass, loss calculation (Cross-Entropy), backward pass, and parameter updates.
- Checkpointing: Saves model weights to `tiny_gpt_model.pth`.
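The loop described above follows the standard PyTorch pattern. The sketch below uses a toy stand-in model so it is self-contained; `train_model.py` would use the real `TinyGPT` and batches from `data_handling.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in model (vocab of 10 tokens); train_model.py uses TinyGPT instead.
model = nn.Sequential(nn.Embedding(10, 16), nn.Linear(16, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

data = torch.randint(0, 10, (64,))  # stand-in token stream

for step in range(5):
    x = data[:-1].unsqueeze(0)        # inputs  (B, T)
    y = data[1:].unsqueeze(0)         # targets: inputs shifted by one token
    logits = model(x)                 # forward pass -> (B, T, vocab)
    loss = F.cross_entropy(logits.view(-1, 10), y.view(-1))
    optimizer.zero_grad()
    loss.backward()                   # backward pass
    optimizer.step()                  # parameter update

torch.save(model.state_dict(), "tiny_gpt_model.pth")  # checkpoint
```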
### 5. `generate_text.py`

Demonstrates the model's generative capabilities.

- Sampling: Uses the trained model to predict tokens autoregressively.
- Decoding: Converts predicted token indices back into human-readable text.
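Autoregressive sampling means the model predicts one token at a time and feeds each prediction back in as context. A minimal sketch (the `generate` signature is an assumption, not necessarily the one in `generate_text.py`):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Append tokens one at a time by sampling from the model's next-token distribution."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop to the context window
        logits = model(idx_cond)                   # (B, T, vocab)
        logits = logits[:, -1, :]                  # keep only the last position
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)     # feed the prediction back in
    return idx
```

The resulting index tensor would then be passed through the `decode` mapping from `data_handling.py` to recover readable text.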
- Python 3.8+
- Package Manager: `uv` (recommended) or `pip`
Initialize the environment and install dependencies:
```sh
uv sync
# OR
pip install torch numpy tqdm
```

Execute the training script to optimize the model parameters on the provided corpus.

```sh
python train_model.py
```

Output: Training logs showing loss reduction over epochs.
Run the inference script to generate text sequences using the trained weights.
```sh
python generate_text.py
```

This project is licensed under the MIT License.