A from-scratch implementation of feedforward neural networks in pure NumPy, built as part of the Artificial Intelligence fundamentals course at the Università degli Studi di Parma.
The project systematically sweeps network architectures, activation functions, and dropout regularisation on the MNIST handwritten-digit dataset, logging all results to CSV and generating visualisation plots.
- Pure NumPy implementation – no autograd framework, every gradient is hand-derived and backpropagated manually.
- Mini-batch SGD with configurable batch size and learning rate.
- Inverted dropout regularisation.
- Numerical gradient check (finite differences) runs automatically before every sweep to verify backpropagation correctness (see the sketch after this list).
- Structured logging – INFO to console, DEBUG to a rotating log file.
- CSV experiment log – all hyperparameters and per-epoch losses persisted for reproducibility.
- Automatic result analysis – four publication-ready plots generated by `analyzer.py`.
- Cross-platform: PowerShell (Windows), Make (Unix/macOS), Docker.
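To give a flavour of what the gradient check does, here is a minimal finite-difference sketch. The project's actual implementation is `gradient_check()` in `src/training.py`; the helper below is hypothetical:

```python
import numpy as np

def numerical_gradient(loss_fn, w, eps=1e-5):
    # Central finite differences: perturb each weight in turn and
    # approximate dL/dw as (L(w+eps) - L(w-eps)) / (2*eps).
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        orig = w[idx]
        w[idx] = orig + eps
        loss_plus = loss_fn()
        w[idx] = orig - eps
        loss_minus = loss_fn()
        w[idx] = orig                      # restore the weight
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Toy check: for L(w) = 0.5 * ||w||^2 the analytic gradient is w itself.
w = np.random.default_rng(0).standard_normal((3, 4))
num = numerical_gradient(lambda: 0.5 * np.sum(w ** 2), w)
rel_err = np.abs(num - w) / np.maximum(np.abs(num) + np.abs(w), 1e-8)
assert rel_err.max() < 1e-4
```

A backprop implementation is accepted when the relative error between the analytic and numerical gradients stays small (typically below 1e-5).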
```mermaid
graph TD
    A[main.py] --> B[src/experiments.py]
    B --> C[src/config.py]
    B --> D[src/data.py]
    B --> E[src/training.py]
    B --> F[src/network.py]
    F --> G[src/layers.py]
    E --> F
    H[analyzer.py] --> C
```
```
Backpropagation analisys/
├── src/
│   ├── __init__.py        # Public re-exports
│   ├── config.py          # ExperimentConfig dataclass – all hyperparameters
│   ├── layers.py          # DenseLayer, DropoutLayer
│   ├── network.py         # NeuralNetwork container
│   ├── data.py            # MNIST loading, normalisation, noise injection
│   ├── training.py        # train(), evaluate(), gradient_check()
│   └── experiments.py     # Sweep orchestration, logging setup, CSV writing
├── main.py                # Entry point
├── analyzer.py            # Results visualisation
├── requirements.txt
├── Makefile               # Unix/macOS convenience targets
├── Dockerfile
├── run.ps1                # Windows PowerShell launcher
├── .gitignore
├── LICENSE
└── CONTRIBUTING.md
```
- Python 3.11+
- pip
- (Optional) make, Docker, or PowerShell 7+
Set up the environment:

**Windows (PowerShell)**

```powershell
.\run.ps1 setup
```

**Unix / macOS (Make)**

```bash
make setup
```

**Manual (any platform)**

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

**Docker**

```bash
docker build -t fondamenti-ia .
```

Run the experiment sweep:

**Windows**

```powershell
.\run.ps1 start
```

**Unix / macOS**

```bash
make run
```

**Docker**

```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/results:/app/results \
  fondamenti-ia
```

Results are written to `results/run_<timestamp>/`:

- `experiment_log.csv` – full metrics for every run.
- `experiment.log` – detailed debug log.
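If you want to poke at the log outside `analyzer.py`, loading it with pandas (assuming you have it installed) works; the exact columns depend on the run, so inspect the header first. The path below is just the example used in the analysis section:

```python
import pandas as pd

# Path is illustrative – substitute your own results/run_<timestamp> directory.
df = pd.read_csv("results/run_20250623_142953/experiment_log.csv")
print(df.columns.tolist())   # which hyperparameters and metrics were logged
print(df.head())
```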
**Windows**

```powershell
.\run.ps1 analyze
```

**Unix / macOS**

```bash
make analyze
```

Or point the analyzer at a specific CSV:

```bash
python analyzer.py results/run_20250623_142953/experiment_log.csv
```

Generated plots:
| File | Contents |
|---|---|
| `grouped_accuracy_by_activation.png` | Test accuracy by activation function |
| `grouped_accuracy_by_arch.png` | Test accuracy by architecture |
| `loss_curves.png` | Per-run training loss over epochs |
| `test_accuracy_per_run.png` | Bar chart: test accuracy per run |
All hyperparameters live in `src/config.py` as fields of the `ExperimentConfig` dataclass; edit the defaults there before running. A sketch of the dataclass's shape follows the table below.
| Parameter | Default | Description |
|---|---|---|
| `train_limit` | 5000 | Training samples (max 60 000) |
| `test_limit` | 10000 | Test samples |
| `val_fraction` | 0.2 | Validation fraction of the training set |
| `noise_rate` | 0.2 | Fraction of training labels corrupted |
| `epochs` | 30 | Training epochs per run |
| `learning_rate` | 0.05 | SGD learning rate |
| `batch_size` | 64 | Mini-batch size |
| `architectures` | `[[64], [128, 64], …]` | Hidden-layer sizes to sweep |
| `activations` | `[relu, sigmoid, tanh]` | Activations to sweep |
| `dropout_rates` | `[0.2, 0.5]` | Dropout rates to test |
```mermaid
flowchart LR
    Input["Input\n784 features"] --> H1["DenseLayer\n(He init)"]
    H1 --> D1["DropoutLayer\n(optional)"]
    D1 --> Hn["… hidden layers …"]
    Hn --> Out["DenseLayer\nSoftmax"]
    Out --> Loss["Cross-Entropy\nLoss"]
    Loss --> Back["Backprop\n∂L/∂W stored"]
    Back --> Upd["Weight update\nSGD"]
```
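In code, one training step through that pipeline looks roughly like the following self-contained NumPy sketch (toy shapes, a single hidden layer; the project's real step lives in `src/training.py` and `src/network.py`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 64 MNIST-sized inputs, one hidden layer of 128 units.
x = rng.random((64, 784))
y = np.eye(10)[rng.integers(0, 10, size=64)]          # one-hot labels

# He initialisation: std = sqrt(2 / fan_in).
W1 = rng.standard_normal((784, 128)) * np.sqrt(2.0 / 784)
b1 = np.zeros(128)
W2 = rng.standard_normal((128, 10)) * np.sqrt(2.0 / 128)
b2 = np.zeros(10)
lr, rate = 0.05, 0.2

# --- Forward ---
h = np.maximum(0.0, x @ W1 + b1)                      # Dense + ReLU
mask = (rng.random(h.shape) >= rate) / (1.0 - rate)   # inverted dropout mask
h_drop = h * mask                                     # scaled at train time
logits = h_drop @ W2 + b2

# Numerically stable softmax: subtract the row max before exponentiating.
z = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
loss = -np.mean(np.sum(y * np.log(probs + 1e-12), axis=1))

# --- Backward ---
dlogits = (probs - y) / x.shape[0]                    # combined softmax+CE gradient
dW2 = h_drop.T @ dlogits
db2 = dlogits.sum(axis=0)
dh = (dlogits @ W2.T) * mask                          # dropout mask reused
dh[h <= 0.0] = 0.0                                    # ReLU gradient
dW1 = x.T @ dh
db1 = dh.sum(axis=0)

# --- SGD update ---
for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
    p -= lr * g
```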
- He initialisation for all hidden layers.
- Softmax + cross-entropy with a numerically stable combined gradient.
- Inverted dropout: activations are scaled by `1/(1 - rate)` at train time, so no correction is needed at inference.
- Gradient and weight update decoupled: `backward()` stores gradients, `update_weights()` applies them – enabling gradient checking without corrupting weights (see the sketch after this list).
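The last point in code – a hypothetical stripped-down layer illustrating the split (the real `DenseLayer` in `src/layers.py` will differ in detail):

```python
import numpy as np

class DenseLayerSketch:
    """Minimal dense layer showing the backward()/update_weights() split."""

    def __init__(self, fan_in, fan_out, rng):
        # He initialisation: std = sqrt(2 / fan_in), suited to ReLU stacks.
        self.W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        self.b = np.zeros(fan_out)

    def forward(self, x):
        self.x = x                        # cached for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        # Only *store* gradients here – the weights stay untouched, so a
        # finite-difference check can compare dW/db before any update.
        self.dW = self.x.T @ grad_out
        self.db = grad_out.sum(axis=0)
        return grad_out @ self.W.T        # gradient w.r.t. the layer input

    def update_weights(self, lr):
        self.W -= lr * self.dW            # plain SGD step
        self.b -= lr * self.db
```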
MIT © 2025 Claudio Bendini