An end-to-end unsupervised learning project that applies K-Means clustering to a tabular dataset and deploys an interactive Streamlit demo for exploring clusters, tuning parameters, and visualizing results.
Live Demo: https://clustering-segmentation-explorer-nadine-shill.streamlit.app/
Repo: https://github.com/nadine-ramirez/unsupervised-clustering-kmeans
Notebook: https://github.com/nadine-ramirez/unsupervised-clustering-kmeans/blob/main/Clustering.ipynb
Clustering is a core unsupervised learning technique used to discover structure in data when labels aren’t available. This project focuses on a practical workflow:
- Load and preprocess structured data
- Fit a K-Means model
- Evaluate cluster quality (e.g., inertia and silhouette score)
- Visualize clusters in 2D using PCA
- Deploy a Streamlit app to explore results interactively
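The workflow above can be sketched in a few lines of scikit-learn. This is a minimal illustration on Iris, not necessarily how the notebook or app.py implements it. Standardizing the features, `n_clusters=3`, `n_init=10`, and `random_state=42` are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Load the Iris features; the species labels are ignored (unsupervised setting).
X = load_iris().data  # shape (150, 4)

# Standardize so each feature contributes comparably to Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means (K=3, n_init and random_state are illustrative choices).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Evaluate cluster quality.
inertia = kmeans.inertia_  # within-cluster sum of squared distances (lower = tighter)
silhouette = silhouette_score(X_scaled, labels)  # in [-1, 1], higher is better

# Project the scaled features to 2D with PCA for plotting.
coords = PCA(n_components=2).fit_transform(X_scaled)

print(f"inertia={inertia:.2f}, silhouette={silhouette:.3f}")
print(coords.shape)  # (150, 2)
```

The two PCA columns in `coords` can be passed to any scatter-plot function, colored by `labels`, to produce the 2D cluster view.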
The Streamlit app (app.py) lets you:
- Explore the built-in Iris sample dataset or, where implemented, use your own dataset
- Select clustering features
- Choose the number of clusters (K) with a slider
- View:
  - Cluster assignments
  - PCA visualization of clusters
  - Summary statistics by cluster
  - Basic clustering metrics (inertia, silhouette score)
- Export results as CSV (where implemented)
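Per-cluster summaries and a CSV export can be produced with plain pandas. The sketch below is an assumption about how this could work, not a description of app.py: it clusters the unscaled Iris features to stay short, reports each cluster's size and feature means, and serializes the labeled rows to CSV text.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load Iris as a DataFrame (as_frame=True returns pandas objects).
df = load_iris(as_frame=True).data.copy()

# Assign cluster labels (unscaled features, to keep the example short).
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(df)

# Per-cluster summary: mean of each feature, with cluster size as the first column.
summary = df.groupby("cluster").mean()
summary.insert(0, "size", df["cluster"].value_counts().sort_index())

# Serialize the labeled dataset to CSV text, ready to offer as a download.
csv_text = df.to_csv(index=False)

print(summary)
```

In a Streamlit app, `summary` could be shown with `st.dataframe(summary)` and `csv_text` offered via `st.download_button("Download CSV", csv_text, file_name="clusters.csv", mime="text/csv")`; the file name there is hypothetical.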
Pipeline overview:
- Preprocessing
  - Numeric feature selection
  - Feature scaling (if enabled)
- Model
  - K-Means clustering using scikit-learn
- Dimensionality reduction
  - PCA to project features into 2D for visualization
- Outputs
  - Cluster labels added to the dataset
  - Visual and metric feedback to help choose a suitable K
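To make the "choose a good K" step concrete, one common approach is to fit K-Means over a range of K values and compare inertia and silhouette score. The helper `sweep_k` below is hypothetical (not from the repo); it mirrors the preprocessing stages listed above by keeping only numeric columns and optionally standardizing them.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def sweep_k(df: pd.DataFrame, k_values, scale: bool = True) -> pd.DataFrame:
    """Hypothetical helper: fit K-Means for each K and report inertia and silhouette."""
    # Keep only numeric columns (the "numeric feature selection" step).
    X = df.select_dtypes(include="number").to_numpy()
    # Optional standardization (the "scaling, if enabled" step).
    if scale:
        X = StandardScaler().fit_transform(X)
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=42)
        labels = model.fit_predict(X)
        rows.append({
            "k": k,
            "inertia": model.inertia_,
            # Silhouette requires 2 <= k <= n_samples - 1.
            "silhouette": silhouette_score(X, labels),
        })
    return pd.DataFrame(rows)


# Iris features plus a non-numeric column that the selection step should drop.
iris = load_iris(as_frame=True)
df = iris.data.assign(species=iris.target_names[iris.target.to_numpy()])

results = sweep_k(df, range(2, 8))
print(results)
```

Inertia tends to keep falling as K grows, so it is usually read for an "elbow" where improvements level off, while silhouette scores can be compared directly across K (higher is better). The two heuristics do not always agree, which is why showing both is useful.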