nadine-ramirez/clustering-segmentation-explorer

Unsupervised Clustering — K-Means + Streamlit Live Demo

An end-to-end unsupervised learning project that applies K-Means clustering to a tabular dataset and deploys an interactive Streamlit demo for exploring clusters, tuning parameters, and visualizing results.

Live Demo: https://clustering-segmentation-explorer-nadine-shill.streamlit.app/
Repo: https://github.com/nadine-ramirez/unsupervised-clustering-kmeans
Notebook: https://github.com/nadine-ramirez/unsupervised-clustering-kmeans/blob/main/Clustering.ipynb


Overview

Clustering is a core unsupervised learning technique used to discover structure in data when labels aren’t available. This project focuses on a practical workflow:

  • Load and preprocess structured data
  • Fit a K-Means model
  • Evaluate cluster quality (e.g., inertia and silhouette score)
  • Visualize clusters (2D via PCA)
  • Deploy a Streamlit app to explore results interactively
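The workflow above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the code from `app.py` or the notebook: it assumes the Iris sample dataset, standard scaling, and K=3, and the function name `run_workflow` is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def run_workflow(k: int = 3, random_state: int = 42) -> tuple[pd.DataFrame, dict]:
    """Hypothetical helper: load Iris, scale, fit K-Means, score, and project to 2D."""
    # 1. Load the built-in sample dataset (feature columns only, no species labels).
    X = load_iris(as_frame=True).data

    # 2. Standardize features so no single column dominates the Euclidean distance.
    X_scaled = StandardScaler().fit_transform(X)

    # 3. Fit K-Means; n_init is set explicitly to avoid version-dependent defaults.
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    labels = model.fit_predict(X_scaled)

    # 4. Evaluate cluster quality.
    metrics = {
        "inertia": model.inertia_,  # within-cluster sum of squared distances
        "silhouette": silhouette_score(X_scaled, labels),  # in [-1, 1], higher is better
    }

    # 5. Project the scaled features to two principal components for plotting.
    coords = PCA(n_components=2).fit_transform(X_scaled)

    # Attach cluster labels and PCA coordinates to a copy of the original features.
    result = X.copy()
    result["cluster"] = labels
    result["pc1"] = coords[:, 0]
    result["pc2"] = coords[:, 1]
    return result, metrics


if __name__ == "__main__":
    df, scores = run_workflow(k=3)
    print(scores)
    print(df.head())
```

The PCA step is used only for visualization here; clustering runs on all scaled features, so points that overlap in the 2D plot may still be well separated in the full feature space.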

Streamlit App Features

The Streamlit app (app.py) lets you:

  • Explore the built-in sample dataset (Iris) or, if supported, load your own dataset
  • Select clustering features
  • Choose the number of clusters (K) with a slider
  • View:
    • Cluster assignments
    • PCA visualization of clusters
    • Summary statistics by cluster
    • Basic clustering metrics (inertia, silhouette score)
  • Export the clustered results as a CSV file (if supported)
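The per-cluster summary and CSV export could be implemented roughly as follows. This is a sketch under assumptions: `summarize_clusters` and `to_csv_bytes` are hypothetical helper names (not functions from `app.py`), and "summary statistics" is taken to mean each cluster's size plus its per-feature mean.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler


def summarize_clusters(df: pd.DataFrame, label_col: str = "cluster") -> pd.DataFrame:
    """Hypothetical helper: size and per-feature mean of each cluster."""
    grouped = df.groupby(label_col)
    summary = grouped.mean()  # the label column becomes the index
    summary.insert(0, "size", grouped.size())  # number of rows per cluster
    return summary


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Hypothetical helper: serialize a DataFrame to UTF-8 CSV bytes."""
    return df.to_csv(index=False).encode("utf-8")


if __name__ == "__main__":
    X = load_iris(as_frame=True).data
    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(
        StandardScaler().fit_transform(X)
    )
    clustered = X.assign(cluster=labels)  # original features plus cluster label
    print(summarize_clusters(clustered))
    csv_bytes = to_csv_bytes(clustered)
    print(f"{len(csv_bytes)} bytes of CSV ready for export")
```

Inside a Streamlit app, bytes like these can be passed to `st.download_button` via its `data`, `file_name`, and `mime="text/csv"` arguments; the file name would be a choice for the app author.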

How It Works (High Level)

  1. Preprocessing
    • Numeric feature selection
    • Scaling (if enabled)
  2. Model
    • K-Means clustering using scikit-learn
  3. Dimensionality Reduction
    • PCA to project features into 2D for visualization
  4. Outputs
    • Cluster labels added to the dataset
    • Visual and metric feedback to help choose a suitable K
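One common way to use that feedback is to fit K-Means over a range of K values and compare inertia (looking for an "elbow") with the silhouette score (higher is better). The sketch below also shows numeric-only feature selection. The helper names `select_numeric` and `sweep_k`, dropping rows with missing values, the K range of 2 to 8, and the synthetic `make_blobs` demo data are all assumptions for illustration, not details taken from this project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def select_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep numeric columns and (by assumption) drop rows with NaNs."""
    return df.select_dtypes(include="number").dropna()


def sweep_k(X: np.ndarray, k_values=range(2, 9), random_state: int = 42) -> pd.DataFrame:
    """Hypothetical helper: fit K-Means for each K and record inertia and silhouette."""
    rows = []
    # Silhouette is only defined for 2 <= K <= n_samples - 1, so K starts at 2.
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        rows.append(
            {"k": k, "inertia": model.inertia_, "silhouette": silhouette_score(X, labels)}
        )
    return pd.DataFrame(rows).set_index("k")


if __name__ == "__main__":
    # Synthetic demo data: two numeric features plus a text column that gets dropped.
    X_raw, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    demo = pd.DataFrame(X_raw, columns=["f0", "f1"])
    demo["name"] = [f"row{i}" for i in range(len(demo))]

    numeric = select_numeric(demo)
    X_scaled = StandardScaler().fit_transform(numeric)
    print(sweep_k(X_scaled))
```

Inertia generally keeps decreasing as K grows, so on its own it tends to favor more clusters; reading it alongside the silhouette score helps avoid picking a K that merely splits real groups into smaller pieces.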

Project Structure