SpiritVALS (Vendor Analytics for Liquor Sales)

This data analytics project analyzes vendor performance in liquor sales using an end-to-end workflow. Raw CSV files are ingested from Google Drive and transformed into a streamlined SQLite data mart (~10k rows from ~15M). Exploratory data analysis (EDA) and vendor-level analytics are performed in Python to deliver actionable insights for procurement and sales optimization.

📌 Project Goals

Transform raw transactional data into an analysis‑ready SQLite database.
Create a compact summary table optimized for fast iteration and visual analysis.
Perform EDA to uncover vendor performance patterns and sales drivers.
Lay groundwork for an interactive Power BI dashboard and a stakeholder‑friendly report.

🧱 Repository Structure

notebooks/
    ├── EDA.ipynb
    ├── notebook.ipynb
    └── vendor_performance_analysis.ipynb
.gitignore
get_vendor_summary.py
ingestion_db.py
LICENSE
README.md
requirements.txt

What’s where

ingestion_db.py – Extracts raw data and loads it into SQLite using SQL DDL/DML.
get_vendor_summary.py – Builds a 10k‑row vendor summary (from ~15M rows) for analytics.
notebooks/ – Jupyter notebooks for EDA, visualizations, and vendor insights.
requirements.txt – Python dependencies.

🗂️ Data Source

Drive link (raw data): Google Drive file
Expected content: transactional liquor sales with fields such as date, vendor, product/brand, quantity, price, outlet/region, etc. (schema inferred during ingestion).

⚠️ The raw file is large; the pipeline creates a compact SQLite layer for faster analysis.

🔄 Data Pipeline (ETL → Data Mart)

Extract & Load: Download the file from Drive and run ingestion_db.py to create a SQLite DB (inventory.db).
Transform: Use SQL (CTEs/indexes) to normalize types, handle nulls, and add keys.
Summarize: Run get_vendor_summary.py to aggregate ~15M rows into ~10k rows (vendor‑day/month/brand metrics).
Analyze & Visualize: Open notebooks to perform EDA and vendor performance analysis.

⚙️ Quickstart

# 1) Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 2) Install dependencies
pip install -r requirements.txt

# 3) Place/download the raw file locally (from the Drive link)
""" 
e.g., data/
    ├── begin_inventory.csv
    ├── end_inventory.csv
    ├── purchase_prices.csv
    ├── purchases.csv
    ├── sales.csv
    └── vendor_invoice.csv
"""

# 4) Build the SQLite database
python ingestion_db.py 

# 5) Create the vendor summary table for analytics
python get_vendor_summary.py 

# 6) Explore notebooks
#    notebooks/EDA.ipynb, notebooks/vendor_performance_analysis.ipynb

🧪 EDA & Analytics (Highlights)

Brands with Low Sales but High Margins: Identified brands that may benefit from targeted promotions or pricing adjustments.
Top Vendors & Brands by Sales: Ranked and visualized the leading vendors and brands based on total sales.
Vendor Contribution Analysis: Top 10 vendors account for a major share of total procurement, as shown by Pareto and donut charts.
Bulk Purchasing Impact: Large orders secure the lowest unit prices, confirming significant cost savings.
Unsold Inventory & Turnover: Highlighted vendors with excess stock and calculated capital locked in unsold

Notebooks:

EDA.ipynb – data sanity checks, profiling, core distributions.
vendor_performance_analysis.ipynb – KPI build‑out, vendor rankings, trend diagnostics.
notebook.ipynb – scratchpad/experiments supporting the final analysis.

📊 Future Scope

Power BI dashboard: interactive pages for Vendor Overview, Brand Mix, Geography, and Seasonality.
Stakeholder report: concise, narrative‐led write‑up for senior management.
Automation: schedule ingestion & summary table refresh (e.g., cron/GitHub Actions) and CI data tests (e.g., Great Expectations).

🛠️ Tech Stack

Python: pandas, numpy, matplotlib, seaborn, sqlalchemy
Database: SQLite (SQL DDL/DML, indexes for performance)
Environment: Jupyter notebooks

✅ Reproducibility & Performance Notes

The ~10k‑row summary enables fast iteration vs. scanning ~15M rows.
SQLite indices and typed columns materially improve query time.
Random seeds used in any sampling steps (if applicable) are fixed for reproducibility.

📄 License

This project is released under the terms of the LICENSE file in this repository.

🙋 Contact

Author: Mudassir Ansari
Role: Final‑year Computer Engineering student • Data/ML enthusiast
Reach: Open to feedback, internships, and collaboration.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpiritVALS (Vendor Analytics for Liquor Sales)

📌 Project Goals

🧱 Repository Structure

🗂️ Data Source

🔄 Data Pipeline (ETL → Data Mart)

⚙️ Quickstart

🧪 EDA & Analytics (Highlights)

📊 Future Scope

🛠️ Tech Stack

✅ Reproducibility & Performance Notes

📄 License

🙋 Contact

About

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
notebooks		notebooks
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
get_vendor_summary.py		get_vendor_summary.py
ingestion_db.py		ingestion_db.py
requirements.txt		requirements.txt

License

Mudassir-A/Vendor-Performance-Analysis

Folders and files

Latest commit

History

Repository files navigation

SpiritVALS (Vendor Analytics for Liquor Sales)

📌 Project Goals

🧱 Repository Structure

🗂️ Data Source

🔄 Data Pipeline (ETL → Data Mart)

⚙️ Quickstart

🧪 EDA & Analytics (Highlights)

📊 Future Scope

🛠️ Tech Stack

✅ Reproducibility & Performance Notes

📄 License

🙋 Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages