AmirhosseinHonardoust/Subscription-Loyalty-Risk-Radar

Subscription Loyalty Risk Radar

An intelligence engine that transforms raw shopping behavior into subscription insights, frequency predictions, loyalty scoring, and scenario simulation, designed for teams that want to understand not only what customers do, but why they behave the way they do.


Overview

Retail loyalty is not a single action. It is a behavioral signature that emerges from repeated decisions: purchasing rhythms, shipping preferences, discount sensitivity, past experiences, and long-term commitment tendencies.

Yet most companies reduce loyalty to naive metrics like “number of purchases” or “subscription status.” This leads to simplistic marketing decisions and predictable churn.

Subscription-Loyalty-Risk-Radar takes a more scientific view:

  • Loyalty is multi-dimensional
  • Behavior must be quantified
  • Predictions must be explainable
  • Insights must be actionable

This project builds a full-stack ML system that:

1. Predicts subscription probability

Who is likely to subscribe? Who is unlikely? Why?

2. Models purchase frequency

How often will a customer buy? What is their behavioral “intensity score”?

3. Creates a unified Loyalty Risk Score (0–100)

A single interpretable metric combining short-term behavior + long-term intent.

4. Provides explainability for each score

Which features raised or lowered loyalty? What factors shape behavior?

5. Simulates what-if scenarios

What happens if you offer a discount? Change shipping speed? Add a promo?

6. Visualizes everything in an interactive dashboard

A complete customer intelligence interface powered by Streamlit.


Why This Project Exists

(A business narrative + a data science narrative)

The Business Problem

E-commerce teams struggle with questions like:

  • “Which customers are slipping away?”
  • “Who should we target with retention offers?”
  • “Which segments are discount-driven?”
  • “What would increase subscription adoption?”
  • “Who buys weekly vs monthly vs annually, and why?”

And crucially:

“Which levers actually change customer behavior?” (not which ones we think do)

Traditional dashboards fail because they answer what happened, but not what will happen or why it will happen.

This project fills that gap.


The Data Science Problem

Most ML pipelines try to predict a single target. But loyalty is not a single target; it is the interaction of at least two dimensions:

1. Long-term commitment signals → subscription intention

This reflects trust, brand fit, and willingness to commit.

2. Short-term behavioral intensity → purchase frequency

This reflects habits, timing, product needs, lifestyle cycles.

These two dimensions do not always correlate, which is why a single model is insufficient.

A customer may:

  • Buy frequently but never subscribe
  • Buy rarely but have high subscription tendency
  • Buy seasonally yet be highly loyal
  • Buy many times but be price-sensitive and churn-prone

To model loyalty correctly, we must model:

  • Intent
  • Behavior
  • Consistency
  • Sensitivity
  • Predictability

This system captures all of them.


How a Data Scientist Thinks About Loyalty

(Core design philosophy)

Loyalty is not an outcome; it is an evolving probability distribution.

We build models not to label customers but to approximate their latent state.

Prediction is only step 1; interpretation is step 2.

A high churn score is meaningless unless we know the reason.

The system must generate strategy.

Knowing someone is “at risk” is not enough. We need to answer:

  • What lever would improve their loyalty?
  • What scenario reduces their risk most?
  • How does discount sensitivity differ across personas?

Human + Machine collaboration

This tool is not meant to replace analysts; it amplifies them.


System Architecture

Below is a conceptual high-level diagram (not code-specific):

        ┌────────────────────────────────────┐
        │        Raw Shopping Dataset        │
        └────────────────────────────────────┘
                          │
                          ▼
        ┌────────────────────────────────────┐
        │   Data Cleaning & Normalization    │
        └────────────────────────────────────┘
                          │
                          ▼
        ┌────────────────────────────────────┐
        │   Feature Engineering & Encoding   │
        └────────────────────────────────────┘
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
 ┌────────────────────────┐    ┌──────────────────────────┐
 │   Subscription Model   │    │   Frequency Regression   │
 │ (Binary Classification)│    │ (Ordinal Behavior Score) │
 └────────────────────────┘    └──────────────────────────┘
            │                           │
            └─────────────┬─────────────┘
                          ▼
        ┌────────────────────────────────────┐
        │       Loyalty Scoring Engine       │
        │ (combine probability + frequency)  │
        └────────────────────────────────────┘
                          │
                          ▼
        ┌────────────────────────────────────┐
        │      Streamlit Intelligence UI     │
        └────────────────────────────────────┘

Dataset Signals Used

The models leverage a mixture of:

Demographics

  • Age
  • Gender
  • Location

Purchasing Behavior

  • Purchase amount
  • Previous purchases
  • Frequency of purchases (target for frequency model)

Experience Signals

  • Review rating
  • Shipping type
  • Discount use
  • Promo code use

Product Preference

  • Category
  • Item purchased
  • Color
  • Size
  • Season

Together, these features reflect both identity and behavior, crucial for modeling loyalty.
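As a rough illustration of the "Feature Engineering & Encoding" stage, the sketch below turns one customer record into a flat numeric vector: numeric signals pass through, categorical signals are one-hot encoded. The column names (`age`, `shipping_type`, and so on) and the category vocabularies are assumptions made for this example; the project's actual schema and encoder may differ.

```python
# Hypothetical column names and vocabularies, chosen only for illustration.
NUMERIC = ["age", "purchase_amount", "previous_purchases", "review_rating"]
CATEGORICAL = {
    "shipping_type": ["Standard", "Express"],
    "season": ["Spring", "Summer", "Fall", "Winter"],
}


def encode(row):
    """Return a flat feature vector: numeric values, then one-hot categories."""
    features = [float(row[name]) for name in NUMERIC]
    for name, vocabulary in CATEGORICAL.items():
        # A value outside the vocabulary yields all zeros for that column group.
        features.extend(1.0 if row[name] == value else 0.0 for value in vocabulary)
    return features


example_row = {
    "age": 34,
    "purchase_amount": 59,
    "previous_purchases": 12,
    "review_rating": 4.2,
    "shipping_type": "Express",
    "season": "Winter",
}
vector = encode(example_row)  # 4 numeric values followed by 2 + 4 indicator values
```

Fixing the vocabulary up front keeps the vector length stable between training and scoring, which matters when the same record is later modified in a what-if scenario.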


Modeling Strategy

Subscription Model

Question: “If we removed friction, how likely is this customer to subscribe?”

Why Random Forest?

  • Handles non-linear relationships (“young + winter + clothing discount = subscriber”)
  • Robust to noise
  • Performs well with mixed categorical + numeric data
  • Relatively resistant to overfitting, even with minimal tuning

What the model learns:

  • Customers who buy frequently trend toward subscribing
  • Promo usage may indicate value sensitivity
  • Shipping preference indicates tolerance for speed vs. cost
  • Location interacts with seasonality
  • Certain product categories correlate with subscription behavior

Frequency Model

Question: “How strong is this customer’s purchasing rhythm?”

The target is treated as an ordinal variable, converted to an intensity scale (1–7).

Why a regressor (instead of classification)?

Because:

  • The distance between categories matters
  • Weekly ≠ Fortnightly ≠ Monthly
  • Regression treats the output as a continuum
  • Allows subtle differences between customers

It essentially measures habit strength.
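A minimal sketch of the ordinal-to-intensity conversion described above. The seven cadence labels and their ordering are hypothetical stand-ins (only Weekly, Fortnightly, and Monthly are named in this README); the helper names `encode_frequency` and `clip_score` are also assumptions for illustration.

```python
# Hypothetical cadence labels, ordered from weakest to strongest habit.
# The project's actual labels and ordering may differ.
FREQUENCY_SCALE = {
    "Annually": 1,
    "Semi-Annually": 2,
    "Quarterly": 3,
    "Bi-Monthly": 4,
    "Monthly": 5,
    "Fortnightly": 6,
    "Weekly": 7,
}
_LOOKUP = {label.lower(): score for label, score in FREQUENCY_SCALE.items()}


def encode_frequency(label: str) -> int:
    """Map a cadence label to its 1-7 intensity score (the regression target)."""
    key = label.strip().lower()
    if key not in _LOOKUP:
        raise ValueError(f"Unknown frequency label: {label!r}")
    return _LOOKUP[key]


def clip_score(predicted: float) -> float:
    """Clamp a raw regressor output to the valid 1-7 range."""
    return max(1.0, min(7.0, float(predicted)))
```

Because the target is numeric, a regressor can return a value such as 5.4, which reads as a rhythm somewhere between two adjacent cadences. A classifier over the same labels could not express that.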


Loyalty Scoring Engine

We model loyalty as:

Loyalty = Intent (60%) + Behavior (40%)

Why?

  • Subscription intention reflects commitment
  • Frequency score reflects habit strength

Both matter, but intention is treated as slightly more predictive in the long term, so it receives the larger weight.

Then we compute:

Loyalty Index (0–1 scale)

loyalty_index = 0.6 * p_subscribe + 0.4 * (frequency_score / 7)

Loyalty Risk (0–100 scale)

loyalty_risk = (1 - loyalty_index) * 100
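The two formulas can be written as a pair of small functions. This is a sketch of the arithmetic only; the function and constant names are chosen for this example and are not taken from the project's source.

```python
INTENT_WEIGHT = 0.6    # weight on subscription probability
BEHAVIOR_WEIGHT = 0.4  # weight on the normalized frequency score
MAX_FREQUENCY = 7      # top of the 1-7 intensity scale


def loyalty_index(p_subscribe: float, frequency_score: float) -> float:
    """Weighted blend of intent and behavior."""
    return (INTENT_WEIGHT * p_subscribe
            + BEHAVIOR_WEIGHT * (frequency_score / MAX_FREQUENCY))


def loyalty_risk(p_subscribe: float, frequency_score: float) -> float:
    """Invert the index and rescale to percentage points."""
    return (1.0 - loyalty_index(p_subscribe, frequency_score)) * 100.0


# Worked example: p_subscribe = 0.8, frequency_score = 5
# index = 0.48 + 0.4 * 5/7, roughly 0.766; risk is roughly 23.4
example_risk = loyalty_risk(0.8, 5)

# Lowest-loyalty case allowed by the scale: p_subscribe = 0, frequency_score = 1
worst_case_risk = loyalty_risk(0.0, 1)
```

One consequence of dividing by 7 on a scale that starts at 1: the behavior term never reaches zero, so the risk tops out at about 94.3 rather than 100 unless the frequency score is rescaled first.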

High risk means:

  • Low frequency + low subscription probability
  • Inconsistent or seasonal buying pattern
  • Price-sensitivity with low commitment
  • Weak habit + friction sensitivity

Segment Intelligence (Why This Matters)

Segment-level insights reveal patterns like:

  • Winter clothing buyers may be high-frequency but low-subscriber
  • Cash users may have sporadic behavior
  • Express shipping demand might correlate with loyalty
  • Promo-heavy shoppers may churn if discounts stop

These insights guide:

  • Marketing personalization
  • Pricing strategy
  • Retention campaigns
  • Seasonal promotions
  • Subscription product design
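Segment-level views like these can be produced by grouping scored customers on one attribute and averaging their risk. The sketch below uses only the standard library; the field names (`shipping_type`, `loyalty_risk`) and the three rows are made up for illustration and are not results from the project.

```python
from collections import defaultdict
from statistics import mean


def segment_summary(customers, key):
    """Group scored customers by `key`; return count and mean risk per segment."""
    groups = defaultdict(list)
    for row in customers:
        groups[row[key]].append(row["loyalty_risk"])
    return {
        segment: {"customers": len(risks), "mean_risk": round(mean(risks), 1)}
        for segment, risks in groups.items()
    }


# Made-up rows for illustration only; not real output.
scored = [
    {"shipping_type": "Express", "loyalty_risk": 20.0},
    {"shipping_type": "Express", "loyalty_risk": 30.0},
    {"shipping_type": "Standard", "loyalty_risk": 60.0},
]
summary = segment_summary(scored, "shipping_type")
```

Reporting the customer count next to the mean helps avoid over-reading a segment that contains only a handful of rows.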

Scenario Simulation

This is one of the most powerful features.

You can modify a customer’s attributes to answer:

“If I change X, what would happen to loyalty?”

Examples:

  • Change shipping from “Standard” → “Express”
  • Toggle “Discount Applied: Yes → No”
  • Add a promo code
  • Switch payment method

The system recomputes:

  • New subscription probability
  • New frequency score
  • New loyalty risk
  • And shows the delta for each metric

This helps teams test strategies before deploying them.
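The recompute-and-diff loop can be sketched as follows. The two predictor arguments stand in for the trained subscription and frequency models; `toy_subscribe` and `toy_frequency` are placeholder functions invented for this example, as are the field names.

```python
def score(customer, predict_subscribe, predict_frequency):
    """Run both models and derive the loyalty risk for one customer."""
    p = predict_subscribe(customer)
    f = predict_frequency(customer)
    index = 0.6 * p + 0.4 * (f / 7)
    return {"p_subscribe": p, "frequency_score": f, "loyalty_risk": (1 - index) * 100}


def simulate(customer, changes, predict_subscribe, predict_frequency):
    """Score the customer before and after `changes`; return both plus deltas."""
    baseline = score(customer, predict_subscribe, predict_frequency)
    modified = {**customer, **changes}  # copy, so the original record is untouched
    scenario = score(modified, predict_subscribe, predict_frequency)
    delta = {name: scenario[name] - baseline[name] for name in baseline}
    return baseline, scenario, delta


# Placeholder predictors standing in for the trained models (assumptions).
def toy_subscribe(customer):
    return 0.5 if customer["shipping_type"] == "Express" else 0.25


def toy_frequency(customer):
    return 4.0


customer = {"shipping_type": "Standard", "discount_applied": "Yes"}
baseline, scenario, delta = simulate(
    customer, {"shipping_type": "Express"}, toy_subscribe, toy_frequency
)
```

A delta computed this way describes how the model responds to a changed input. It is not by itself evidence that the real customer would respond the same way, so it is best read as a hypothesis to test.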


Explainability

Marketing and product teams care about:

  • “Why did the model say this customer is at risk?”
  • “What drives loyalty in this segment?”

Explainability provides:

Global feature importance

What factors matter most overall?

Local (per-customer) explanations

Which features increased or decreased:

  • Intent
  • Frequency
  • Loyalty

This turns predictions into stories:

  • “This customer buys weekly but rarely uses discounts: high loyalty.”
  • “This customer buys only in winter and always uses promos: seasonal but price-sensitive.”
  • “This customer prefers express shipping and leaves high reviews: strong subscription potential.”

Now the model is not a black box. It is a diagnostic tool.
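This README does not say which explanation method the project uses. As one simple possibility, the sketch below estimates per-customer contributions by resetting one feature at a time to a reference value and measuring how the score moves. It is a one-at-a-time substitution heuristic, not SHAP, and it ignores interactions between features. The scoring function `toy_risk` and the field names are assumptions for illustration.

```python
def local_contributions(customer, reference, score_fn):
    """Estimate how much each feature moves the score away from a reference.

    For every feature in `reference`, swap in the reference value and record
    (actual score - perturbed score). A positive number means the customer's
    actual value pushes the score up; a negative number means it pulls it down.
    """
    base = score_fn(customer)
    contributions = {}
    for feature, ref_value in reference.items():
        perturbed = {**customer, feature: ref_value}
        contributions[feature] = base - score_fn(perturbed)
    # Largest absolute effect first.
    return dict(sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Placeholder risk function standing in for the real scoring pipeline.
def toy_risk(customer):
    risk = 50
    if customer["shipping_type"] == "Express":
        risk -= 20
    if customer["promo_code_used"] == "Yes":
        risk += 15
    return risk


customer = {"shipping_type": "Express", "promo_code_used": "Yes"}
reference = {"shipping_type": "Standard", "promo_code_used": "No"}
explanation = local_contributions(customer, reference, toy_risk)
```

Here the score is a risk, so a negative contribution (express shipping in the toy function) is a loyalty-positive signal and a positive one (promo use) is loyalty-negative.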


Quickstart

pip install -r requirements.txt
python -m src.cli prepare-data
python -m src.cli train-all
python -m src.cli evaluate
python -m src.cli score-customers --output data/processed/scored.parquet
streamlit run app/app.py

Future Enhancements

Machine Learning

  • Replace RandomForest with LightGBM for better performance
  • Hyperparameter optimization (Optuna)
  • Add ordinal regression for frequency
  • Add seasonally aware models

Analytics

  • Persona clustering (KMeans + PCA/UMAP)
  • Retention funnel modeling
  • Abandonment probability model
  • Price elasticity modeling

Dashboard UX

  • Animated cohort transitions
  • Customer “journey cards”
  • Auto-generated retention recommendations

Engineering

  • FastAPI backend for scoring
  • Docker containerization
  • Full cloud deployment
  • Automated monitoring + drift detection

Final Thoughts

Subscription-Loyalty-Risk-Radar is more than an ML pipeline. It is a framework for understanding customer behavior, built with:

  • Mathematical clarity
  • Business intuition
  • System-level thinking
  • Explainability
  • Actionability

It shows how a data scientist:

  • Designs multi-model systems
  • Thinks about latent customer states
  • Blends prediction with reasoning
  • Turns algorithms into decisions
  • Makes machine learning useful

This is not just a model; it is a loyalty intelligence engine.

About

A customer intelligence engine that predicts subscription probability, models purchase frequency, and computes a unified loyalty risk score. Includes explainability, segment insights, and a scenario simulator, all integrated into an interactive Streamlit dashboard.
