Spaces: kmsmohamedansar/high-value-customer-predictor

kmsmohamedansar committed on Jul 20, 2025

Commit

354439d

1 Parent(s): 64db51f

Upload 6 files

Files changed (6)

README.md +48 -9
app.py +64 -0
le_region.pkl +3 -0
le_segment.pkl +3 -0
model.pkl +3 -0
requirements.txt +6 -0

README.md CHANGED

@@ -1,12 +1,51 @@
 ---
-title: High-Value Customer Predictor
-emoji: 💎
-colorFrom: indigo
-colorTo: blue
-sdk: streamlit
-sdk_version: "1.35.0"
-app_file: app.py
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 📌 High-Value Customer Predictor
+Predict which customers are most likely to drive revenue — using purchase behavior, discount sensitivity, and profitability.
+---
+## 🔍 Problem Statement
+E-commerce businesses want to identify high-value customers to drive targeted retention campaigns and improve profitability.
 ---
+## 🧠 ML Pipeline
+- ✅ Data cleaning + transformation with DuckDB
+- ✅ Feature engineering (recency, frequency, discount behavior, profit margins, tenure)
+- ✅ Binary classification: top 30% monetary value = high-value customer
+- ✅ Random Forest with hyperparameter tuning + SHAP explainability
+- ✅ Model evaluation: ROC, PR, confusion matrix, calibration
+- ✅ Deployment with Streamlit UI + tunnel (localtunnel or ngrok)
+- ✅ Versioned with MLflow and tested via pytest
+---
+## 📊 Example Features
+- recency_days: days since last purchase
+- order_frequency_rate: monthly ordering rate
+- rfm_score: recency-frequency-monetary customer score
+- profit_margin_pct: profit-to-sales ratio
+---
+## 🧪 Model Performance
+Metric     | Score
+-----------|-------
+Accuracy   | 0.97
+F1-score   | 0.95
+ROC AUC    | 0.99
+---
+## 🚀 Try it Out (Local)
+Run this in your terminal:
+    pip install streamlit pandas scikit-learn shap matplotlib
+    streamlit run app.py
 ---
+## 🛠 Future Improvements
+- Cloud deployment (Render / Hugging Face Spaces)
+- Real-time model monitoring (Prometheus + Grafana)
+- Role-based authentication for Streamlit
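
The README's labeling rule ("top 30% monetary value = high-value customer") can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual pipeline (which the README says uses DuckDB); the helper name `label_high_value`, the customer ids, and the sales figures are all hypothetical.

```python
import math

def label_high_value(monetary_by_customer, top_fraction=0.30):
    """Map each customer id to 1 (high value) or 0, by rank on monetary value."""
    if not monetary_by_customer:
        return {}
    # Number of customers that fall in the top fraction (at least one).
    n_top = max(1, math.ceil(len(monetary_by_customer) * top_fraction))
    # Customer ids ordered from highest to lowest total sales.
    ranked = sorted(monetary_by_customer, key=monetary_by_customer.get, reverse=True)
    top_ids = set(ranked[:n_top])
    return {cid: int(cid in top_ids) for cid in monetary_by_customer}

# Hypothetical customers and total sales, for illustration only.
sales = {
    "c01": 5200.0, "c02": 310.0, "c03": 980.0, "c04": 4100.0, "c05": 75.0,
    "c06": 1500.0, "c07": 640.0, "c08": 2900.0, "c09": 120.0, "c10": 860.0,
}
labels = label_high_value(sales)
high_value_ids = sorted(cid for cid, flag in labels.items() if flag == 1)
print(high_value_ids)
```

A label built this way is a direct function of monetary value, so a model that also receives `monetary_value` as an input can nearly reconstruct it. That may be worth keeping in mind when reading the scores in the Model Performance table.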

app.py ADDED

	@@ -0,0 +1,64 @@

+import streamlit as st
+import pandas as pd
+import numpy as np
+import pickle
+import shap
+import matplotlib.pyplot as plt
+# Load model and encoders
+with open("model.pkl", "rb") as f:
+    model = pickle.load(f)
+with open("le_region.pkl", "rb") as f:
+    le_region = pickle.load(f)
+with open("le_segment.pkl", "rb") as f:
+    le_segment = pickle.load(f)
+st.set_page_config(page_title="💎 High-Value Customer Predictor")
+st.title("💎 High-Value Customer Predictor")
+st.markdown("Enter customer details below to predict if they are high-value.")
+# Layout inputs in two columns
+col1, col2 = st.columns(2)
+with col1:
+    recency_days = st.number_input("📅 Recency (days since last purchase)", min_value=0, value=30)
+    frequency = st.number_input("🔁 Frequency (number of orders)", min_value=1, value=5)
+    monetary_value = st.number_input("💰 Monetary Value (total sales)", min_value=0.0, value=1000.0)
+    avg_order_value = st.number_input("🛒 Average Order Value", min_value=0.0, value=200.0)
+with col2:
+    total_profit = st.number_input("📈 Total Profit", min_value=0.0, value=100.0)
+    avg_days_between_orders = st.number_input("⏳ Avg Days Between Orders", min_value=0.0, value=30.0)
+    region = st.selectbox("📍 Region", le_region.classes_)
+    segment = st.selectbox("👤 Segment", le_segment.classes_)
+# Encode categorical inputs
+region_enc = le_region.transform([region])[0]
+segment_enc = le_segment.transform([segment])[0]
+input_data = pd.DataFrame([[
+    recency_days, frequency, monetary_value,
+    avg_order_value, total_profit, avg_days_between_orders,
+    region_enc, segment_enc
+]], columns=[
+    'recency_days', 'frequency', 'monetary_value',
+    'avg_order_value', 'total_profit', 'avg_days_between_orders',
+    'region_enc', 'segment_enc'
+])
+if st.button("🚀 Predict"):
+    pred = model.predict(input_data)[0]
+    proba = model.predict_proba(input_data)[0][1]
+    if pred == 1:
+        st.success(f"✅ Predicted HIGH VALUE with {proba:.2%} confidence.")
+    else:
+        st.info(f"ℹ️ Predicted NOT high value ({1 - proba:.2%} confidence).")
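+    # SHAP explanation for this single prediction
+    explainer = shap.Explainer(model)
+    shap_values = explainer(input_data)
+    # Classifiers can yield one set of SHAP values per class (3-D values);
+    # in that case explain the positive class (index 1), matching proba above.
+    if shap_values.values.ndim == 3:
+        explanation = shap_values[0, :, 1]
+    else:
+        explanation = shap_values[0]
+    st.subheader("🔍 Feature Contribution (SHAP)")
+    fig, ax = plt.subplots()
+    shap.plots.waterfall(explanation, max_display=8, show=False)
+    st.pyplot(fig)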
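
In `app.py`, `model.predict_proba(input_data)[0][1]` is the probability of class 1, so the confidence in a "not high value" prediction is its complement. The sketch below shows one way to report the confidence of whichever class is predicted. The helper `describe_prediction` is hypothetical, and it assumes the model's classes are ordered `[0, 1]` with 1 meaning high value.

```python
def describe_prediction(class_probabilities, positive_index=1):
    """Build a message from one row of predict_proba output.

    class_probabilities: e.g. [P(not high value), P(high value)].
    positive_index: assumed position of the high-value class.
    """
    p_positive = float(class_probabilities[positive_index])
    # Strictly greater than 0.5, so an exact tie falls to the negative class.
    is_high_value = p_positive > 0.5
    # Confidence refers to the predicted class, not always to class 1.
    confidence = p_positive if is_high_value else 1.0 - p_positive
    label = "HIGH VALUE" if is_high_value else "NOT high value"
    return f"Predicted {label} with {confidence:.2%} confidence."

high_message = describe_prediction([0.2, 0.8])
low_message = describe_prediction([0.9, 0.1])
print(high_message)
print(low_message)
```

Keeping the message logic in a plain function like this also makes it easy to cover with pytest, which the README lists as part of the pipeline.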

le_region.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d46f55b78a83417be6cf81cf6a6bed3269d03394d78235b25e79cb08eebc604
+size 275

le_segment.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e63bd88dc4eb0ccd09167bfa798b8cd023aa886635dd2fe5b5a99df037a45a4e
+size 280

model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7b495865cb41d5ec3d268dad51ba9acf62a9204fc32bdf8fc37e441c59f5094b
+size 1013510
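
The three `.pkl` entries above are Git LFS pointer files, not the pickles themselves. Each records a SHA-256 `oid` and a byte `size`. If a checkout is made without LFS, these short text files stay in place and `pickle.load` in `app.py` would fail. The sketch below parses a pointer and checks a local file against it, using only the standard library. The function names are hypothetical, and the demo file is a throwaway with made-up content rather than the real model.

```python
import hashlib
import os
import tempfile

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def file_matches_pointer(path, pointer):
    """True if the file's byte length and hash equal the pointer's size and oid."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.new(pointer["algo"], data).hexdigest()
    return len(data) == pointer["size"] and digest == pointer["digest"]

# Pointer text as shown for model.pkl in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:7b495865cb41d5ec3d268dad51ba9acf62a9204fc32bdf8fc37e441c59f5094b\n"
    "size 1013510\n"
)

# Self-check on a throwaway file (hypothetical bytes, not the real pickle).
payload = b"example bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_pointer = {"version": pointer["version"], "algo": "sha256",
                "digest": hashlib.sha256(payload).hexdigest(),
                "size": len(payload)}
demo_ok = file_matches_pointer(tmp.name, demo_pointer)  # same bytes
real_ok = file_matches_pointer(tmp.name, pointer)       # different bytes
os.remove(tmp.name)
print(demo_ok, real_ok)
```

In practice one would call `file_matches_pointer("model.pkl", pointer)` after fetching the LFS objects, before handing the file to `pickle.load`.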

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+streamlit
+scikit-learn
+pandas
+shap
+matplotlib