title: Customer Churn Prediction
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.37.0
python_version: '3.10'
app_file: app.py
pinned: false
Customer Churn Prediction
An end-to-end Machine Learning project that predicts whether a telecom customer is likely to churn based on usage patterns, subscription details, and customer profile data. The project includes data preprocessing, feature engineering, model training, explainability using SHAP, and deployment using Streamlit.
Live Demo
link:
Problem Statement
Telecom companies lose revenue when customers stop using their services (churn). The goal of this project is to:
Predict whether a customer will churn or stay, so that businesses can take proactive retention actions.
Dataset
We use the IBM Telco Customer Churn Dataset.
It contains information such as:
- Customer demographics
- Subscription services
- Contract type
- Payment methods
- Monthly & total charges
- Churn status
Target variable:
Churn Value (1 = Churn, 0 = No Churn)

Machine Learning Workflow
The project follows a complete ML pipeline:
Data Preprocessing
- Removed irrelevant columns (CustomerID, location data, etc.)
- Handled missing values
- Cleaned dataset
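
The preprocessing steps above could look roughly like this sketch. The toy rows, the specific columns dropped, and the choice to fill a missing total charge with 0 are illustrative assumptions, not taken from the project notebook.

```python
import pandas as pd

# Toy rows standing in for the Telco data; all values are made up.
raw = pd.DataFrame({
    "CustomerID": ["0001-A", "0002-B", "0003-C"],
    "City": ["Los Angeles", "San Diego", "Fresno"],
    "Tenure Months": [0, 12, 48],
    "Monthly Charges": [29.85, 56.95, 42.30],
    "Total Charges": [" ", "683.40", "2030.40"],  # blank string = missing
    "Churn Value": [0, 1, 0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier/location columns and repair missing numeric values."""
    out = df.drop(columns=["CustomerID", "City"], errors="ignore")
    # Blank strings become NaN once the column is coerced to numeric.
    out["Total Charges"] = pd.to_numeric(out["Total Charges"], errors="coerce")
    # Assumption: a missing total is treated as 0 (nothing billed yet).
    out["Total Charges"] = out["Total Charges"].fillna(0.0)
    return out


clean = preprocess(raw)
print(clean.dtypes)
```

Dropping the median-fill or row-drop alternatives is a judgment call; either would slot into the same function.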
Feature Engineering
- Created Tenure Groups: New, Regular, Loyal, Very Loyal
- Encoded categorical variables using One-Hot Encoding
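
A minimal sketch of the tenure grouping and one-hot encoding described above. The group names come from this README, but the bin edges (12, 24, and 48 months) and the sample rows are assumptions chosen for illustration.

```python
import pandas as pd

# Assumed bin edges in months; the README names the groups but not the cut-offs.
TENURE_BINS = [0, 12, 24, 48, float("inf")]
TENURE_LABELS = ["New", "Regular", "Loyal", "Very Loyal"]

df = pd.DataFrame({
    "Tenure Months": [2, 18, 36, 70],
    "Contract": ["Month-to-month", "One year", "Two year", "Two year"],
})

# right=False makes each bin [low, high), so a tenure of 12 falls in "Regular".
df["Tenure Group"] = pd.cut(
    df["Tenure Months"], bins=TENURE_BINS, labels=TENURE_LABELS, right=False
)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["Tenure Group", "Contract"], dtype=int)
print(encoded.columns.tolist())
```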
Handling Class Imbalance
- Used class_weight='balanced'
- Used scale_pos_weight for XGBoost
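
To make the two imbalance settings above concrete, the sketch below reproduces the arithmetic behind them in plain Python. scikit-learn's `class_weight='balanced'` weights each class by `n_samples / (n_classes * class_count)`, and a common starting value for XGBoost's `scale_pos_weight` is the ratio of negative to positive samples. The label vector is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights as scikit-learn computes them for class_weight='balanced':
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) samples."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Hypothetical labels: 1 = churn, 0 = no churn, with a 3:1 imbalance.
y = [0] * 75 + [1] * 25

weights = balanced_class_weights(y)
ratio = scale_pos_weight(y)
print(weights)
print(ratio)
```

Both settings push the model to pay more attention to the minority (churn) class, which typically raises recall at some cost in precision.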
Model Training
We trained and compared:
- Logistic Regression
- Random Forest
- XGBoost
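
As a rough sketch of how the three models listed above might be trained and compared, the example below uses a synthetic imbalanced dataset in place of the encoded Telco features. The data, the hyperparameters, and the variable names are all assumptions; the XGBoost import is optional so the sketch still runs where that package is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the encoded Telco features (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
}

# XGBoost is added only if it is installed.
try:
    from xgboost import XGBClassifier

    ratio = (y_train == 0).sum() / (y_train == 1).sum()
    models["XGBoost"] = XGBClassifier(scale_pos_weight=ratio, random_state=42)
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = roc_auc_score(y_test, churn_prob)
    print(f"{name}: ROC-AUC = {results[name]:.3f}")
```

The scores printed here describe the synthetic data only and say nothing about the project's reported results.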
Evaluation Metrics
- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC
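
For reference, the metrics above can be written out in a few lines of plain Python. This is a dependency-free sketch of the standard definitions, not the project's evaluation code; the counts and scores are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (churner, non-churner) pairs in which the
    churner receives the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not results from this project.
m = classification_metrics(tp=30, fp=20, fn=10, tn=140)
auc = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1])
print(m)
print(auc)
```

Recall is the share of actual churners the model catches, which is why it tends to matter most when a missed churner is more costly than an unnecessary retention offer.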
Explainability (SHAP)
- Identified important features affecting churn
- Provided model interpretability

Deployment
- Built an interactive web app using Streamlit
- Real-time churn prediction system
Model Performance

| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.737 | 0.503 | 0.773 | 0.610 | 0.843 |
| Random Forest | 0.793 | 0.640 | 0.500 | 0.562 | 0.840 |
| XGBoost | 0.769 | 0.553 | 0.687 | 0.613 | 0.833 |
Best Model:
Logistic Regression (based on highest Recall & ROC-AUC)
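
The selection above can be checked directly against the reported scores. The sketch below copies the numbers from the Model Performance section and ranks the models per metric; the helper name `best_by` is hypothetical.

```python
# Scores copied from the Model Performance section of this README.
results = {
    "Logistic Regression": {
        "accuracy": 0.737, "precision": 0.503, "recall": 0.773,
        "f1": 0.610, "roc_auc": 0.843,
    },
    "Random Forest": {
        "accuracy": 0.793, "precision": 0.640, "recall": 0.500,
        "f1": 0.562, "roc_auc": 0.840,
    },
    "XGBoost": {
        "accuracy": 0.769, "precision": 0.553, "recall": 0.687,
        "f1": 0.613, "roc_auc": 0.833,
    },
}


def best_by(metric):
    """Return the model name with the highest value for the given metric."""
    return max(results, key=lambda name: results[name][metric])


for metric in ("accuracy", "precision", "recall", "f1", "roc_auc"):
    print(f"{metric:>9}: {best_by(metric)}")
```

Random Forest leads on accuracy and precision and XGBoost narrowly leads on F1, so choosing Logistic Regression reflects a deliberate priority on recall and ROC-AUC.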
Key Insights from EDA
- Customers with short tenure are more likely to churn
- Month-to-month contracts have the highest churn rate
- Higher monthly charges increase churn probability
- Customers without support/security services churn more

SHAP Explainability
SHAP analysis revealed the most important features:
- Contract type
- Tenure months
- Monthly charges
- Internet service type
These features strongly influence churn behavior.
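
To give a feel for what a SHAP value represents, here is a dependency-free sketch for the linear case. For a linear model such as logistic regression, and assuming independent features, the SHAP value of a feature in log-odds space is its coefficient times the feature's deviation from its mean. This is not the `shap` library and not necessarily how the project computed its values; the coefficients, means, and customer below are made up.

```python
import math

# Hypothetical coefficients and training-set means; not from the trained model.
coef = {"tenure_months": -0.04, "monthly_charges": 0.02, "contract_month_to_month": 1.1}
mean = {"tenure_months": 32.0, "monthly_charges": 65.0, "contract_month_to_month": 0.55}
intercept = -1.0

customer = {"tenure_months": 2, "monthly_charges": 90.0, "contract_month_to_month": 1}


def linear_shap(x):
    """Per-feature contribution to the log-odds, relative to the average customer."""
    return {f: coef[f] * (x[f] - mean[f]) for f in coef}


def log_odds(x):
    """Linear model output before the sigmoid."""
    return intercept + sum(coef[f] * x[f] for f in coef)


contrib = linear_shap(customer)
base_value = log_odds(mean)  # model output for the average customer

# Additivity: base value plus contributions reproduces the model output.
assert math.isclose(base_value + sum(contrib.values()), log_odds(customer))

# Largest absolute contribution first.
for feature, value in sorted(contrib.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:>24}: {value:+.3f}")
```

A positive contribution pushes the prediction toward churn and a negative one toward staying; for tree models the values are computed differently but read the same way.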
Streamlit App Features
The deployed app allows users to:
- Enter customer details
- Get churn probability in real time
- View risk level (High / Low)
- Interactive UI for easy testing

Tech Stack
- Python
- Pandas & NumPy
- Scikit-learn
- XGBoost
- SHAP
- Streamlit
- Matplotlib & Seaborn
Project Structure

    customer-churn-project/
    │
    ├── app.py
    ├── churn_model.pkl
    ├── model_columns.pkl
    ├── requirements.txt
    ├── Telco-Customer-Churn.csv
    │
    ├── notebooks/
    │   └── eda_and_modeling.ipynb
    │
    └── README.md
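
The project ships both `churn_model.pkl` and `model_columns.pkl`. One plausible reason for storing the column list, sketched below, is to align a single customer's one-hot encoded input with the columns the model was trained on. The column names, the 0.5 threshold, and the helper names `prepare_input` and `risk_level` are assumptions, not taken from `app.py`.

```python
import pandas as pd

# Stand-in for the list stored in model_columns.pkl (assumed contents).
model_columns = [
    "Tenure Months",
    "Monthly Charges",
    "Contract_Month-to-month",
    "Contract_One year",
    "Contract_Two year",
]


def prepare_input(customer: dict) -> pd.DataFrame:
    """One-hot encode a single customer and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([customer]), dtype=int)
    # Categories absent from this row are added as 0; unknown columns are dropped.
    return row.reindex(columns=model_columns, fill_value=0)


def risk_level(probability: float, threshold: float = 0.5) -> str:
    """Map a churn probability to a High / Low label (threshold is assumed)."""
    return "High" if probability >= threshold else "Low"


features = prepare_input(
    {"Tenure Months": 3, "Monthly Charges": 89.5, "Contract": "Month-to-month"}
)
print(features.to_dict("records")[0])
print(risk_level(0.72))
```

In the app itself, the model would presumably be loaded with `joblib.load("churn_model.pkl")` and the probability taken from `model.predict_proba(features)[:, 1]` before being passed to a function like `risk_level`.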
Installation & Setup
- Clone Repository

      git clone https://github.com/your-username/churn-prediction.git
      cd churn-prediction

- Install Dependencies

      pip install -r requirements.txt

- Run Streamlit App

      streamlit run app.py

Requirements

    streamlit
    pandas
    numpy
    scikit-learn
    xgboost
    joblib
    matplotlib
    seaborn
    shap

Business Impact
This system helps telecom companies to:
- Identify at-risk customers early
- Reduce customer churn rate
- Improve retention strategies
- Increase revenue stability

Future Improvements
- Hyperparameter tuning (Optuna / GridSearch)
- Deep learning model comparison
- API deployment using FastAPI
- Dashboard with Power BI / Tableau
- Automated retraining pipeline

Author
Mohd Faizanullah
Machine Learning Enthusiast | AI Developer
If you like this project
Give this repo a ⭐ and connect for more ML projects!