---
title: Customer Churn Prediction
emoji: 📉
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.37.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
📉 Customer Churn Prediction
An end-to-end machine learning project that predicts whether a telecom customer is likely to churn, based on usage patterns, subscription details, and customer profile data. It covers data preprocessing, feature engineering, model training, SHAP-based explainability, and deployment with Streamlit.
🚀 Live Demo
link:
📌 Problem Statement
Telecom companies lose revenue when customers stop using their services (churn).
The goal of this project is to predict whether a customer will churn or stay, so that the business can take proactive retention actions.
📊 Dataset
We use the IBM Telco Customer Churn Dataset.
It contains information such as:
Customer demographics
Subscription services
Contract type
Payment methods
Monthly & total charges
Churn status
Target variable:
Churn Value (1 = Churn, 0 = No Churn)
🧠 Machine Learning Workflow
The project follows a complete ML pipeline:
1. Data Preprocessing
Removed irrelevant columns (CustomerID, location data, etc.)
Handled missing values
Cleaned dataset
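As a rough illustration of this step, the sketch below drops identifier and location columns and repairs a blank total-charges value. The column names, the toy rows, and the choice to fill a missing total with 0.0 are assumptions made for the example, not details confirmed by the project notebook.

```python
import pandas as pd

# Toy rows standing in for the raw CSV; column names are assumed.
raw = pd.DataFrame({
    "CustomerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "City": ["Los Angeles", "San Diego", "Fresno", "Oakland"],
    "Tenure Months": [1, 34, 0, 60],
    "Monthly Charges": [29.85, 56.95, 52.55, 99.65],
    "Total Charges": ["29.85", "1889.5", " ", "5980.1"],  # blank string = missing
    "Churn Value": [1, 0, 0, 0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier/location columns and repair missing total charges."""
    out = df.drop(columns=["CustomerID", "City"], errors="ignore").copy()
    # Blank strings cannot be parsed as numbers, so they become NaN here.
    out["Total Charges"] = pd.to_numeric(out["Total Charges"], errors="coerce")
    # Assumed policy for this sketch: treat a missing total as 0.0.
    out["Total Charges"] = out["Total Charges"].fillna(0.0)
    return out


clean = preprocess(raw)
print(clean.dtypes)
```

Whether to fill, impute, or drop such rows is a judgment call; filling with 0.0 is only one option.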
2. Feature Engineering
Created Tenure Groups:
New
Regular
Loyal
Very Loyal
Encoded categorical variables using One-Hot Encoding
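The two feature engineering steps above could look roughly like this with pandas. The README names the tenure groups but not their cut-offs, so the bin edges (12, 24, and 48 months) and the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Tenure Months": [2, 18, 40, 70],
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
})

# Assumed bin edges in months; the actual cut-offs are not stated in the README.
bins = [0, 12, 24, 48, float("inf")]
labels = ["New", "Regular", "Loyal", "Very Loyal"]
df["Tenure Group"] = pd.cut(
    df["Tenure Months"], bins=bins, labels=labels, include_lowest=True
)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["Contract", "Tenure Group"], dtype=int)
print(encoded.columns.tolist())
```

Each category becomes its own 0/1 column, named `<column>_<category>`.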
3. Handling Class Imbalance
Used class_weight='balanced'
Used scale_pos_weight for XGBoost
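To make the two weighting options concrete, here is a small sketch on toy labels. It shows the weights that `class_weight='balanced'` implies in scikit-learn, and the negative-to-positive ratio that is commonly passed as `scale_pos_weight` to XGBoost. The label counts are invented for the example.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels: 6 customers who stayed (0) and 2 who churned (1).
y = np.array([0] * 6 + [1] * 2)

# 'balanced' weights are n_samples / (n_classes * count_of_class).
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)

# Common heuristic for XGBoost: number of negatives / number of positives.
scale_pos_weight = (y == 0).sum() / (y == 1).sum()

print(weights, scale_pos_weight)
```

Here the minority class gets a weight of 2.0 against roughly 0.67 for the majority class, and `scale_pos_weight` comes out as 3.0.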
4. Model Training
We trained and compared:
Logistic Regression
Random Forest
XGBoost
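A minimal sketch of such a comparison loop, assuming a stratified train/test split and synthetic data in place of the Telco features. XGBoost is left out to keep the example dependent on scikit-learn only; an `XGBClassifier` could be added to the same dictionary. The hyperparameters are placeholders, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the encoded churn features.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.75, 0.25], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
}

# Fit each model and keep its predicted churn probability on the test set.
churn_probability = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_probability[name] = model.predict_proba(X_test)[:, 1]
```

Keeping the probabilities rather than hard labels makes it possible to compute ROC-AUC and to try other decision thresholds later.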
5. Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
ROC-AUC
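The five metrics can be computed with scikit-learn as sketched below. The labels and probabilities are a small hand-made example, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-made example: 4 churners (1) and 6 non-churners (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed threshold of 0.5

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),    # 8 of 10 correct -> 0.8
    "precision": precision_score(y_true, y_pred),  # 3 of 4 flagged are churners -> 0.75
    "recall": recall_score(y_true, y_pred),        # 3 of 4 churners are flagged -> 0.75
    "f1": f1_score(y_true, y_pred),                # harmonic mean of the two -> 0.75
    "roc_auc": roc_auc_score(y_true, y_prob),      # uses probabilities -> 0.875
}
print(metrics)
```

Accuracy, precision, recall, and F1 depend on the threshold, while ROC-AUC is computed from the probabilities directly.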
6. Explainability (SHAP)
Identified important features affecting churn
Provided model interpretability
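To give a feel for what SHAP values represent, the sketch below computes them by hand for a logistic regression. It does not call the `shap` library; instead it uses the closed form that holds for linear models when features are treated as independent: each feature contributes `coef * (x - mean)` in log-odds units. The data is synthetic and the setup is an illustration, not the project's actual analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic target driven mostly by feature 0, weakly by feature 1.
logits = 2.0 * X[:, 0] - 0.5 * X[:, 1]
y = (logits + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Per-sample, per-feature contributions relative to the average customer.
background_mean = X.mean(axis=0)
shap_values = model.coef_[0] * (X - background_mean)

# Additivity check: base value plus contributions equals the model's log-odds.
base_value = model.intercept_[0] + model.coef_[0] @ background_mean
reconstructed = base_value + shap_values.sum(axis=1)
assert np.allclose(reconstructed, model.decision_function(X))

# Global importance: mean absolute contribution per feature.
global_importance = np.abs(shap_values).mean(axis=0)
print(global_importance)
```

Ranking features by mean absolute contribution is the usual way a list of "most important features" is read off a SHAP analysis.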
7. Deployment
Built interactive web app using Streamlit
Real-time churn prediction system
🏆 Model Performance
| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.737 | 0.503 | 0.773 | 0.610 | 0.843 |
| Random Forest | 0.793 | 0.640 | 0.500 | 0.562 | 0.840 |
| XGBoost | 0.769 | 0.553 | 0.687 | 0.613 | 0.833 |
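✅ Best Model:
Logistic Regression, chosen for the highest Recall (0.773) and ROC-AUC (0.843). Random Forest has the highest Accuracy and Precision but catches only half of the churners (Recall 0.500).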
🔍 Key Insights from EDA
Customers with short tenure are more likely to churn
Month-to-month contracts have the highest churn rate
Higher monthly charges are associated with a higher churn rate
Customers without support/security services churn more
📈 SHAP Explainability
SHAP analysis revealed the most important features:
Contract type
Tenure months
Monthly charges
Internet service type
These features strongly influence churn behavior.
🖥️ Streamlit App Features
The deployed app allows users to:
Enter customer details
Get churn probability in real time
View risk level (High / Low)
Test different inputs through an interactive UI
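The prediction step behind these features might look something like the sketch below. The file names in the project structure suggest the app loads a saved model and its training column list, but the code here is an assumption about how such an app could work: the helper names, the column list, and the 0.5 risk threshold are all hypothetical.

```python
import pandas as pd


def build_model_input(customer: dict, model_columns: list) -> pd.DataFrame:
    """One-hot encode a single customer and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([customer]), dtype=int)
    # Columns the model saw in training but this row lacks are filled with 0.
    return row.reindex(columns=model_columns, fill_value=0)


def risk_level(probability: float, threshold: float = 0.5) -> str:
    """Map a churn probability to a label; the threshold is an assumed default."""
    return "High" if probability >= threshold else "Low"


# Hypothetical stand-in for the contents of model_columns.pkl.
model_columns = [
    "Tenure Months",
    "Monthly Charges",
    "Contract_Month-to-month",
    "Contract_One year",
    "Contract_Two year",
]
customer = {"Tenure Months": 3, "Monthly Charges": 85.5, "Contract": "Month-to-month"}

features = build_model_input(customer, model_columns)
print(features)
print(risk_level(0.73))
```

In a real app, `features` would be passed to the loaded model's `predict_proba`, and the resulting probability to `risk_level`. Aligning to the saved column list matters because a single customer row only produces the one-hot columns for the categories it contains.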
🛠️ Tech Stack
Python 🐍
Pandas & NumPy
Scikit-learn
XGBoost
SHAP
Streamlit
Matplotlib & Seaborn
📂 Project Structure
customer-churn-project/
│
├── app.py
├── churn_model.pkl
├── model_columns.pkl
├── requirements.txt
├── Telco-Customer-Churn.csv
│
├── notebooks/
│   └── eda_and_modeling.ipynb
│
└── README.md
⚙️ Installation & Setup
1. Clone Repository
git clone https://github.com/your-username/churn-prediction.git
cd churn-prediction
2. Install Dependencies
pip install -r requirements.txt
3. Run Streamlit App
streamlit run app.py
📦 Requirements
streamlit
pandas
numpy
scikit-learn
xgboost
joblib
matplotlib
seaborn
shap
📌 Business Impact
This system helps telecom companies to:
Identify at-risk customers early
Reduce customer churn rate
Improve retention strategies
Increase revenue stability
📊 Future Improvements
Hyperparameter tuning (Optuna / GridSearch)
Deep learning model comparison
API deployment using FastAPI
Dashboard with Power BI / Tableau
Automated retraining pipeline
👨‍💻 Author
Mohd Faizanullah
Machine Learning Enthusiast | AI Developer
⭐ If you like this project
Give this repo a ⭐ and connect for more ML projects!