---
title: Customer Churn Prediction
emoji: 📉
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.37.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

📉 Customer Churn Prediction

An end-to-end machine learning project that predicts whether a telecom customer is likely to churn, based on usage patterns, subscription details, and customer profile data. It covers data preprocessing, feature engineering, model training, explainability with SHAP, and deployment with Streamlit.

🚀 Live Demo

link:

📌 Problem Statement

Telecom companies lose revenue when customers stop using their services (churn).
The goal of this project is to:

Predict whether a customer will churn or stay, so that businesses can take proactive retention actions.

📊 Dataset

We use the IBM Telco Customer Churn Dataset.

It contains information such as:

Customer demographics
Subscription services
Contract type
Payment methods
Monthly & total charges
Churn status

Target variable:

Churn Value (1 = Churn, 0 = No Churn)

🧠 Machine Learning Workflow

The project follows a complete ML pipeline:

1. Data Preprocessing
Removed irrelevant columns (CustomerID, location data, etc.)
Handled missing values
Cleaned dataset

2. Feature Engineering
Created Tenure Groups:
New
Regular
Loyal
Very Loyal
Encoded categorical variables using One-Hot Encoding
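
The feature engineering step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the bin edges (12, 24, and 48 months), the helper name `engineer_features`, and the column names `Tenure Months` and `Contract` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical bin edges in months; the project's real cut-offs are not stated.
TENURE_BINS = [0, 12, 24, 48, float("inf")]
TENURE_LABELS = ["New", "Regular", "Loyal", "Very Loyal"]


def engineer_features(df: pd.DataFrame, categorical_cols: list[str]) -> pd.DataFrame:
    """Add a tenure-group column, then one-hot encode the categorical columns."""
    out = df.copy()
    out["Tenure Group"] = pd.cut(
        out["Tenure Months"],
        bins=TENURE_BINS,
        labels=TENURE_LABELS,
        include_lowest=True,  # keep customers with 0 months of tenure in "New"
    )
    # One-hot encode the new group together with the listed categorical columns.
    return pd.get_dummies(out, columns=["Tenure Group", *categorical_cols])


# Tiny made-up sample, only to show the shape of the output.
sample = pd.DataFrame(
    {
        "Tenure Months": [0, 5, 18, 36, 70],
        "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year", "Two year"],
        "Monthly Charges": [29.85, 70.70, 56.95, 89.10, 105.50],
    }
)
features = engineer_features(sample, categorical_cols=["Contract"])
print(features.columns.tolist())
```

Each category becomes its own indicator column (for example `Contract_Two year`), while numeric columns pass through unchanged.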

3. Handling Class Imbalance
Used class_weight='balanced'
Used scale_pos_weight for XGBoost
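
As a rough sketch of what these two settings do, the example below uses synthetic data (not the Telco dataset) to show the weights that `class_weight='balanced'` implies and a common way to pick `scale_pos_weight`, namely the ratio of negative to positive samples. The exact value used in this project is not stated, so treat the ratio as an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced stand-in data: roughly 25% positives (churners).
rng = np.random.default_rng(0)
n_samples = 400
y = (rng.random(n_samples) < 0.25).astype(int)
X = rng.normal(size=(n_samples, 3)) + y[:, None]

# 'balanced' weights each class by n_samples / (n_classes * class_count),
# so the minority class receives the larger weight.
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# A common heuristic for XGBoost: negatives divided by positives.
neg, pos = np.bincount(y, minlength=2)
scale_pos_weight = neg / pos

# scikit-learn models take the setting directly.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# With XGBoost installed, the ratio would be passed as:
#   xgboost.XGBClassifier(scale_pos_weight=scale_pos_weight)
print(weights, scale_pos_weight)
```

Both approaches make errors on churners cost more during training, which tends to raise recall at the expense of precision.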

4. Model Training

We trained and compared:

Logistic Regression
Random Forest
XGBoost

5. Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
ROC-AUC
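
For reference, the first four metrics can be computed directly from confusion-matrix counts. The sketch below uses made-up counts purely for illustration, and the helper name `classification_metrics` is hypothetical. ROC-AUC is omitted because it needs predicted probabilities rather than counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted churners, how many churned
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of actual churners, how many were caught
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts, not results from this project.
metrics = classification_metrics(tp=30, fp=20, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For churn, recall is often the metric to watch, since a missed churner (false negative) usually costs more than an unnecessary retention offer.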

6. Explainability (SHAP)
Identified important features affecting churn
Provided model interpretability

7. Deployment
Built an interactive web app using Streamlit
Real-time churn prediction system

🏆 Model Performance

| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.737 | 0.503 | 0.773 | 0.610 | 0.843 |
| Random Forest | 0.793 | 0.640 | 0.500 | 0.562 | 0.840 |
| XGBoost | 0.769 | 0.553 | 0.687 | 0.613 | 0.833 |

✅ Best Model:

Logistic Regression (highest Recall and ROC-AUC; Random Forest leads on Accuracy and Precision, and XGBoost is marginally ahead on F1 Score)

🔍 Key Insights from EDA
Customers with short tenure are more likely to churn
Month-to-month contracts have the highest churn rate
Higher monthly charges are associated with a higher churn probability
Customers without support/security services churn more

📈 SHAP Explainability

SHAP analysis revealed the most important features:

Contract type
Tenure months
Monthly charges
Internet service type

These features strongly influence churn behavior.

🖥️ Streamlit App Features

The deployed app allows users to:

Enter customer details
Get churn probability in real-time
View risk level (High / Low)
Test different inputs easily through an interactive UI
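
One plausible way the app could turn form input into a prediction is sketched below. It assumes that `model_columns.pkl` stores the one-hot column order seen during training, and that the risk level comes from a 0.5 probability cut-off. Both are assumptions, and the helper names `align_to_training_columns` and `risk_level` are hypothetical.

```python
import pandas as pd


def align_to_training_columns(raw: dict, model_columns: list[str]) -> pd.DataFrame:
    """One-hot encode a single customer record and match the training column order."""
    row = pd.get_dummies(pd.DataFrame([raw]))
    # Columns unseen in this record are filled with 0; extra columns are dropped.
    return row.reindex(columns=model_columns, fill_value=0)


def risk_level(churn_probability: float, threshold: float = 0.5) -> str:
    """Map a churn probability to a coarse risk label (threshold is an assumption)."""
    return "High" if churn_probability >= threshold else "Low"


# Hypothetical column list; in the app it would be loaded from model_columns.pkl.
model_columns = [
    "Tenure Months",
    "Monthly Charges",
    "Contract_Month-to-month",
    "Contract_One year",
    "Contract_Two year",
]
customer = {"Tenure Months": 3, "Monthly Charges": 80.5, "Contract": "Month-to-month"}
X_input = align_to_training_columns(customer, model_columns)

# With a loaded model, the probability would come from something like:
#   prob = model.predict_proba(X_input)[0, 1]
print(X_input.iloc[0].to_dict(), risk_level(0.73))
```

Reindexing against the saved column list keeps a single-row input compatible with the trained model even when the record only exercises one category per feature.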

🛠️ Tech Stack
Python 🐍
Pandas & NumPy
Scikit-learn
XGBoost
SHAP
Streamlit
Matplotlib & Seaborn

📂 Project Structure
customer-churn-project/
│
├── app.py
├── churn_model.pkl
├── model_columns.pkl
├── requirements.txt
├── Telco-Customer-Churn.csv
│
├── notebooks/
│   └── eda_and_modeling.ipynb
│
└── README.md

⚙️ Installation & Setup
1. Clone Repository
git clone https://github.com/your-username/churn-prediction.git
cd churn-prediction
2. Install Dependencies
pip install -r requirements.txt
3. Run Streamlit App
streamlit run app.py

📦 Requirements
streamlit
pandas
numpy
scikit-learn
xgboost
joblib
matplotlib
seaborn
shap

📌 Business Impact

This system helps telecom companies to:

Identify at-risk customers early
Reduce customer churn rate
Improve retention strategies
Increase revenue stability

📊 Future Improvements
Hyperparameter tuning (Optuna / GridSearch)
Deep learning model comparison
API deployment using FastAPI
Dashboard with Power BI / Tableau
Automated retraining pipeline

👨‍💻 Author

Mohd Faizanullah
Machine Learning Enthusiast | AI Developer

⭐ If you like this project

Give this repo a ⭐ and connect for more ML projects!