metadata
title: Customer Churn Prediction
emoji: 📉
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.37.0
python_version: '3.10'
app_file: app.py
pinned: false

📉 Customer Churn Prediction

An end-to-end Machine Learning project that predicts whether a telecom customer is likely to churn based on usage patterns, subscription details, and customer profile data. The project includes data preprocessing, feature engineering, model training, explainability using SHAP, and deployment using Streamlit.

🚀 Live Demo

link:

📌 Problem Statement

Telecom companies lose revenue when customers stop using their services (churn). The goal of this project is to predict whether a customer will churn or stay, so that businesses can take proactive retention actions.

📊 Dataset

We use the IBM Telco Customer Churn Dataset.

It contains information such as:

- Customer demographics
- Subscription services
- Contract type
- Payment methods
- Monthly & total charges
- Churn status

Target variable:

- Churn Value (1 = Churn, 0 = No Churn)

🧠 Machine Learning Workflow

The project follows a complete ML pipeline:

  1. Data Preprocessing
     - Removed irrelevant columns (CustomerID, location data, etc.)
     - Handled missing values
     - Cleaned dataset

  2. Feature Engineering
     - Created Tenure Groups: New, Regular, Loyal, Very Loyal
     - Encoded categorical variables using One-Hot Encoding

  3. Handling Class Imbalance
     - Used class_weight='balanced'
     - Used scale_pos_weight for XGBoost

  4. Model Training

     We trained and compared:
     - Logistic Regression
     - Random Forest
     - XGBoost

  5. Evaluation Metrics
     - Accuracy
     - Precision
     - Recall
     - F1 Score
     - ROC-AUC

  6. Explainability (SHAP)
     - Identified important features affecting churn
     - Provided model interpretability

  7. Deployment
     - Built interactive web app using Streamlit
     - Real-time churn prediction system
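Steps 2 and 3 of the workflow can be sketched with pandas as follows. This is a minimal illustration, not the project's notebook code: the toy rows, the column names (`Tenure Months`, `Contract`, `Churn Value`), and especially the tenure bin edges (12, 36, 60 months) are assumptions, since the groups are named here but their boundaries are not. The class weights follow the `n_samples / (n_classes * count)` formula that scikit-learn documents for `class_weight='balanced'`, and `scale_pos_weight` uses the negative-to-positive ratio that the XGBoost documentation suggests as a typical value.

```python
import pandas as pd

# Toy rows for illustration only; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "Tenure Months": [2, 18, 40, 70, 5, 30],
    "Contract": ["Month-to-month", "One year", "Two year",
                 "Two year", "Month-to-month", "One year"],
    "Churn Value": [1, 0, 0, 0, 1, 0],
})

# Step 2a: tenure groups. The bin edges (in months) are hypothetical.
tenure_bins = [0, 12, 36, 60, float("inf")]
tenure_labels = ["New", "Regular", "Loyal", "Very Loyal"]
df["Tenure Group"] = pd.cut(
    df["Tenure Months"],
    bins=tenure_bins,
    labels=tenure_labels,
    include_lowest=True,  # keep customers with 0 months of tenure in "New"
)

# Step 2b: one-hot encode the categorical columns.
encoded = pd.get_dummies(df, columns=["Contract", "Tenure Group"], dtype=int)

# Step 3: weights for the imbalanced target.
y = df["Churn Value"]
n_pos = int((y == 1).sum())
n_neg = int((y == 0).sum())

# Same formula scikit-learn applies for class_weight='balanced' (two classes here).
class_weights = {0: len(y) / (2 * n_neg), 1: len(y) / (2 * n_pos)}

# Typical starting value for XGBoost's scale_pos_weight.
scale_pos_weight = n_neg / n_pos

print(encoded.columns.tolist())
print(class_weights, scale_pos_weight)
```

In practice, passing `class_weight='balanced'` to the scikit-learn estimator computes these weights automatically; the explicit version is shown only to make the arithmetic visible.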

🏆 Model Performance

| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.737 | 0.503 | 0.773 | 0.610 | 0.843 |
| Random Forest | 0.793 | 0.640 | 0.500 | 0.562 | 0.840 |
| XGBoost | 0.769 | 0.553 | 0.687 | 0.613 | 0.833 |
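As a quick sanity check on these figures, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this in plain Python; the helper name `f1_from_pr` is made up for illustration. Because the inputs are already rounded to three decimals, the recomputed values are expected to agree with the reported ones only to within rounding.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the model performance figures.
reported = {
    "Logistic Regression": (0.503, 0.773, 0.610),
    "Random Forest": (0.640, 0.500, 0.562),
    "XGBoost": (0.553, 0.687, 0.613),
}

recomputed = {}
for name, (precision, recall, f1_reported) in reported.items():
    recomputed[name] = f1_from_pr(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed[name]:.3f}, reported = {f1_reported:.3f}")
```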

✅ Best Model:

Logistic Regression (based on highest Recall & ROC-AUC)

🔍 Key Insights from EDA

- Customers with short tenure are more likely to churn
- Month-to-month contracts have the highest churn rate
- Higher monthly charges increase churn probability
- Customers without support/security services churn more

📈 SHAP Explainability

SHAP analysis revealed the most important features:

- Contract type
- Tenure months
- Monthly charges
- Internet service type

These features strongly influence churn behavior.
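How the SHAP values were computed is not shown here. For intuition, in the special case of a linear model such as logistic regression, and assuming features are treated as independent, the SHAP value of a feature in log-odds space reduces to coefficient × (feature value − background mean). The sketch below illustrates that identity in plain Python. Every number in it (coefficients, intercept, background means, customer values) is invented for illustration and does not come from the trained model.

```python
import math

# Hypothetical values, NOT taken from the trained model.
feature_names = ["Tenure Months", "Monthly Charges", "Contract_Month-to-month"]
coefs = [-0.04, 0.02, 1.1]
intercept = -1.0
background_means = [32.0, 65.0, 0.55]  # assumed feature means over the training data
customer = [3.0, 90.0, 1.0]


def linear_shap_values(coefs, x, means):
    """SHAP values of a linear model under the feature-independence assumption."""
    return [b * (xi - mu) for b, xi, mu in zip(coefs, x, means)]


# Expected model output (log-odds) over the background data.
base_value = intercept + sum(b * mu for b, mu in zip(coefs, background_means))
shap_values = linear_shap_values(coefs, customer, background_means)

# Model output (log-odds) for this customer.
log_odds = intercept + sum(b * xi for b, xi in zip(coefs, customer))

# Local accuracy: base value plus all contributions recovers the prediction.
assert math.isclose(base_value + sum(shap_values), log_odds)

# Rank features by absolute contribution for this customer.
for name, phi in sorted(zip(feature_names, shap_values), key=lambda t: -abs(t[1])):
    print(f"{name}: {phi:+.3f}")
```

Tree models such as Random Forest and XGBoost do not have this closed form; the `shap` library handles them with dedicated explainers.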

🖥️ Streamlit App Features

The deployed app allows users to:

- Enter customer details
- Get churn probability in real-time
- View risk level (High / Low)
- Interactive UI for easy testing

🛠️ Tech Stack

- Python 🐍
- Pandas & NumPy
- Scikit-learn
- XGBoost
- SHAP
- Streamlit
- Matplotlib & Seaborn

📂 Project Structure

    customer-churn-project/
    │
    ├── app.py
    ├── churn_model.pkl
    ├── model_columns.pkl
    ├── requirements.txt
    ├── Telco-Customer-Churn.csv
    │
    ├── notebooks/
    │   └── eda_and_modeling.ipynb
    │
    └── README.md
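The contents of `app.py` are not shown here, but the presence of `churn_model.pkl` and `model_columns.pkl` suggests a common inference pattern: one-hot encode the user's input, then align it to the column order seen at training time. The sketch below assumes `model_columns.pkl` stores that list of column names and uses a hard-coded stand-in instead of loading the file. The helper names, the example columns, and the 0.5 threshold for the High / Low risk label are all assumptions.

```python
import pandas as pd

# Stand-in for the contents of model_columns.pkl (assumed: training columns after encoding).
model_columns = [
    "Tenure Months",
    "Monthly Charges",
    "Contract_Month-to-month",
    "Contract_One year",
    "Contract_Two year",
]


def build_model_input(customer: dict, columns: list) -> pd.DataFrame:
    """One-hot encode a single customer and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([customer]), dtype=int)
    # Columns missing from this row (unseen categories) are filled with 0;
    # columns the model never saw are dropped.
    return row.reindex(columns=columns, fill_value=0)


def risk_level(probability: float, threshold: float = 0.5) -> str:
    """Map a churn probability to a label; the threshold is a hypothetical choice."""
    return "High" if probability >= threshold else "Low"


customer = {"Tenure Months": 3, "Monthly Charges": 89.5, "Contract": "Month-to-month"}
X = build_model_input(customer, model_columns)

print(X.to_dict(orient="records")[0])
print(risk_level(0.73), risk_level(0.20))
```

With the real artifacts, the probability would typically come from something like `joblib.load("churn_model.pkl").predict_proba(X)[0, 1]`, assuming the model was saved with joblib as the requirements list suggests.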

⚙️ Installation & Setup

  1. Clone Repository

         git clone https://github.com/your-username/churn-prediction.git
         cd churn-prediction

  2. Install Dependencies

         pip install -r requirements.txt

  3. Run Streamlit App

         streamlit run app.py

📦 Requirements

- streamlit
- pandas
- numpy
- scikit-learn
- xgboost
- joblib
- matplotlib
- seaborn
- shap

📌 Business Impact

This system helps telecom companies to:

- Identify at-risk customers early
- Reduce customer churn rate
- Improve retention strategies
- Increase revenue stability

📊 Future Improvements

- Hyperparameter tuning (Optuna / GridSearch)
- Deep learning model comparison
- API deployment using FastAPI
- Dashboard with Power BI / Tableau
- Automated retraining pipeline

👨‍💻 Author

Mohd Faizanullah
Machine Learning Enthusiast | AI Developer

⭐ If you like this project

Give this repo a ⭐ and connect for more ML projects!