---
title: Customer Churn Prediction
emoji: 📉
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.37.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# 📉 Customer Churn Prediction

An end-to-end Machine Learning project that predicts whether a telecom customer is likely to churn based on usage patterns, subscription details, and customer profile data.

The project includes data preprocessing, feature engineering, model training, explainability using SHAP, and deployment using Streamlit.

## 🚀 Live Demo

link:

## 📌 Problem Statement

Telecom companies lose revenue when customers stop using their services (churn).

The goal of this project is to:

Predict whether a customer will churn or stay, so that businesses can take proactive retention actions.

## 📊 Dataset

We use the IBM Telco Customer Churn Dataset.

It contains information such as:

- Customer demographics
- Subscription services
- Contract type
- Payment methods
- Monthly & total charges
- Churn status

**Target variable:** `Churn Value` (1 = Churn, 0 = No Churn)

## 🧠 Machine Learning Workflow

The project follows a complete ML pipeline:

### 1. Data Preprocessing

- Removed irrelevant columns (CustomerID, location data, etc.)
- Handled missing values
- Cleaned dataset

### 2. Feature Engineering

- Created Tenure Groups:
  - New
  - Regular
  - Loyal
  - Very Loyal
- Encoded categorical variables using One-Hot Encoding

### 3. Handling Class Imbalance

- Used `class_weight='balanced'`
- Used `scale_pos_weight` for XGBoost

### 4. Model Training

We trained and compared:

- Logistic Regression
- Random Forest
- XGBoost

### 5. Evaluation Metrics

- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC

### 6. Explainability (SHAP)

- Identified important features affecting churn
- Provided model interpretability

### 7. Deployment

- Built interactive web app using Streamlit
- Real-time churn prediction system

## 🏆 Model Performance

| Model               | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---------------------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.737    | 0.503     | 0.773  | 0.610    | 0.843   |
| Random Forest       | 0.793    | 0.640     | 0.500  | 0.562    | 0.840   |
| XGBoost             | 0.769    | 0.553     | 0.687  | 0.613    | 0.833   |

✅ Best Model: Logistic Regression (based on highest Recall & ROC-AUC)

## 🔍 Key Insights from EDA

- Customers with short tenure are more likely to churn
- Month-to-month contracts have the highest churn rate
- Higher monthly charges increase churn probability
- Customers without support/security services churn more

## 📈 SHAP Explainability

SHAP analysis revealed the most important features:

- Contract type
- Tenure months
- Monthly charges
- Internet service type

These features strongly influence churn behavior.

## 🖥️ Streamlit App Features

The deployed app allows users to:

- Enter customer details
- Get churn probability in real-time
- View risk level (High / Low)
- Interactive UI for easy testing

## 🛠️ Tech Stack

- Python 🐍
- Pandas & NumPy
- Scikit-learn
- XGBoost
- SHAP
- Streamlit
- Matplotlib & Seaborn

## 📂 Project Structure

    customer-churn-project/
    │
    ├── app.py
    ├── churn_model.pkl
    ├── model_columns.pkl
    ├── requirements.txt
    ├── Telco-Customer-Churn.csv
    │
    ├── notebooks/
    │   └── eda_and_modeling.ipynb
    │
    └── README.md

## ⚙️ Installation & Setup

### 1. Clone Repository

    git clone https://github.com/your-username/churn-prediction.git
    cd churn-prediction

### 2. Install Dependencies

    pip install -r requirements.txt

### 3. Run Streamlit App

    streamlit run app.py

## 📦 Requirements

    streamlit
    pandas
    numpy
    scikit-learn
    xgboost
    joblib
    matplotlib
    seaborn
    shap

## 📌 Business Impact

This system helps telecom companies to:

- Identify at-risk customers early
- Reduce customer churn rate
- Improve retention strategies
- Increase revenue stability

## 📊 Future Improvements

- Hyperparameter tuning (Optuna / GridSearch)
- Deep learning model comparison
- API deployment using FastAPI
- Dashboard with Power BI / Tableau
- Automated retraining pipeline

## 👨‍💻 Author

Mohd Faizanullah
Machine Learning Enthusiast | AI Developer

## ⭐ If you like this project

Give this repo a ⭐ and connect for more ML projects!
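
The Feature Engineering step above names four tenure groups (New, Regular, Loyal, Very Loyal) but does not state where the boundaries fall. The sketch below shows one way such a bucketing and the subsequent one-hot encoding could look. The cut-offs of 12, 24, and 48 months, the helper names `tenure_group` and `one_hot`, and the `Tenure Group` prefix are all assumptions made for illustration, not values taken from the project.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in months) for the first three groups; the
# project names the groups but does not state where the boundaries fall.
TENURE_EDGES = [12, 24, 48]
TENURE_LABELS = ["New", "Regular", "Loyal", "Very Loyal"]


def tenure_group(months: float) -> str:
    """Return the tenure group for a tenure in months (upper bounds inclusive)."""
    if months < 0:
        raise ValueError("tenure must be non-negative")
    # bisect_left counts how many edges are strictly below `months`,
    # which is exactly the index of the matching label.
    return TENURE_LABELS[bisect_left(TENURE_EDGES, months)]


def one_hot(value: str, categories: list[str], prefix: str) -> dict[str, int]:
    """One-hot encode a single categorical value as prefix_category -> 0/1."""
    return {f"{prefix}_{cat}": int(value == cat) for cat in categories}


if __name__ == "__main__":
    group = tenure_group(30)
    print(group)
    print(one_hot(group, TENURE_LABELS, "Tenure Group"))
```

On a full DataFrame the same idea would typically be applied column-wise, for example by mapping `tenure_group` over the tenure column and then calling `pandas.get_dummies` on the result.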
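
The Handling Class Imbalance step above relies on `class_weight='balanced'` and `scale_pos_weight`. As a rough illustration of what those settings amount to, the sketch below computes both quantities by hand on toy labels. The `balanced` formula is the one scikit-learn documents, `n_samples / (n_classes * count)`, and the negative-to-positive ratio is the value XGBoost's documentation suggests as a typical starting point for `scale_pos_weight`. The function names and the toy labels are hypothetical, and the project may have chosen a different value.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Per-class weights as scikit-learn defines class_weight='balanced'."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def negative_to_positive_ratio(labels: list[int]) -> float:
    """Ratio of negatives (0) to positives (1), a common scale_pos_weight choice."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples in labels")
    return counts[0] / counts[1]


if __name__ == "__main__":
    # Toy labels only: 8 customers who stayed (0), 2 who churned (1).
    y = [0] * 8 + [1] * 2
    print(balanced_class_weights(y))
    print(negative_to_positive_ratio(y))
```

Both settings push the model to pay more attention to the minority (churn) class, which is consistent with recall being one of the criteria used to pick the best model.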
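
The project structure lists `model_columns.pkl` next to `churn_model.pkl`, which suggests that the app aligns user input to the columns seen during training before predicting. The sketch below illustrates that alignment and the High / Low risk label for a single customer record. It is an assumption about how this could work, not the code in `app.py`: the helper names `encode_row` and `risk_level`, the example column names, and the 0.5 threshold are all hypothetical.

```python
def encode_row(raw: dict, model_columns: list[str]) -> list[float]:
    """One-hot encode one customer record and align it to the training columns."""
    encoded: dict[str, float] = {}
    for name, value in raw.items():
        if isinstance(value, str):
            # Same "<column>_<value>" naming that pandas.get_dummies uses by default.
            encoded[f"{name}_{value}"] = 1.0
        else:
            encoded[name] = float(value)
    # Columns the record does not produce are filled with 0; columns the
    # model never saw during training are dropped.
    return [encoded.get(col, 0.0) for col in model_columns]


def risk_level(probability: float, threshold: float = 0.5) -> str:
    """Map a churn probability to 'High' or 'Low' (threshold is an assumption)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "High" if probability >= threshold else "Low"


if __name__ == "__main__":
    # Hypothetical training columns and user input, for illustration only.
    columns = ["Monthly Charges", "Contract_Month-to-month", "Contract_Two year"]
    customer = {"Monthly Charges": 70.5, "Contract": "Month-to-month"}
    print(encode_row(customer, columns))
    print(risk_level(0.8))
```

Keeping the column list as a separate artifact means the app never depends on the order in which a user fills in the form, only on the order saved at training time.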