This documentation explains the machine learning architecture, preprocessing pipeline, optimization strategy, and the reasoning behind simplifying telecom features for both predictive performance and end-user usability.
The prediction system uses the XGBoost (Extreme Gradient Boosting) Classifier because the target task involves predicting a binary customer state (Churn vs. No Churn) using a combination of numerical billing patterns (charges, tenure) and categorical features (contract type, payment method).
XGBoost was selected because gradient-boosted decision trees are a strong, widely used choice for structured tabular datasets, and XGBoost adds efficient training and built-in L1/L2 regularization on top of that.
Traditional linear models assume additive, linear relationships between the features and the target. Customer churn data has high multicollinearity and threshold effects across tenure, charges, and service usage (for example, churn peaking at low tenure and moderate charges), which decision-tree ensembles model more naturally. XGBoost significantly outperformed traditional baseline classifiers during cross-validation.
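The threshold argument above can be illustrated with a small synthetic experiment. This is only a sketch under stated assumptions: the data is generated rather than taken from the project, the churn probabilities (0.9 inside the "low tenure, moderate charges" region, 0.03 elsewhere) are invented for illustration, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the example depends on scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic customers (assumption: uniform tenure in months and monthly charges).
rng = np.random.default_rng(0)
n = 3000
tenure = rng.uniform(0, 72, n)
charges = rng.uniform(20, 120, n)

# Churn is concentrated at low tenure AND moderate charges, so the effect of
# charges is non-monotonic and cannot be expressed by a single linear term.
in_peak = (tenure < 24) & (charges > 40) & (charges < 80)
churn_prob = np.where(in_peak, 0.9, 0.03)  # illustrative probabilities
y = (rng.random(n) < churn_prob).astype(int)
X = np.column_stack([tenure, charges])

# Linear baseline versus a gradient-boosted tree ensemble (stand-in for XGBoost).
linear = make_pipeline(StandardScaler(), LogisticRegression())
boosted = GradientBoostingClassifier(random_state=0)

linear_auc = cross_val_score(linear, X, y, cv=5, scoring="roc_auc").mean()
boosted_auc = cross_val_score(boosted, X, y, cv=5, scoring="roc_auc").mean()

print(f"logistic regression ROC AUC: {linear_auc:.3f}")
print(f"gradient boosting ROC AUC:   {boosted_auc:.3f}")
```

On data shaped like this, the tree ensemble can isolate the high-churn region with a few splits, while the linear model can only exploit the monotonic part of the tenure effect. The numbers it prints say nothing about the project's actual results.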
One-Hot Encoding (OHE) was used to transform raw categorical columns (such as Contract, PaperlessBilling, and PaymentMethod) into binary indicator columns that the model can consume as numeric input.
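A minimal sketch of this encoding step, assuming pandas is used. The sample rows, the category values, and the MonthlyCharges column name are hypothetical; only Contract, PaperlessBilling, and PaymentMethod are named above.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
raw = pd.DataFrame({
    "tenure": [1, 34, 60],
    "MonthlyCharges": [29.85, 56.95, 99.10],  # assumed column name
    "Contract": ["Month-to-month", "One year", "Two year"],
    "PaperlessBilling": ["Yes", "No", "Yes"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Credit card"],
})

categorical_cols = ["Contract", "PaperlessBilling", "PaymentMethod"]

# Each categorical column becomes one 0/1 indicator column per category;
# numeric columns pass through unchanged.
encoded = pd.get_dummies(raw, columns=categorical_cols, dtype=int)

print(encoded.columns.tolist())
```

At prediction time the encoded frame needs the same columns, in the same order, as the training data. One way to guarantee that is `encoded.reindex(columns=train_columns, fill_value=0)`, where `train_columns` (a hypothetical name) is the column list saved during training.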
GridSearchCV was used to run an exhaustive hyperparameter search with cross-validation, selecting the best values for tree depth, number of estimators, and learning rate.
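The search described above might look roughly like the following sketch. The grid values, the fold count, the ROC AUC scoring choice, and the helper name `tune` are assumptions, and the data is synthetic. To keep the example dependent on scikit-learn alone, it tunes GradientBoostingClassifier as a stand-in; `xgboost.XGBClassifier` accepts the same three parameter names (`max_depth`, `n_estimators`, `learning_rate`) and could be passed to the helper instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune(estimator, X, y, param_grid, n_splits=3):
    """Exhaustively evaluate every parameter combination with stratified CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    return search


# Synthetic binary classification data in place of the churn dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative grid: 2 x 2 x 2 = 8 candidate configurations.
param_grid = {
    "max_depth": [2, 3],
    "n_estimators": [25, 50],
    "learning_rate": [0.05, 0.1],
}

search = tune(GradientBoostingClassifier(random_state=0), X, y, param_grid)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Because the grid is exhaustive, the cost grows multiplicatively with each added value: the number of model fits is the number of combinations times the number of folds (here 8 x 3 = 24, plus one final refit on the full data).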
One important design decision in this project was preprocessing the raw customer usage and service variables into structured, derived feature groups to improve learning quality and simplify client-side entry.
Tabular telecom data often contains detailed, correlated service and account records. Using the raw fields directly can increase dimensionality, add noise, and complicate user interaction.
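As an illustration of what such derived feature groups could look like, the sketch below bins tenure into bands and collapses several service flags into a single count. The groupings the project actually used are not specified here, so the column names (OnlineSecurity, TechSupport, StreamingTV), the bin edges, and the derived feature names are all hypothetical.

```python
import pandas as pd

# Hypothetical raw records; column names and values are illustrative.
raw = pd.DataFrame({
    "tenure": [2, 18, 40, 70],
    "OnlineSecurity": ["No", "Yes", "Yes", "No"],
    "TechSupport": ["No", "No", "Yes", "Yes"],
    "StreamingTV": ["Yes", "No", "Yes", "No"],
})

service_cols = ["OnlineSecurity", "TechSupport", "StreamingTV"]

derived = pd.DataFrame({
    # Replace the raw month count with a small set of tenure bands
    # (assumed bin edges, in months).
    "tenure_group": pd.cut(
        raw["tenure"],
        bins=[0, 12, 24, 48, 72],
        labels=["0-12", "13-24", "25-48", "49-72"],
        include_lowest=True,
    ),
    # Collapse several correlated Yes/No service flags into one count.
    "service_count": (raw[service_cols] == "Yes").sum(axis=1),
})

print(derived)
```

A form built on features like these asks the user for a tenure band and a number of services instead of one input per raw column, which is the usability trade-off described above.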
The objective was not only to maximize model accuracy and ROC AUC, but also to create a customer-centric analytics system that remains understandable, structured, and highly interactive for real-world business decisions.