Basketball Player Performance: Prediction & Classification Pipeline
Course: Introduction to Data Science | Reichman University
Project Presentation
Click here to watch the full project presentation on Loom
Project Overview
This project leverages Machine Learning to analyze basketball player performance. We developed a comprehensive pipeline that includes Exploratory Data Analysis (EDA), Unsupervised Clustering, and Advanced Supervised Learning for both Regression and Classification tasks.
Links & Resources
- Google Colab Notebook: Click here to view the full code
- Model Repository: Hugging Face Project Page
Exploratory Data Analysis (EDA)
Our analysis revealed key insights into how player attributes affect their scoring output.
Feature Engineering & Clustering
We used K-Means Clustering to identify player archetypes based on physical metrics. To visualize these high-dimensional clusters, we applied PCA (Principal Component Analysis).
Cluster Visualization (PCA)
The plot shows how players are grouped into distinct physical profiles, which were later used as features in our predictive models.
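The clustering step described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the project's actual notebook code: the column meanings (height, weight, wingspan, vertical), the choice of three clusters, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for physical metrics (hypothetical columns:
# height_cm, weight_kg, wingspan_cm, vertical_cm). Not the project data.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([185, 82, 190, 75], [4, 5, 5, 6], size=(60, 4)),
    rng.normal([200, 98, 208, 70], [4, 6, 5, 6], size=(60, 4)),
    rng.normal([211, 112, 221, 62], [4, 7, 5, 6], size=(60, 4)),
])

# Scale first: K-Means relies on Euclidean distance, so features
# measured in larger units would otherwise dominate.
X_scaled = StandardScaler().fit_transform(X)

# k=3 is an assumption for this sketch; the source does not state k.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)

# PCA is used only to project the points to 2D for plotting;
# the clusters themselves were found in the full feature space.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Append the cluster id as an extra feature for downstream models.
X_with_cluster = np.column_stack([X_scaled, cluster_labels])
print(X_2d.shape, X_with_cluster.shape)
```

Passing the cluster id as a single integer column is reasonable for tree-based models; for a linear model, one-hot encoding the cluster id would usually be the safer choice, since the ids have no natural order.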
Modeling & Evaluation
1. Regression (Predicting Points Per Game)
We evaluated multiple models to predict continuous scoring output (points per game). Our Gradient Boosting model significantly outperformed the Linear Regression baseline.
Feature Importance Comparison:
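As a rough sketch of how such a comparison might be set up, the snippet below fits a Linear Regression baseline and two tree-based models on synthetic data, then collects the tree models' feature importances side by side. The feature names, the synthetic target, and the default hyperparameters are all assumptions for illustration; the project's real features and scores are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names and synthetic data (not the project dataset).
feature_names = ["height_cm", "minutes_per_game", "usage_rate"]
rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.normal(200, 8, n),
    rng.uniform(5, 38, n),
    rng.uniform(10, 35, n),
])
# Target with an interaction term, so the relationship is not purely linear.
y = 0.02 * X[:, 1] * X[:, 2] + rng.normal(0, 1.0, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear_baseline": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit every model on the same split and score it on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(name, scores[name])

# Impurity-based importances from the two tree models, keyed by feature.
importances = {
    name: dict(zip(feature_names, models[name].feature_importances_))
    for name in ("random_forest", "gradient_boosting")
}
print(importances)
```

Impurity-based importances are normalized to sum to 1 within each model, so they show relative rankings rather than effect sizes. Permutation importance on the test split is a common cross-check.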
2. Classification (Performance Tiers)
We reframed the problem to classify players into three tiers (Low, Medium, High). Given the high cost of "Draft Busts" in scouting, we optimized for Precision.
Winning Model Confusion Matrix:
The matrix shows that the model reliably identifies 'Star' players (Class 2, the High tier) while keeping false positives low.
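To make the precision-focused evaluation concrete, here is a minimal sketch on synthetic data. It assumes the three tiers are formed by cutting points per game at tertiles and that a Gradient Boosting classifier is used; neither the cut points nor the winning classifier type is stated in the source, so treat both as placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, precision_score
from sklearn.model_selection import train_test_split

# Synthetic features and points-per-game target (not the project data).
rng = np.random.default_rng(1)
n = 900
X = rng.normal(size=(n, 4))
ppg = 10 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1.5, n)

# Assumed tiering rule: tertiles of PPG -> 0 = Low, 1 = Medium, 2 = High.
low_cut, high_cut = np.quantile(ppg, [1 / 3, 2 / 3])
tiers = np.digitize(ppg, [low_cut, high_cut])

X_train, X_test, y_train, y_test = train_test_split(
    X, tiers, test_size=0.25, stratify=tiers, random_state=1
)

clf = GradientBoostingClassifier(random_state=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true tiers, columns are predicted tiers.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# Per-class precision; index 2 is the High ('Star') tier.
per_class_precision = precision_score(
    y_test, y_pred, labels=[0, 1, 2], average=None, zero_division=0
)
high_precision = per_class_precision[2]

print(cm)
print("High-tier precision:", high_precision)
```

In scikit-learn's layout, false positives for the High tier are the off-diagonal entries of the last column, that is, players predicted as Class 2 whose true tier was Low or Medium. High-tier precision is the bottom-right cell divided by the sum of that column.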
Key Conclusions
- Feature Engineering is Crucial: Unsupervised learning (K-Means) successfully uncovered hidden physical archetypes, which proved to be strong predictors in our supervised models.
- Non-Linearity Matters: Tree-based models (Gradient Boosting and Random Forest) significantly outperformed the baseline Linear Regression, highlighting the complex, non-linear relationships between a player's physical attributes, playstyle, and scoring output.
- Business-Driven Metrics: In sports analytics, framing the problem around real-world business needs (e.g., prioritizing Precision to avoid costly draft busts) is just as important as the model's overall accuracy.
Repository Contents
- winning_model_gb.pkl: Final regression model.
- winning_classifier_model.pkl: Final classification model.
- *.png: All project visualizations.
- README.md: Project documentation and presentation.
Developed by Nir Missri



