🏀 Basketball Player Performance: Prediction & Classification Pipeline

Course: Introduction to Data Science | Reichman University

🎥 Project Presentation

▢️ Watch the full project presentation on Loom.

📌 Project Overview

This project uses machine learning to analyze basketball player performance. The pipeline covers Exploratory Data Analysis (EDA), unsupervised clustering (K-Means), and supervised learning for two tasks: regression (predicting points per game) and classification (assigning players to performance tiers).

📊 Exploratory Data Analysis (EDA)

Our analysis examined how player attributes relate to scoring output, focusing on scoring trends across age and on the relationship between playmaking and scoring.

[Figures: Scoring Trends by Age; Playmaking vs. Scoring]
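A minimal sketch of how the two EDA views could be computed with pandas. It assumes a DataFrame with hypothetical columns "age", "pts" (points per game) and "ast" (assists per game); the toy values and age buckets are illustrative only, not the project's data or schema.

```python
import pandas as pd

# Toy data with hypothetical columns: age, pts (points per game), ast (assists per game)
players = pd.DataFrame({
    "age": [21, 22, 22, 25, 25, 25, 29, 29, 33],
    "pts": [8.0, 10.0, 12.0, 15.0, 17.0, 19.0, 16.0, 14.0, 9.0],
    "ast": [1.5, 2.0, 3.0, 4.0, 5.5, 6.0, 5.0, 3.5, 2.0],
})

# Bucket ages into right-inclusive ranges and average scoring per bucket
players["age_group"] = pd.cut(
    players["age"],
    bins=[18, 23, 27, 31, 40],
    labels=["19-23", "24-27", "28-31", "32-40"],
)
scoring_by_age = players.groupby("age_group", observed=True)["pts"].mean()

# Pearson correlation between playmaking (assists) and scoring (points)
ast_pts_corr = players["ast"].corr(players["pts"])

print(scoring_by_age)
print(f"assists vs points correlation: {ast_pts_corr:.2f}")
```

The grouped means are the values behind a "scoring by age" curve, and the correlation coefficient is one simple way to summarize a playmaking-vs-scoring scatter plot.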

🛠️ Feature Engineering & Clustering

We used K-Means Clustering to identify player archetypes based on physical metrics. To visualize these high-dimensional clusters, we applied PCA (Principal Component Analysis).

Cluster Visualization (PCA)

[Figure: Clusters PCA]
The plot shows how players are grouped into distinct physical profiles, which were later used as features in our predictive models.
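The clustering step could look roughly like the following scikit-learn sketch. The three physical metrics, the two synthetic archetypes and k=2 are assumptions for illustration; the features and number of clusters the project actually used are not stated here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic physical metrics per player: height_cm, weight_kg, wingspan_cm.
# Two hypothetical archetypes (guards vs. bigs); not the project's data.
guards = rng.normal(loc=[190, 88, 198], scale=[4, 5, 4], size=(50, 3))
bigs = rng.normal(loc=[211, 113, 222], scale=[4, 5, 4], size=(50, 3))
X = np.vstack([guards, bigs])

# Standardize so every metric contributes comparably to Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

# K-Means assigns each player an archetype label (k=2 is an assumption)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# PCA projects the scaled metrics onto 2 components, used only for plotting
coords_2d = PCA(n_components=2).fit_transform(X_scaled)

# One-hot encode the archetype and append it as extra model features
archetype_onehot = np.eye(kmeans.n_clusters)[labels]
X_with_archetype = np.hstack([X, archetype_onehot])
```

Standardizing first matters because K-Means relies on Euclidean distance, so a metric measured in larger units would otherwise dominate. One-hot encoding the cluster label keeps a linear baseline from treating cluster IDs as ordered numbers; tree models can use either form.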

🤖 Modeling & Evaluation

1. Regression (Predicting Points Per Game)

We evaluated several models for predicting points per game as a continuous target. The Gradient Boosting model significantly outperformed the Linear Regression baseline.

Feature Importance Comparison:

[Figures: Random Forest feature importance; Gradient Boosting feature importance]
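A hedged sketch of this kind of regression comparison, on synthetic data. The features (minutes, usage rate, age) and the non-linear target are invented to illustrate why tree ensembles can beat a linear baseline; they are not the project's features, data or results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600
# Hypothetical features: minutes per game, usage rate (%), age (years)
minutes = rng.uniform(5, 38, n)
usage = rng.uniform(10, 35, n)
age = rng.uniform(19, 38, n)
X = np.column_stack([minutes, usage, age])
# Synthetic PPG with an interaction and an age peak, which a linear model cannot fit
ppg = 0.02 * minutes * usage - 0.05 * (age - 27) ** 2 + rng.normal(0, 1.0, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, ppg, test_size=0.25, random_state=0)

models = {
    "linear_baseline": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
mae = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mae[name] = mean_absolute_error(y_te, model.predict(X_te))

# Impurity-based importances (normalized to sum to 1) for the two tree ensembles
feature_names = ["minutes", "usage", "age"]
importances = {
    name: dict(zip(feature_names, models[name].feature_importances_.round(3).tolist()))
    for name in ("random_forest", "gradient_boosting")
}
print(mae)
print(importances)
```

feature_importances_ is scikit-learn's impurity-based importance. Permutation importance (sklearn.inspection.permutation_importance) is a common cross-check, because impurity-based scores can be biased toward features with many distinct values.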

2. Classification (Performance Tiers)

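We reframed the problem as classifying players into three performance tiers (Low, Medium, High). In scouting, a "draft bust" (a player picked with high expectations who then underperforms) is expensive, and it corresponds to a false positive: a player predicted to be High-tier who is not. We therefore optimized for Precision, which directly penalizes those false positives.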
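For the High tier, precision is the share of players predicted as High who truly belong to the High tier:

```latex
\text{Precision}_{\text{High}} = \frac{TP_{\text{High}}}{TP_{\text{High}} + FP_{\text{High}}}
```

Here TP_High counts High-tier players correctly predicted as High, and FP_High counts Low- or Medium-tier players the model places in the High tier, which are the cases that would translate into draft busts.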

Winning Model Confusion Matrix:
[Figure: Confusion Matrix]
The matrix shows that the model reliably identifies 'Star' players (Class 2, the High tier) while keeping False Positives low.
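One way to express a precision-first tier classifier in code, sketched with scikit-learn on synthetic data. The tercile cut points, the Gradient Boosting classifier and the 0.7 probability threshold are all assumptions; the project's actual tier boundaries, model settings and tuning procedure are not described here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 900
X = rng.normal(size=(n, 4))  # hypothetical standardized player features
ppg = 12 + 4 * X[:, 0] + 2 * X[:, 1] * X[:, 2] + rng.normal(0, 1.5, n)

# Tercile cut points map continuous PPG to tiers: 0 = Low, 1 = Medium, 2 = High
cuts = np.quantile(ppg, [1 / 3, 2 / 3])
tier = np.digitize(ppg, cuts)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, tier, test_size=0.3, stratify=tier, random_state=0
)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

pred = clf.predict(X_te)
cm = confusion_matrix(y_te, pred, labels=[0, 1, 2])  # rows: true tier, cols: predicted

# High-tier precision: of the players flagged as High, how many truly are High
is_high = (y_te == 2).astype(int)
high_col = list(clf.classes_).index(2)
proba_high = clf.predict_proba(X_te)[:, high_col]

threshold = 0.7  # hypothetical; tune on a validation split, not the test set
prec_default = precision_score(is_high, (pred == 2).astype(int), zero_division=0)
prec_strict = precision_score(is_high, (proba_high >= threshold).astype(int), zero_division=0)
print(cm)
print(f"High-tier precision: default={prec_default:.2f}, threshold {threshold}={prec_strict:.2f}")
```

Raising the threshold for the High class generally trades recall for precision: fewer players are flagged as stars, but a larger share of those flagged tend to be correct.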

💡 Key Conclusions

  • Feature Engineering is Crucial: Unsupervised learning (K-Means) successfully uncovered hidden physical archetypes, which proved to be strong predictors in our supervised models.
  • Non-Linearity Matters: Tree-based models (Gradient Boosting and Random Forest) significantly outperformed the baseline Linear Regression, highlighting the complex, non-linear relationships between a player's physical attributes, playstyle, and scoring output.
  • Business-Driven Metrics: In sports analytics, framing the problem around real-world business needs (e.g., prioritizing Precision to avoid costly draft busts) is just as important as the model's overall accuracy.

📂 Repository Contents

  • winning_model_gb.pkl: Final regression model.
  • winning_classifier_model.pkl: Final classification model.
  • *.png: All project visualizations.
  • README.md: Project documentation and presentation.
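The .pkl files can presumably be loaded with Python's pickle module. This sketch assumes they were written with pickle.dump and contain scikit-learn estimators; if they were saved with joblib.dump, use joblib.load instead. Only unpickle files from a source you trust, since unpickling can execute arbitrary code, and use a scikit-learn version compatible with the one used for training.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled model from disk. Use only with trusted files."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Usage (assumes the repository files sit in the working directory):
# reg = load_model("winning_model_gb.pkl")
# clf = load_model("winning_classifier_model.pkl")
# reg.predict(X_new)  # X_new must have the same feature columns used in training
```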

Developed by Nir Missri

