SebastianAndreu/2025-24679-NFL-Yards-Predictor

SebastianAndreu committed on Oct 15, 2025

Commit cf4c0cd (verified) · 1 parent: ba8fd0a

Create README.md


README.md +171 -0

README.md ADDED

	@@ -0,0 +1,171 @@

+---
+license: mit
+language:
+- en
+---
+# 🧠 Model Card: Walk-Forward AutoGluon Model (By Week)
+## 📘 Overview
+This model performs **walk-forward training and evaluation** for predicting **NFL wide receiver (WR) receiving yards** on a week-by-week basis using **AutoGluon’s TabularPredictor**.
+It leverages **historical player embeddings**, **pregame contextual features**, and **weather/game metadata** to iteratively train and test within each NFL season (2016–2025).
+---
+## 🧩 Model Details
+* **Model Type:** Walk-forward regression (AutoGluon TabularPredictor)
+* **Framework:** [AutoGluon Tabular](https://auto.gluon.ai/stable/tutorials/tabular/index.html)
+* **Author:** Sebastian Andreu
+* **License:** MIT
+* **Primary Use:** Predicting *receiving_yards* for each wide receiver before a game is played.
+### Key Idea
+Instead of training one global model, this script **re-trains weekly within each season**, always using all *prior weeks* as training data and *the next week* as the test set.
+Because the test week is never seen during training, each evaluation is forward-looking and free of leakage from future weeks.
+---
+## ⚙️ Data
+### Source Datasets
+The model loads and concatenates season datasets from:
+```
+SebastianAndreu/24679_NFL_WR_Dataset_<YEAR>
+```
+for `2016 ≤ YEAR ≤ 2025`.
+Each dataset includes **pregame features** such as weather, team matchup, and Vegas lines.
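The ten repository IDs follow directly from the pattern above. Here is a minimal sketch of building them. The commented-out loading step assumes the Hugging Face `datasets` library and a `train` split, neither of which this card confirms.

```python
# Build the season dataset IDs named in this card (2016 through 2025 inclusive).
REPO_TEMPLATE = "SebastianAndreu/24679_NFL_WR_Dataset_{year}"
SEASONS = range(2016, 2026)  # upper bound is exclusive

repo_ids = [REPO_TEMPLATE.format(year=year) for year in SEASONS]

# Assumption: each repo loads with the Hugging Face `datasets` library and
# exposes a "train" split. Uncomment once that is confirmed:
# from datasets import load_dataset
# import pandas as pd
# frames = [load_dataset(rid, split="train").to_pandas() for rid in repo_ids]
# data = pd.concat(frames, ignore_index=True)
```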
+### Features Used
+Pregame input variables:
+* `defteam`
+* `posteam`
+* `surface`
+* `is_dome`
+* `is_rain`
+* `is_snow`
+* `is_clear`
+* `temp_f`
+* `humidity_pct`
+* `wind_mph`
+* `home_team`
+* `away_team`
+* `pregame_spread`
+* `pregame_total`
+* `passer_player_id`
+* `receiver_player_id`
+The dataset is also merged with **`player_historical_embeddings.csv`**, which provides dense numerical representations of player histories.
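The card does not say which key the merge uses or how the embedding columns are named. As a rough illustration only, the sketch below assumes a left join on `receiver_player_id` and hypothetical columns `emb_0` and `emb_1`. The CSV content is made up for the example.

```python
import csv
import io

def attach_embeddings(rows, embedding_rows, key="receiver_player_id"):
    """Left-join embedding columns onto each game row by player ID.

    Rows whose player has no embedding are kept unchanged.
    """
    by_player = {e[key]: e for e in embedding_rows}
    merged = []
    for row in rows:
        extra = by_player.get(row[key], {})
        merged.append({**extra, **row})  # game-row values win on name clashes
    return merged

# In-memory stand-in for player_historical_embeddings.csv (hypothetical
# column names and values, not taken from the real file).
embeddings_csv = io.StringIO(
    "receiver_player_id,emb_0,emb_1\n"
    "P1,0.12,-0.40\n"
)
embedding_rows = list(csv.DictReader(embeddings_csv))

games = [
    {"receiver_player_id": "P1", "wind_mph": "8"},
    {"receiver_player_id": "P2", "wind_mph": "15"},  # no embedding available
]
merged = attach_embeddings(games, embedding_rows)
```

A left join keeps players without an embedding in the data, so the choice of how to handle their missing columns is left to the downstream model.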
+### Target Variable
+`receiving_yards` — the number of receiving yards gained by the WR in the upcoming game.
+---
+## 🧮 Training Procedure
+### Walk-Forward Logic
+For each season:
+1. Extract the total number of weeks in that season.
+2. For each week `W` starting from 2:
+   * **Train** on data from weeks `< W`.
+   * **Test** on data from week `W`.
+   * Train a new AutoGluon model from scratch (10-minute time limit).
+3. Collect predictions and evaluation metrics.
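The steps above can be sketched in plain Python. This is an illustration of the splitting logic, not the author's script: rows are dictionaries, the `season` and `week` keys are assumed column names, and `mean_baseline` is a hypothetical stand-in for the AutoGluon fit and predict step.

```python
from collections import defaultdict

def walk_forward_splits(rows):
    """Yield (season, week, train, test) for every week W >= 2 of each season."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)
    for season in sorted(by_season):
        season_rows = by_season[season]
        last_week = max(r["week"] for r in season_rows)
        for week in range(2, last_week + 1):
            train = [r for r in season_rows if r["week"] < week]
            test = [r for r in season_rows if r["week"] == week]
            if train and test:  # skip weeks with no data on either side
                yield season, week, train, test

def mean_baseline(train, test):
    """Hypothetical stand-in model: predict the mean of the training target."""
    mean = sum(r["receiving_yards"] for r in train) / len(train)
    return [mean] * len(test)

def run_walk_forward(rows, fit_predict=mean_baseline):
    """Refit from scratch on each split and collect per-row results."""
    results = []
    for season, week, train, test in walk_forward_splits(rows):
        preds = fit_predict(train, test)
        for row, pred in zip(test, preds):
            true = row["receiving_yards"]
            results.append({"season": season, "week": week, "true": true,
                            "pred": pred, "error": abs(true - pred)})
    return results

rows = [
    {"season": 2016, "week": 1, "receiving_yards": 40.0},
    {"season": 2016, "week": 2, "receiving_yards": 60.0},
    {"season": 2016, "week": 3, "receiving_yards": 80.0},
    {"season": 2017, "week": 1, "receiving_yards": 10.0},
    {"season": 2017, "week": 2, "receiving_yards": 30.0},
]
results = run_walk_forward(rows)
```

Passing the model as a `fit_predict` callable keeps the split logic separate from the learner, so an AutoGluon-backed function could be swapped in without touching the loop.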
+### AutoGluon Configuration
+```python
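+from autogluon.tabular import TabularPredictor
+
+# model_dir, train, features, and target are assumed to be defined by the
+# surrounding script; target is "receiving_yards".
+predictor = TabularPredictor(
+    label="receiving_yards",
+    path=model_dir,       # directory where this week's model is saved
+    verbosity=0,          # minimal logging
+).fit(
+    train_data=train[features + [target]],
+    time_limit=600,       # seconds allowed for each weekly fit
+    presets="medium_quality_faster_train",
+)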
+```
+* **Time Limit:** 600 seconds per week
+* **Preset:** `medium_quality_faster_train`
+* **Verbosity:** 0 (minimal logging)
+---
+## 📊 Evaluation
+### Metric
+The script reports **Mean Absolute Error (MAE)** over all weekly test predictions.
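For reference, MAE is the average of the absolute differences between true and predicted yards. The sketch below assumes the metric is pooled over every prediction row rather than averaged week by week, which matches the single figure in the example output but is not stated outright. The input values are invented.

```python
def mean_absolute_error(true_values, predictions):
    """Average of |true - pred| over all rows, pooled across weeks."""
    if len(true_values) != len(predictions):
        raise ValueError("true_values and predictions must have equal length")
    if not true_values:
        raise ValueError("cannot compute MAE of an empty sequence")
    total = sum(abs(t - p) for t, p in zip(true_values, predictions))
    return total / len(true_values)

# Invented values for illustration only.
mae = mean_absolute_error([60.0, 80.0, 30.0], [40.0, 50.0, 10.0])
```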
+### Output
+After all walk-forward runs:
+* `walkforward_predictions.csv` — contains true vs. predicted values per week.
+* Columns:
+  * `season`
+  * `week`
+  * `true`
+  * `pred`
+  * `error = |true - pred|`
+Example final output:
+```
+✅ Walk-forward complete!
+Total predictions: 3,200
+Mean Absolute Error: 12.47
+📥 Saved: walkforward_predictions.csv
+```
+---
+## 📢 Artifacts
+| Artifact                                      | Description                        |
+| --------------------------------------------- | ---------------------------------- |
+| `player_historical_embeddings.csv`            | Precomputed player embeddings      |
+| `autogluon_walkforward/`                      | Directory of trained weekly models |
+| `walkforward_predictions.csv`                 | Aggregated results of predictions  |
+| `SebastianAndreu/24679_NFL_WR_Dataset_<YEAR>` | Input datasets (2016–2025)         |
+---
+## 🧠 Intended Use
+* **Goal:** Predict individual WR performance before each NFL game.
+* **Primary Users:** Sports analytics researchers, fantasy football data scientists, and betting modelers.
+* **Not intended for:** Real-time in-game prediction or commercial wagering advice.
+---
+## ⚠️ Limitations
+* Training each week from scratch is computationally expensive.
+* Does not include injury or roster change data.
+* Embeddings rely on prior model quality (`player_historical_embeddings.csv`).
+* Accuracy varies across the season: early weeks train on very little in-season data (week 2 sees only week 1), while later weeks have more.
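One way to inspect the early versus late season effect is to break the error down by week using the `week`, `true`, and `pred` columns of `walkforward_predictions.csv`. This is a sketch under that assumption. The sample rows are invented, and the in-memory CSV stands in for the real file.

```python
import csv
import io
from collections import defaultdict

def mae_by_week(prediction_rows):
    """Return {week: MAE} computed from rows with week, true, and pred fields."""
    totals = defaultdict(lambda: [0.0, 0])  # week -> [sum of abs errors, count]
    for row in prediction_rows:
        week = int(row["week"])
        totals[week][0] += abs(float(row["true"]) - float(row["pred"]))
        totals[week][1] += 1
    return {week: s / n for week, (s, n) in sorted(totals.items())}

# Invented rows in the column layout described under Output.
sample = io.StringIO(
    "season,week,true,pred,error\n"
    "2016,2,60,40,20\n"
    "2017,2,30,10,20\n"
    "2016,3,80,50,30\n"
)
per_week = mae_by_week(csv.DictReader(sample))
```

To run it on the real output, open the file with `open("walkforward_predictions.csv", newline="")` and pass `csv.DictReader(f)` to `mae_by_week`.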
+---
+## 🧩 Future Improvements
+* Incorporate **transfer learning** between seasons.
+* Add **injury & snap count features**.
+* Experiment with **AutoGluon ensemble distillation** to reduce retraining cost.
+* Combine with **Model 1 embeddings pipeline** for joint optimization.