SebastianAndreu committed
Commit cf4c0cd · verified · 1 Parent(s): ba8fd0a

Create README.md

  1. README.md +171 -0
README.md ADDED
@@ -0,0 +1,171 @@
+ ---
+ license: mit
+ language:
+ - en
+ ---
+
+ # 🧠 Model Card: Walk-Forward AutoGluon Model (By Week)
+
+ ## 📘 Overview
+
+ This model card describes a **walk-forward training and evaluation** pipeline that predicts **NFL wide receiver (WR) receiving yards** week by week using **AutoGluon’s TabularPredictor**.
+ It combines **historical player embeddings**, **pregame contextual features**, and **weather/game metadata**, retraining and testing iteratively within each NFL season (2016–2025).
+
+ ---
+
+ ## 🧩 Model Details
+
+ **Model Type:** Walk-forward regression (AutoGluon TabularPredictor)
+ **Framework:** [AutoGluon Tabular](https://auto.gluon.ai/stable/tutorials/tabular/index.html)
+ **Author:** Sebastian Andreu
+ **License:** MIT
+ **Primary Use:** Predicting *receiving_yards* for each wide receiver before a game is played.
+
+ ### Key Idea
+
+ Instead of training one global model, the script **retrains every week within each season**, always using all *prior weeks* of that season as training data and *the next week* as the test set.
+ Every prediction therefore comes from a model that has seen only earlier weeks, so the evaluation reflects forward-looking performance and no future-week rows leak into training.
+
+ ---
+
+ ## ⚙️ Data
+
+ ### Source Datasets
+
+ The script loads and concatenates one dataset per season from:
+
+ ```
+ SebastianAndreu/24679_NFL_WR_Dataset_<YEAR>
+ ```
+
+ for `2016 ≤ YEAR ≤ 2025`.
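The repo naming pattern above can be expanded into one ID per season. The following is a minimal sketch; `REPO_TEMPLATE` and `season_repo_ids` are hypothetical names introduced here for illustration and do not come from the original script.

```python
# Hypothetical helper: expands the <YEAR> placeholder for every season in range.
REPO_TEMPLATE = "SebastianAndreu/24679_NFL_WR_Dataset_{year}"


def season_repo_ids(first_year: int = 2016, last_year: int = 2025) -> dict[int, str]:
    """Map each season to its dataset repo ID (both bounds inclusive)."""
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    return {
        year: REPO_TEMPLATE.format(year=year)
        for year in range(first_year, last_year + 1)
    }


repo_ids = season_repo_ids()
for year, repo_id in repo_ids.items():
    print(year, repo_id)
```

Assuming the seasons are hosted as Hugging Face Hub datasets, each ID could then be passed to `datasets.load_dataset` and the resulting tables concatenated; that loading step is left out here because it needs network access.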
+
+ Each dataset includes **pregame features** such as weather, team matchup, and Vegas lines.
+
+ ### Features Used
+
+ Pregame input variables:
+
+ * `defteam`
+ * `posteam`
+ * `surface`
+ * `is_dome`
+ * `is_rain`
+ * `is_snow`
+ * `is_clear`
+ * `temp_f`
+ * `humidity_pct`
+ * `wind_mph`
+ * `home_team`
+ * `away_team`
+ * `pregame_spread`
+ * `pregame_total`
+ * `passer_player_id`
+ * `receiver_player_id`
+
+ The dataset is also merged with **`player_historical_embeddings.csv`**, which provides dense numerical representations of player histories.
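The card does not say how the embeddings are joined to the game rows. As a rough sketch, the merge might be a left join on `receiver_player_id`; that join key, the `emb_0`/`emb_1` column names, and the `left_join_embeddings` helper are all assumptions made for illustration.

```python
import csv
import io


def left_join_embeddings(rows, embedding_rows, key="receiver_player_id"):
    """Left-join embedding columns onto game rows; unmatched rows get None."""
    lookup = {emb[key]: emb for emb in embedding_rows}
    emb_cols = [c for c in embedding_rows[0] if c != key] if embedding_rows else []
    merged = []
    for row in rows:
        emb = lookup.get(row[key])
        extra = {c: (float(emb[c]) if emb is not None else None) for c in emb_cols}
        merged.append({**row, **extra})
    return merged


# Toy stand-in for player_historical_embeddings.csv (column names are assumed).
EMBEDDINGS_CSV = "receiver_player_id,emb_0,emb_1\nWR_A,0.12,-0.40\nWR_B,0.55,0.08\n"
embedding_rows = list(csv.DictReader(io.StringIO(EMBEDDINGS_CSV)))

games = [
    {"receiver_player_id": "WR_A", "week": 3, "wind_mph": 7},
    {"receiver_player_id": "WR_C", "week": 3, "wind_mph": 12},  # no embedding on file
]
merged = left_join_embeddings(games, embedding_rows)
```

With pandas, the same join would be `games.merge(embeddings, on="receiver_player_id", how="left")`. A left join keeps receivers that have no embedding yet, leaving their embedding columns empty rather than dropping the row.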
+
+ ### Target Variable
+
+ `receiving_yards`: the number of receiving yards gained by the WR in the upcoming game.
+
+ ---
+
+ ## 🧮 Training Procedure
+
+ ### Walk-Forward Logic
+
+ For each season:
+
+ 1. Determine the number of weeks in that season.
+ 2. For each week `W`, starting from week 2 (week 1 has no prior weeks to train on):
+
+    * **Train** a new AutoGluon model from scratch on data from weeks `< W` (10-minute time limit).
+    * **Test** it on data from week `W`.
+ 3. Collect predictions and evaluation metrics.
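The split logic in the steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the author's script: `walk_forward_splits` is a hypothetical helper, the rows are toy data, and a training-mean baseline stands in for the AutoGluon fit so the example runs without extra dependencies.

```python
from collections import defaultdict
from statistics import mean


def walk_forward_splits(rows):
    """Yield (season, week, train_rows, test_rows); train = earlier weeks of the same season."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)
    for season in sorted(by_season):
        season_rows = by_season[season]
        weeks = sorted({r["week"] for r in season_rows})
        for week in weeks[1:]:  # the first observed week has no prior data to train on
            train = [r for r in season_rows if r["week"] < week]
            test = [r for r in season_rows if r["week"] == week]
            yield season, week, train, test


# Toy rows; real rows would also carry the pregame features.
rows = [
    {"season": 2016, "week": 1, "receiving_yards": 40.0},
    {"season": 2016, "week": 1, "receiving_yards": 60.0},
    {"season": 2016, "week": 2, "receiving_yards": 70.0},
    {"season": 2016, "week": 3, "receiving_yards": 30.0},
    {"season": 2017, "week": 1, "receiving_yards": 55.0},
    {"season": 2017, "week": 2, "receiving_yards": 45.0},
]

results = []
for season, week, train, test in walk_forward_splits(rows):
    # Stand-in for the weekly model fit: predict the mean of the training target.
    baseline = mean(r["receiving_yards"] for r in train)
    for r in test:
        results.append(
            {"season": season, "week": week, "true": r["receiving_yards"], "pred": baseline}
        )
```

Because training rows are filtered by season as well as week, data from one season never enters another season's models, which matches the per-season loop described above.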
+
+ ### AutoGluon Configuration
+
+ ```python
+ from autogluon.tabular import TabularPredictor
+
+ # `train`, `features`, `target`, and `model_dir` are defined elsewhere
+ # in the script (not shown here).
+ predictor = TabularPredictor(
+     label="receiving_yards",
+     path=model_dir,  # output directory for the trained model
+     verbosity=0,     # minimal logging
+ ).fit(
+     train_data=train[features + [target]],
+     time_limit=600,  # seconds per weekly fit
+     presets="medium_quality_faster_train",
+ )
+ ```
+
+ **Time Limit:** 600 seconds per week
+ **Preset:** `medium_quality_faster_train`
+ **Verbosity:** 0 (minimal logging)
+
+ ---
+
+ ## 📊 Evaluation
+
+ ### Metric
+
+ The script reports the **Mean Absolute Error (MAE)**, in yards, over all weekly out-of-sample predictions.
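MAE is the average of `|true - pred|` over all predictions. The sketch below computes it overall and per (season, week); the per-week breakdown and both function names are additions for illustration, and the records are toy values rather than real results.

```python
from collections import defaultdict


def mean_absolute_error(pairs):
    """Average of |true - pred| over an iterable of (true, pred) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (true, pred) pair is required")
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def mae_by_week(records):
    """Group records by (season, week) and compute the MAE of each group."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["season"], r["week"])].append((r["true"], r["pred"]))
    return {key: mean_absolute_error(vals) for key, vals in sorted(grouped.items())}


# Toy records, not real model output.
records = [
    {"season": 2016, "week": 2, "true": 70.0, "pred": 50.0},
    {"season": 2016, "week": 2, "true": 10.0, "pred": 20.0},
    {"season": 2016, "week": 3, "true": 30.0, "pred": 55.0},
]

overall = mean_absolute_error((r["true"], r["pred"]) for r in records)
per_week = mae_by_week(records)
```

A per-week view is useful alongside the single overall number because early-season models are trained on far fewer weeks than late-season ones.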
+
+ ### Output
+
+ After all walk-forward runs, the script writes:
+
+ * `walkforward_predictions.csv`: true vs. predicted values, labeled by season and week.
+ * Columns:
+
+   * `season`
+   * `week`
+   * `true`
+   * `pred`
+   * `error` (absolute error, `|true - pred|`)
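A file with these columns could be produced with the standard `csv` module. This is a sketch only: `write_predictions` is a hypothetical helper, the records are toy values, and an in-memory buffer is used in place of `walkforward_predictions.csv` so the example has no side effects.

```python
import csv
import io

FIELDNAMES = ["season", "week", "true", "pred", "error"]


def write_predictions(records, handle):
    """Write one CSV row per prediction, deriving `error` as |true - pred|."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for r in records:
        row = {k: r[k] for k in ("season", "week", "true", "pred")}
        row["error"] = abs(r["true"] - r["pred"])
        writer.writerow(row)
        count += 1
    return count


# Toy records, not real model output.
records = [
    {"season": 2016, "week": 2, "true": 70.0, "pred": 50.0},
    {"season": 2016, "week": 3, "true": 30.0, "pred": 55.0},
]

buffer = io.StringIO()  # swap for open("walkforward_predictions.csv", "w", newline="")
n_written = write_predictions(records, buffer)
csv_text = buffer.getvalue()
```

Storing `error` in the file means the overall MAE is simply the mean of that column.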
+
+ Example final output:
+
+ ```
+ ✅ Walk-forward complete!
+ Total predictions: 3,200
+ Mean Absolute Error: 12.47
+ 📥 Saved: walkforward_predictions.csv
+ ```
+
+ ---
+
+ ## 📢 Artifacts
+
+ | Artifact | Description |
+ | --------------------------------------------- | ---------------------------------- |
+ | `player_historical_embeddings.csv` | Precomputed player embeddings |
+ | `autogluon_walkforward/` | Directory of trained weekly models |
+ | `walkforward_predictions.csv` | Aggregated results of predictions |
+ | `SebastianAndreu/24679_NFL_WR_Dataset_<YEAR>` | Input datasets (2016–2025) |
+
+ ---
+
+ ## 🧠 Intended Use
+
+ **Goal:** Predict individual WR performance before each NFL game.
+ **Primary Users:** Sports analytics researchers, fantasy football data scientists, and betting modelers.
+ **Not intended for:** Real-time in-game prediction or commercial wagering advice.
+
+ ---
+
+ ## ⚠️ Limitations
+
+ * Retraining from scratch every week is computationally expensive (up to 10 minutes per weekly fit).
+ * The features do not include injury or roster-change data.
+ * Prediction quality depends on the embeddings in `player_historical_embeddings.csv`, which come from an earlier model.
+ * Accuracy varies between early and late season, because early weeks have few prior weeks to train on.
+
+ ---
+
+ ## 🧩 Future Improvements
+
+ * Incorporate **transfer learning** between seasons.
+ * Add **injury and snap-count features**.
+ * Experiment with **AutoGluon ensemble distillation** to reduce retraining cost.
+ * Combine with the **Model 1 embeddings pipeline** for joint optimization.