Create README.md
README.md
ADDED
@@ -0,0 +1,87 @@
---
license: mit
tags:
- xgboost
- lightgbm
- sports-prediction
- formula1
- tabular
- classification
- ensemble
- optuna
language:
- en
---

# Telemetry Chaos: F1 Race Prediction Model

An XGBoost + LightGBM ensemble that predicts Formula 1 race winners from 76 seasons of historical data. Hyperparameters are tuned with Optuna over 200 trials, and the model retrains weekly during the active season.

**Live demo:** [telemetrychaos.space](https://telemetrychaos.space)

## Performance

| Metric | Score |
|---|---|
| Top-1 Accuracy | 53% |
| Top-3 Accuracy | 85% |
| Top-5 Accuracy | 96% |

Evaluated on the 2024–2025 seasons using a time-series split to prevent data leakage.
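
One way to read the top-k figures: a race counts as a hit when the actual winner is among the k drivers with the highest predicted win probability. The sketch below assumes that definition; the function name `top_k_accuracy`, the driver codes, and the probabilities are made up for illustration and are not the project's evaluation code or data.

```python
from typing import Dict, List, Tuple

# One race = (predicted win probability per driver, actual winner).
Race = Tuple[Dict[str, float], str]


def top_k_accuracy(races: List[Race], k: int) -> float:
    """Fraction of races whose winner is among the k highest-probability drivers."""
    if not races:
        raise ValueError("need at least one race")
    hits = 0
    for probs, winner in races:
        # Sort driver codes by predicted probability, highest first.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if winner in ranked[:k]:
            hits += 1
    return hits / len(races)


# Toy data (made up): two races, three drivers each.
toy_races: List[Race] = [
    ({"AAA": 0.6, "BBB": 0.3, "CCC": 0.1}, "AAA"),  # winner ranked 1st
    ({"AAA": 0.5, "BBB": 0.2, "CCC": 0.3}, "CCC"),  # winner ranked 2nd
]

print(top_k_accuracy(toy_races, k=1))  # 0.5
print(top_k_accuracy(toy_races, k=2))  # 1.0
```

Under this definition, top-k accuracy can only stay the same or rise as k grows, which matches the ordering of the three scores in the table.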

## Features (21 per driver per race)

- **Form:** Rolling average points, recent podiums, win streak
- **Pace:** Practice session lap time delta vs. teammate and field
- **Constructor:** Team rolling performance, reliability score
- **Track history:** Driver-specific circuit win rate, podium rate
- **Tyre:** Degradation profile, pit stop speed, strategy tendency
- **Conditions:** Weather forecast, safety car probability
- **Grid:** Starting position, qualifying gap to pole
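
To make the leakage concern concrete for the form features, here is a minimal sketch of a rolling points average that only looks at races before the one being predicted. The function name `rolling_avg_points`, the window of 5 races, and the 0.0 default for a driver with no history are assumptions; this README does not state the actual window or fill value.

```python
from typing import List


def rolling_avg_points(points: List[float], window: int = 5) -> List[float]:
    """For each race i, average the points of up to `window` races before i.

    Only earlier races are used, so the feature for race i never includes
    the result of race i itself. A race with no history gets 0.0.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    features = []
    for i in range(len(points)):
        history = points[max(0, i - window):i]  # strictly before race i
        features.append(sum(history) / len(history) if history else 0.0)
    return features


# Toy season (made-up points per race for one driver).
season_points = [25, 18, 0, 15, 25]
form = rolling_avg_points(season_points, window=3)
print(form)
```

The slice `points[max(0, i - window):i]` excludes index `i`, which is what keeps the current race's result out of its own feature.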

## Architecture

```
XGBoost (GPU) + LightGBM
Optuna HPO: 200 trials, TPE sampler
Time-series split: train on seasons N-5 to N-1, evaluate on N
Final output: softmax win probabilities per driver
```
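
The split and output steps listed above might look roughly like this. It is a sketch under assumptions: `season_split` and `softmax` are hypothetical helper names, the raw per-driver scores are made up, and how the XGBoost and LightGBM outputs are combined before normalisation is not specified here, so it is left out.

```python
import math
from typing import Dict, List, Tuple


def season_split(eval_season: int, lookback: int = 5) -> Tuple[List[int], int]:
    """Train on the `lookback` seasons before `eval_season`, evaluate on it."""
    train = list(range(eval_season - lookback, eval_season))
    return train, eval_season


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-driver scores for one race into probabilities summing to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {driver: math.exp(s - top) for driver, s in scores.items()}
    total = sum(exps.values())
    return {driver: e / total for driver, e in exps.items()}


train_seasons, test_season = season_split(2025)
print(train_seasons, test_season)  # [2020, 2021, 2022, 2023, 2024] 2025

# Made-up raw scores for a three-driver race.
probs = softmax({"AAA": 2.0, "BBB": 1.0, "CCC": 0.0})
print(probs)
```

Normalising within a single race means the probabilities across that race's grid sum to 1, so they can be compared directly between drivers.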

## Dataset

- **Coverage:** 1950–2025, 76 seasons
- **Records:** 1,322,914 race records
- **Telemetry laps:** 470K+
- **Sources:** FastF1, Jolpica-F1, f1db, Kaggle

## Usage

```python
import joblib
import numpy as np

model = joblib.load("f1_ensemble.joblib")

# Input: one row per driver, 21 features each.
# Placeholder values only (20 drivers assumed); use real engineered features.
X = np.zeros((20, 21))

# Output: win probability (0-1)
probs = model.predict_proba(X)
```

## Auto-Update Pipeline

During the active F1 season, the model retrains weekly:
1. Pull the latest race results and telemetry via FastF1
2. Engineer features for the upcoming race grid
3. Retrain the ensemble with the updated data
4. Publish updated predictions to telemetrychaos.space
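
The four steps can be sketched as a small orchestration function that takes each step as a callable. All names here (`run_weekly_update` and its parameters) are hypothetical; the actual scheduling, FastF1 calls, and publishing code are not shown in this README.

```python
from typing import Any, Callable


def run_weekly_update(
    fetch_data: Callable[[], Any],
    build_features: Callable[[Any], Any],
    retrain: Callable[[Any], Any],
    publish: Callable[[Any, Any], None],
) -> None:
    """Run the four steps in order, passing each step's output onward."""
    data = fetch_data()            # 1. latest results and telemetry
    grid = build_features(data)    # 2. features for the upcoming race grid
    model = retrain(data)          # 3. retrain on the updated data
    publish(model, grid)           # 4. publish predictions for that grid


# Stub steps that only record the order in which they are called.
calls = []
run_weekly_update(
    fetch_data=lambda: calls.append("fetch") or "data",
    build_features=lambda data: calls.append("features") or "grid",
    retrain=lambda data: calls.append("retrain") or "model",
    publish=lambda model, grid: calls.append("publish"),
)
print(calls)  # ['fetch', 'features', 'retrain', 'publish']
```

Passing the steps in as callables keeps the ordering logic testable with stubs, without network access or a trained model.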

## Citation

```bibtex
@misc{rubin2026telemetrychaos,
  author = {Rubin, Theodore},
  title = {Telemetry Chaos: F1 Race Prediction with XGBoost/LightGBM Ensemble},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datamatters24/f1-race-predictor-model}
}
```