Trained models for predicting Formula racing lap times from the GDGC Datathon 2025 competition.
This repository contains ensemble models trained to predict `Lap_Time_Seconds` for Formula racing events. The ensemble combines a Random Forest regressor and an XGBoost regressor whose predictions are averaged; hyperparameters were selected with cross-validation.
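A minimal sketch of the averaging step, assuming each model returns a 1-D NumPy array of lap-time predictions. The helper name `blend_predictions` and its optional `weights` argument are hypothetical and not part of this repository; with the default equal weights it reduces to the simple mean of the Random Forest and XGBoost outputs used in the prediction example.

```python
import numpy as np


def blend_predictions(predictions, weights=None):
    """Weighted average of per-model prediction arrays (hypothetical helper).

    predictions: sequence of 1-D arrays, one per model, all the same length.
    weights: optional non-negative weights, one per model; defaults to equal weights.
    """
    # Shape (n_models, n_samples); vstack raises ValueError on mismatched lengths.
    stacked = np.vstack([np.asarray(p, dtype=float) for p in predictions])
    n_models = stacked.shape[0]
    if weights is None:
        weights = np.full(n_models, 1.0 / n_models)
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (n_models,):
        raise ValueError("need exactly one weight per model")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Normalise so the blended output stays on the lap-time scale (seconds).
    weights = weights / weights.sum()
    return weights @ stacked
```

For example, `blend_predictions([rf_preds, xgb_preds])` matches `(rf_preds + xgb_preds) / 2`, while `weights=[3, 1]` would lean three-to-one towards the Random Forest.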
| File | Description | Size |
|---|---|---|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |
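Because `rf_cv_models.pkl` alone is 13.4 GB, it can be worth fetching only the files inference needs. This sketch assumes the final models plus the feature pipeline (the files the loading example in this card opens) are sufficient; the `INFERENCE_FILES` list and the `download_inference_files` helper are hypothetical, while `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
# Hypothetical selection: only what the card's inference example uses, plus the
# two small JSON files. The CV fold models (13.4 GB + 103 MB) and base models are skipped.
INFERENCE_FILES = [
    "rf_final.pkl",
    "xgb_final.pkl",
    "feature_engineer.pkl",
    "best_params.json",
    "cv_results.json",
]


def download_inference_files(repo_id="Haxxsh/gdgc-datathon-models", local_dir=None):
    """Download only INFERENCE_FILES and return the local snapshot directory."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=INFERENCE_FILES,  # exact file names work as patterns
        local_dir=local_dir,  # None keeps the files in the Hugging Face cache
    )
```

Usage: `folder = download_inference_files()`, then open `os.path.join(folder, "rf_final.pkl")` and the other files from that folder.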
The models were trained on the GDGC Datathon 2025 dataset (the related Hugging Face dataset is `Haxxsh/gdgc-datathon-data`):

- Target: `Lap_Time_Seconds` (continuous)

The dataset includes features such as:
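Load the final models and the feature pipeline (after downloading the files locally):

```python
import pickle

# Load the final models.
# Only unpickle files from sources you trust: loading a pickle can execute arbitrary code.
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)
with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```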
Then predict on new data:

```python
import pandas as pd

# Load test data
test_df = pd.read_csv("test.csv")

# Apply the same feature engineering used during training
X_test = feature_engineer.transform(test_df)

# Predict with the ensemble (simple average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```
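To download a single file from the Hub instead of cloning the whole repository:

```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file (cached locally; returns the local path)
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```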
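When these files are unpickled, `pickle` resolves each object's class by importing its defining module and looking the class up by name; the class's code is not stored in the file. So `feature_engineer.pkl` most likely loads only where the module that defined its class (presumably `features.py` from the training repository) is importable. The sketch below demonstrates the mechanism with a standard-library class (`fractions.Fraction`) standing in for the feature pipeline; it does not touch any file from this repository.

```python
import fractions
import pickle

# Pickle a standard-library object whose class lives in the `fractions` module.
payload = pickle.dumps(fractions.Fraction(1, 3), protocol=4)

# The payload holds only a reference (module name + class name), not the class's code.
assert b"fractions" in payload and b"Fraction" in payload

# Simulate an environment where the defining module is missing by renaming the
# module reference to one that does not exist (same length, so framing stays valid).
broken = payload.replace(b"fractions", b"fractionz")
try:
    pickle.loads(broken)
except ModuleNotFoundError as exc:
    print("load failed:", exc)  # the class cannot be resolved without its module
```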
Best parameters found via cross-validation (see `best_params.json`):

```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```
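To retrain with these settings, the JSON keys can be passed as constructor arguments. This is a sketch under the assumption (suggested by the parameter names, not stated in the card) that the keys map directly onto scikit-learn's `RandomForestRegressor` and xgboost's `XGBRegressor`; `build_models` and the added `random_state` are hypothetical.

```python
import json

# Contents of best_params.json as listed above; in practice use
# `with open("best_params.json") as f: params = json.load(f)`.
BEST_PARAMS_JSON = """
{
  "random_forest": {"n_estimators": 100, "max_depth": null,
                    "min_samples_split": 2, "min_samples_leaf": 1},
  "xgboost": {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 6}
}
"""

params = json.loads(BEST_PARAMS_JSON)

# JSON null becomes Python None. For RandomForestRegressor, max_depth=None means
# nodes are expanded until leaves are pure or hold fewer than min_samples_split samples.
rf_kwargs = params["random_forest"]
xgb_kwargs = params["xgboost"]


def build_models(rf_kwargs, xgb_kwargs, random_state=0):
    """Instantiate untrained regressors with the stored hyperparameters (hypothetical)."""
    # Imported lazily so the parsing above works without these libraries installed.
    from sklearn.ensemble import RandomForestRegressor
    from xgboost import XGBRegressor

    rf = RandomForestRegressor(random_state=random_state, **rf_kwargs)
    xgb = XGBRegressor(random_state=random_state, **xgb_kwargs)
    return rf, xgb
```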
Cross-validation results are stored in `cv_results.json`. The primary metric is RMSE (root mean squared error), measured in seconds like the target.
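RMSE squares each error before averaging, so it penalises a few large lap-time misses more heavily than mean absolute error does. A minimal NumPy sketch of the metric and a per-fold summary follows; the layout of `cv_results.json` is not documented here, so `summarize_folds` simply assumes a list of per-fold RMSE values (hypothetical).

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def summarize_folds(fold_scores):
    """Mean and (population) standard deviation of per-fold RMSE values."""
    scores = np.asarray(fold_scores, dtype=float)
    return float(scores.mean()), float(scores.std())
```

For instance, errors of 1, 0 and 2 seconds over three laps give an RMSE of sqrt(5/3), about 1.29 seconds.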
The training code is available on GitHub: ezylopx5/DATATHON
Key files:

- `train.py`: main training script
- `features.py`: feature engineering
- `predict.py`: inference script

License: MIT
Citation:

```bibtex
@misc{gdgc-datathon-2025,
  author = {Haxxsh},
  title = {GDGC Datathon 2025 Lap Time Prediction Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```