Demand Regression (LightGBM)
Gradient-boosted regression model predicting demand (continuous, range (0, 1]) from
spatio-temporal + contextual tabular features. Trained on 77,299 rows.
Results (5-fold out-of-fold CV)
| Metric | Value |
|---|---|
| R² | 0.9608 |
| RMSE | 0.02815 |
| MAE | 0.01880 |
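For reference, the three metrics in the table follow the standard definitions, sketched below in plain Python. The function name `regression_metrics` is hypothetical, and the sketch assumes the scores are computed on pooled out-of-fold predictions in the original demand scale; the actual computation lives in the training scripts.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for two equal-length sequences of floats."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

With out-of-fold predictions, each row is scored by a model that never saw it during training, so these numbers estimate performance on unseen rows.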
Best config (from a 4-point sweep): learning_rate=0.03, num_leaves=63, min_child_samples=30, reg_lambda=0.2, ~4309 trees. Target modeled as log1p(demand); predictions are expm1(...) clipped to ≥0.
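The target transform can be illustrated with a minimal sketch. The helper names `to_training_target` and `to_demand` are hypothetical, and the dictionary holds only the four swept values quoted above; other settings, such as the objective, are not stated here and are left out.

```python
import math

# Swept hyperparameters as reported above (LightGBM parameter names).
BEST_PARAMS = {
    "learning_rate": 0.03,
    "num_leaves": 63,
    "min_child_samples": 30,
    "reg_lambda": 0.2,
}


def to_training_target(demand):
    """Map raw demand to the scale the model is fit on: log(1 + demand)."""
    return math.log1p(demand)


def to_demand(raw_prediction):
    """Invert the transform with expm1, then clip negative values to 0."""
    return max(math.expm1(raw_prediction), 0.0)
```

The clip matters because a raw prediction below 0 maps to a negative demand under expm1. On whole arrays, `numpy.log1p` and `numpy.expm1` are the vectorized equivalents.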
Features
Raw columns: geohash, day, timestamp, RoadType, NumberofLanes, LargeVehicles, Landmarks, Temperature, Weather.
Engineered:
- geohash → lat/lon (decoded) + gh4/gh5 prefixes + raw geohash as categorical
- time-of-day: hour, minute, minute-of-day, 15-min slot, cyclical sin/cos
- missing flags (e.g. temp_missing); RoadType/Weather NaN → "missing" category
- LightGBM native categorical + native NaN handling for Temperature
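As an illustration of the geohash-derived features in the list above, here is a minimal pure-Python sketch of the standard geohash decoding algorithm. The function names and the returned keys are hypothetical; predict.py may decode differently, for example with a library.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def decode_geohash(gh):
    """Return the (lat, lon) center of the cell named by a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in gh:
        idx = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (idx >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def geohash_features(gh):
    """Decoded coordinates plus 4- and 5-character prefixes (coarser cells)."""
    lat, lon = decode_geohash(gh)
    return {"lat": lat, "lon": lon, "gh4": gh[:4], "gh5": gh[:5]}
```

The prefixes give the model coarser spatial groupings than the full geohash, which can help for cells with few training rows.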
Top features: geohash, Temperature, time-of-day (tod/sin/cos), lat/lon.
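The time-of-day features can be sketched as follows. This assumes, as an unverified guess, that timestamp is a string of the form "H:M" (for example "6:0" or "23:45"); the function name and the output keys are hypothetical and may not match predict.py.

```python
import math

MINUTES_PER_DAY = 24 * 60


def time_of_day_features(timestamp):
    """Derive time-of-day features from an 'H:M' string (assumed format)."""
    hour_str, minute_str = timestamp.split(":")
    hour, minute = int(hour_str), int(minute_str)
    minute_of_day = hour * 60 + minute
    # Map the day onto a circle so 23:45 and 00:00 end up close together.
    angle = 2 * math.pi * minute_of_day / MINUTES_PER_DAY
    return {
        "hour": hour,
        "minute": minute,
        "minute_of_day": minute_of_day,
        "slot_15min": minute_of_day // 15,  # 0..95
        "tod_sin": math.sin(angle),
        "tod_cos": math.cos(angle),
    }
```

The sin/cos pair encodes the wrap-around at midnight, which a plain minute-of-day number cannot express.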
Usage
import pandas as pd
from huggingface_hub import hf_hub_download  # can be used to fetch predict.py from this repo

# After downloading predict.py from this repo into the working directory:
from predict import load_model, predict

model = load_model()  # downloads model.txt
df = pd.read_csv("new_data.csv")  # same raw columns as the training data (no 'demand')
df["demand_pred"] = predict(model, df)
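Before calling predict, it may help to confirm that the input has every raw column listed under Features. The check below is a small sketch; `missing_raw_columns` is a hypothetical helper, not part of predict.py.

```python
# Raw columns as listed in the Features section.
RAW_COLUMNS = [
    "geohash", "day", "timestamp", "RoadType", "NumberofLanes",
    "LargeVehicles", "Landmarks", "Temperature", "Weather",
]


def missing_raw_columns(columns):
    """Return the expected raw columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in RAW_COLUMNS if name not in present]
```

For a DataFrame, `missing_raw_columns(df.columns)` returns an empty list when all expected columns are present.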
Files
- model.txt: LightGBM booster
- predict.py: feature engineering + inference (handles geohash decode, NaNs)
- job_train.py / job_run.py: training pipeline (sweep → 5-fold OOF → final fit)
- metrics.json, sweep.json, feature_importance.json
Data
Trained from revanthkolla/ml-intern-1ebf8be8-datasets (config upload_b6c5285be689).