Demand Regression (LightGBM)

Gradient-boosted (LightGBM) regression model that predicts demand, a continuous target in the range (0, 1], from spatio-temporal and contextual tabular features. Trained on 77,299 rows.

Results (5-fold out-of-fold CV)

Metric   Value
R²       0.9608
RMSE     0.02815
MAE      0.01880
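For reference, the three reported metrics can be computed from out-of-fold predictions as below. This is a minimal pure-Python sketch. It assumes the metrics are reported on the original demand scale (after the inverse transform), which the card does not state explicitly.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R² of 0.9608 means the out-of-fold residual sum of squares is about 3.9% of the total variance around the mean demand.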

Best config (from a 4-point sweep): learning_rate=0.03, num_leaves=63, min_child_samples=30, reg_lambda=0.2, ~4309 trees. The target is modeled as log1p(demand); predictions are expm1 of the model output, clipped to ≥ 0.
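The target transform and its inverse can be sketched as follows. The parameter dict simply restates the reported best config using LightGBM parameter names. How the ~4309 trees were selected (for example, by early stopping) is not stated in the card, so the sketch leaves it out. The helper names `to_target` and `to_demand` are hypothetical and are not taken from predict.py.

```python
import math

# Best configuration reported above, using LightGBM parameter names.
# The number of boosting rounds (~4309) is not included here.
BEST_PARAMS = {
    "learning_rate": 0.03,
    "num_leaves": 63,
    "min_child_samples": 30,
    "reg_lambda": 0.2,
}


def to_target(demand: float) -> float:
    """Training label: log1p(demand)."""
    return math.log1p(demand)


def to_demand(raw_prediction: float) -> float:
    """Invert log1p. A negative raw score gives a negative expm1, so clip at 0."""
    return max(0.0, math.expm1(raw_prediction))
```

In a LightGBM pipeline, `BEST_PARAMS` would be passed to `lightgbm.train` along with a `lightgbm.Dataset` whose labels are `to_target(demand)`, and `to_demand` would be applied to each raw prediction.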

Features

Raw columns: geohash, day, timestamp, RoadType, NumberofLanes, LargeVehicles, Landmarks, Temperature, Weather.

Engineered:

  • geohash → lat/lon (decoded), gh4/gh5 prefixes, and the raw geohash as a categorical
  • time-of-day: hour, minute, minute-of-day, 15-min slot, cyclical sin/cos
  • missing flags (e.g. temp_missing); NaN in RoadType/Weather → "missing" category
  • LightGBM native categorical + native NaN handling for Temperature
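The geohash and missing-value steps above can be sketched per row. This is an illustration, not the code in predict.py. It assumes rows arrive as plain dicts with NaN or None for missing values. The decoder is the standard geohash bisection algorithm and returns the centre of the cell. The names `decode_geohash` and `row_features` are hypothetical.

```python
import math

GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(gh: str) -> tuple[float, float]:
    """Standard geohash decode: bits alternate lon/lat, starting with longitude."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True
    for ch in gh.lower():
        idx = GEOHASH_BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (idx >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def _is_missing(value) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_features(row: dict) -> dict:
    """Hypothetical per-row version of the engineered features listed above."""
    gh = row["geohash"]
    lat, lon = decode_geohash(gh)
    temp = row.get("Temperature")
    out = {
        "geohash": gh, "gh4": gh[:4], "gh5": gh[:5], "lat": lat, "lon": lon,
        "temp_missing": int(_is_missing(temp)),
        # Temperature stays NaN so LightGBM's native missing-value handling applies.
        "Temperature": float("nan") if _is_missing(temp) else float(temp),
    }
    for col in ("RoadType", "Weather"):
        val = row.get(col)
        out[col] = "missing" if _is_missing(val) else val
    return out
```

The categorical columns (geohash, gh4, gh5, RoadType, Weather) would then be declared as categorical for LightGBM rather than one-hot encoded.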

Top features: geohash, Temperature, time-of-day (tod/sin/cos), lat/lon.
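The time-of-day features can be sketched as follows. The sin/cos pair places 23:45 right next to 00:00, which a raw minute-of-day value does not. This sketch assumes `timestamp` is an "H:M" string such as "13:45". The card does not give the format, and the output keys are illustrative rather than the column names used in predict.py.

```python
import math

MINUTES_PER_DAY = 24 * 60


def time_features(timestamp: str) -> dict:
    """Hour, minute, minute-of-day, 15-minute slot and cyclical encoding."""
    # Assumption: timestamp is "H:M" (e.g. "13:45" or "0:0").
    hour_str, minute_str = timestamp.split(":")
    hour, minute = int(hour_str), int(minute_str)
    tod = hour * 60 + minute                     # minute-of-day, 0..1439
    angle = 2 * math.pi * tod / MINUTES_PER_DAY  # one full turn per day
    return {
        "hour": hour,
        "minute": minute,
        "tod": tod,
        "slot15": tod // 15,                     # 96 slots per day
        "tod_sin": math.sin(angle),
        "tod_cos": math.cos(angle),
    }
```

For example, `time_features("13:45")` gives `tod == 825` and `slot15 == 55`.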

Usage

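import sys
from pathlib import Path

import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "<user>/<this-model-repo>"  # placeholder: this model's repo id on the Hub

# Download predict.py from this repo and make its folder importable.
predict_path = hf_hub_download(repo_id=REPO_ID, filename="predict.py")
sys.path.insert(0, str(Path(predict_path).parent))
from predict import load_model, predict

model = load_model()                 # downloads model.txt
df = pd.read_csv("new_data.csv")     # same raw columns as train (no 'demand')
df["demand_pred"] = predict(model, df)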
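Because `predict` expects the same raw columns as training, a quick pre-flight check can catch schema problems before inference. This is a minimal sketch. The column list is copied from the Features section, and `missing_columns` is a hypothetical helper, not part of predict.py. The card does not say how predict.py behaves when a column is absent.

```python
from typing import Iterable

# Raw input columns listed in this card ('demand' is the target and is not needed).
RAW_COLUMNS = (
    "geohash", "day", "timestamp", "RoadType", "NumberofLanes",
    "LargeVehicles", "Landmarks", "Temperature", "Weather",
)


def missing_columns(columns: Iterable[str]) -> list[str]:
    """Return the raw columns absent from `columns`, in the card's order."""
    present = set(columns)
    return [c for c in RAW_COLUMNS if c not in present]
```

With a DataFrame, `missing_columns(df.columns)` returning an empty list means every expected raw column is present.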

Files

  • model.txt: LightGBM booster
  • predict.py: feature engineering + inference (handles geohash decoding and NaNs)
  • job_train.py / job_run.py: training pipeline (sweep → 5-fold OOF → final fit)
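The "5-fold OOF" stage of the pipeline means that each row's prediction comes from a model that did not see that row during training. The sketch below shows the idea with pluggable `fit`/`predict` callables, so it runs without LightGBM. It assumes a plain shuffled K-fold split. The card does not say whether the folds were shuffled, grouped by geohash, or split by time. All names here are hypothetical.

```python
import random
from typing import Callable, Sequence


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Shuffle row indices once, then deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oof_predictions(
    X: Sequence, y: Sequence[float],
    fit: Callable, predict: Callable,
    k: int = 5, seed: int = 0,
) -> list[float]:
    """Fill one prediction per row using a model trained on the other k-1 folds."""
    oof = [0.0] * len(y)
    for val_idx in kfold_indices(len(y), k, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(val_idx, predict(model, [X[i] for i in val_idx])):
            oof[i] = p
    return oof
```

The OOF predictions are what the Results metrics are computed on. The final fit then trains a single model on all rows with the chosen config.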
  • metrics.json, sweep.json, feature_importance.json

Data

Trained from revanthkolla/ml-intern-1ebf8be8-datasets (config upload_b6c5285be689).
