Flash Flood Probability Predictor (XGBoost)

Model Description

This repository contains a lightweight XGBoost Regressor trained to predict the probability of flash floods (from 0 to 100%) based on simulated Malaysian hydrological data. The model analyzes historical and current hourly rainfall (mm) and river water levels (m) to calculate short-term flooding risks, capturing the delayed response of river runoff to heavy monsoonal storms.

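The model artifact (flash_flood_predictor.joblib) was trained on a 10-year simulated dataset (2015-2025), a span reported to give the best balance between variance and generalization. The artifact is a dictionary holding the fitted model under the 'model' key and the expected feature order under the 'features' key.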

Intended Use

  • Flash Flood Early Warning: Calculating real-time risk scores using sensor streams of river levels and rainfall.
  • Hydrological Simulations: Estimating the impact of varying rainfall intensities on flood probabilities.

Model Performance

The model's performance metrics on the test dataset are:

  • Test RMSE: 2.7202
  • Test MAE: 2.1971
  • Test R²: 0.9927
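
The evaluation script is not included in this card, so the following is only a sketch of how these three metrics are conventionally computed, using their standard definitions. The function name regression_metrics and the toy arrays are illustrative assumptions, not part of the repository. Because the target is a probability expressed from 0 to 100, RMSE and MAE are in percentage points.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² using their standard definitions.

    Hypothetical helper for illustration; not the repository's own code.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    mae = float(np.mean(np.abs(residuals)))

    # R² = 1 - SS_res / SS_tot (undefined if y_true is constant)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Toy values (made up) standing in for held-out labels and predictions
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
print(metrics)
```

With the real model, y_pred would come from model.predict on a held-out split built with the same feature pipeline as training.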

Feature Engineering

The model requires raw rainfall and river level data to be pre-processed into temporal features, including:

  1. Lags: river_level_lag_1h, river_level_lag_2h, rainfall_1h_lag_1h
  2. Rolling Rates: river_change_1h, river_change_3h, river_change_6h
  3. Interactions: Compounding risk features like rain_river_interaction_24h (Rainfall 24h × River Level)
  4. Seasonal: hour, month
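
The feature groups above can be sketched with pandas as follows. This is a minimal illustration, not the training pipeline: it assumes an hourly DataFrame with a DatetimeIndex and two raw columns (rainfall_1h_mm and river_water_level_m), and it infers the window definitions from the feature names (rolling sums for rainfall totals, differences for river change rates). The function name engineer_features is hypothetical.

```python
import numpy as np
import pandas as pd


def engineer_features(hourly: pd.DataFrame) -> pd.DataFrame:
    """Derive temporal features from a regular hourly series.

    Assumption: window definitions are inferred from the feature names,
    not taken from the original training code.
    """
    out = hourly[["rainfall_1h_mm", "river_water_level_m"]].copy()
    rain = out["rainfall_1h_mm"]
    river = out["river_water_level_m"]

    # Rolling rainfall totals over the trailing N hours
    for hours in (3, 6, 24):
        out[f"rainfall_{hours}h_mm"] = rain.rolling(hours).sum()
    out["cumulative_rainfall_3day_mm"] = rain.rolling(72).sum()

    # Rolling rates: change in river level over the trailing N hours
    for hours in (1, 3, 6):
        out[f"river_change_{hours}h"] = river.diff(hours)

    # Lags: values observed 1 and 2 hours earlier
    out["river_level_lag_1h"] = river.shift(1)
    out["river_level_lag_2h"] = river.shift(2)
    out["rainfall_1h_lag_1h"] = rain.shift(1)
    out["rainfall_1h_lag_2h"] = rain.shift(2)

    # Interactions: rainfall total multiplied by current river level
    out["rain_river_interaction_3h"] = out["rainfall_3h_mm"] * river
    out["rain_river_interaction_24h"] = out["rainfall_24h_mm"] * river

    # Seasonal: hour of day and month of year from the index
    out["hour"] = out.index.hour
    out["month"] = out.index.month

    # Drop the warm-up rows where the longest window is still incomplete
    return out.dropna()


# Made-up demo series: constant rain and a steadily rising river
index = pd.date_range("2024-05-01", periods=80, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame(
    {
        "rainfall_1h_mm": np.full(80, 1.0),
        "river_water_level_m": 2.0 + 0.1 * np.arange(80),
    },
    index=index,
)
features = engineer_features(demo)
print(features.tail(1).T)
```

Because the 3-day window needs 72 hourly observations, the first 71 rows are dropped. A live deployment would therefore need at least three days of sensor history before the first prediction.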

The most critical features driving the model's predictions are the Current River Water Level (45% importance) and the River Water Level 1 Hour Ago (40% importance).
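
To check importances like these against a loaded artifact, a small ranking helper is enough. The sketch below is illustrative: rank_importances is a hypothetical name, and the example scores are made up rather than read from the model.

```python
import numpy as np


def rank_importances(feature_names, importances, top_k=5):
    """Return the top_k (name, importance) pairs, largest first.

    Hypothetical helper for illustration only.
    """
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:top_k]
    return [(feature_names[i], float(importances[i])) for i in order]


# Made-up scores, used only to show the output shape
example = rank_importances(
    ["river_water_level_m", "river_level_lag_1h", "rainfall_24h_mm"],
    [0.45, 0.40, 0.15],
    top_k=2,
)
print(example)
```

If the stored model is an xgboost.XGBRegressor (an assumption about the artifact), its scores are available as model.feature_importances_, so the call would be rank_importances(expected_features, model.feature_importances_).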

How to Use

You can load and use the model in Python via joblib:

import joblib
import pandas as pd
import numpy as np

# 1. Load the model artifact
model_data = joblib.load("flash_flood_predictor.joblib")
model = model_data['model']
expected_features = model_data['features']

# 2. Prepare your feature dictionary (must be pre-engineered)
sample_features = {
    'rainfall_1h_mm': 15.0,
    'rainfall_3h_mm': 45.0,
    'rainfall_6h_mm': 60.0,
    'rainfall_24h_mm': 90.0,
    'cumulative_rainfall_3day_mm': 120.0,
    'river_water_level_m': 4.2,
    'river_change_1h': 0.3,
    'river_change_3h': 0.8,
    'river_change_6h': 1.2,
    'river_level_lag_1h': 3.9,
    'river_level_lag_2h': 3.6,
    'rainfall_1h_lag_1h': 20.0,
    'rainfall_1h_lag_2h': 10.0,
    'rain_river_interaction_3h': 45.0 * 4.2,
    'rain_river_interaction_24h': 90.0 * 4.2,
    'hour': 21,
    'month': 5
}

# 3. Convert to DataFrame and Predict
X_single = pd.DataFrame([sample_features])[expected_features]
prediction = model.predict(X_single)[0]

# Clamp to 0-100% probability
flood_probability = np.clip(prediction, 0.0, 100.0)
print(f"Predicted Flash Flood Probability: {flood_probability:.1f}%")
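
For the hydrological-simulation use case listed under Intended Use, one simple approach is a what-if sweep: scale every rainfall input by a factor, recompute the interaction terms, and compare the predicted probabilities. The sketch below is an assumption about how such a sweep could be set up, not a method provided by this repository. The names scale_rainfall and rainfall_sweep are hypothetical. River-level features are held fixed, so the sweep shows the model's sensitivity to rainfall alone and ignores the river's own response to extra rain.

```python
import numpy as np
import pandas as pd

# Rainfall-derived inputs that are scaled together (names from the example above)
RAIN_KEYS = [
    "rainfall_1h_mm", "rainfall_3h_mm", "rainfall_6h_mm", "rainfall_24h_mm",
    "cumulative_rainfall_3day_mm", "rainfall_1h_lag_1h", "rainfall_1h_lag_2h",
]


def scale_rainfall(features, factor):
    """Return a copy of `features` with all rainfall inputs multiplied by `factor`."""
    scaled = dict(features)
    for key in RAIN_KEYS:
        if key in scaled:
            scaled[key] = scaled[key] * factor
    # Keep the interaction terms consistent with the scaled rainfall totals
    level = scaled["river_water_level_m"]
    scaled["rain_river_interaction_3h"] = scaled["rainfall_3h_mm"] * level
    scaled["rain_river_interaction_24h"] = scaled["rainfall_24h_mm"] * level
    return scaled


def rainfall_sweep(model, expected_features, base_features, factors):
    """Predict a clamped flood probability for each rainfall scaling factor."""
    factors = list(factors)
    rows = [scale_rainfall(base_features, f) for f in factors]
    X = pd.DataFrame(rows)[expected_features]  # enforce the training column order
    probabilities = np.clip(model.predict(X), 0.0, 100.0)
    return pd.DataFrame({"rain_factor": factors, "flood_probability": probabilities})
```

With the objects from the example above, a call such as rainfall_sweep(model, expected_features, sample_features, [0.5, 1.0, 1.5, 2.0]) returns one row per scenario.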

Limitations

  • Synthetic Data: This model was trained on synthetic data meant to closely mimic Malaysian hydrology. For production deployment, it must be retrained on real local telemetry.
  • Geographic Specificity: The generated weather patterns assume a tropical monsoonal climate, so predictions should not be expected to transfer to other climate regimes.