Flash Flood Probability Predictor (XGBoost)
Model Description
This repository contains a lightweight XGBoost Regressor trained to predict the probability of flash floods (from 0 to 100%) based on simulated Malaysian hydrological data. The model analyzes historical and current hourly rainfall (mm) and river water levels (m) to calculate short-term flooding risks, capturing the delayed response of river runoff to heavy monsoonal storms.
The model artifact (flash_flood_predictor.joblib) was trained on a 10-year simulated dataset (2015-2025), a span chosen to balance variance against generalization.
Intended Use
- Flash Flood Early Warning: Calculating real-time risk scores using sensor streams of river levels and rainfall.
- Hydrological Simulations: Estimating the impact of varying rainfall intensities on flood probabilities.
Model Performance
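The model's performance metrics on the test dataset are listed below. Because the target is a probability on a 0-100% scale, RMSE and MAE are in percentage points: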
- Test RMSE: 2.7202
- Test MAE: 2.1971
- Test R²: 0.9927
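For reference, the three metrics above can be computed from held-out targets and predictions roughly as follows. This is a minimal sketch: the function name `regression_metrics` and the sample values are illustrative assumptions, not part of this repository, and the test split behind the reported numbers is not reproduced here.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # root mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                    # fails if y_true is constant
    return rmse, mae, r2

# Made-up flood probabilities (%) purely for illustration
y_true = [10.0, 40.0, 75.0, 95.0]
y_pred = [12.0, 38.0, 78.0, 93.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```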
Feature Engineering
The model requires raw rainfall and river level data to be pre-processed into temporal features, including:
- Lags: river_level_lag_1h, river_level_lag_2h, rainfall_1h_lag_1h
- Rolling Rates: river_change_1h, river_change_3h, river_change_6h
- Interactions: Compounding risk features like rain_river_interaction_24h (Rainfall 24h × River Level)
- Seasonal: hour, month
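The feature groups above could be derived from raw hourly readings with pandas along the following lines. This is a sketch under stated assumptions, not the repository's actual pipeline: the function name `build_features` is hypothetical, rainfall accumulations are assumed to be trailing sums that include the current hour, `cumulative_rainfall_3day_mm` is assumed to be a 72-hour sum, and `river_change_Nh` is assumed to be the level now minus the level N hours ago (which agrees with the sample values in the How to Use section, where 4.2 - 3.9 = 0.3).

```python
import pandas as pd

def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Derive temporal features from hourly readings.

    Assumes `raw` has a DatetimeIndex at a regular 1-hour step and two
    columns: 'rainfall_1h_mm' and 'river_water_level_m'.
    """
    df = raw.sort_index().copy()
    rain = df["rainfall_1h_mm"]
    level = df["river_water_level_m"]

    # Rainfall accumulations (assumed: trailing sums including the current hour)
    for hours in (3, 6, 24):
        df[f"rainfall_{hours}h_mm"] = rain.rolling(hours).sum()
    df["cumulative_rainfall_3day_mm"] = rain.rolling(72).sum()

    # River level change (assumed: level now minus level N hours ago)
    for hours in (1, 3, 6):
        df[f"river_change_{hours}h"] = level.diff(hours)

    # Lags of the raw series
    df["river_level_lag_1h"] = level.shift(1)
    df["river_level_lag_2h"] = level.shift(2)
    df["rainfall_1h_lag_1h"] = rain.shift(1)
    df["rainfall_1h_lag_2h"] = rain.shift(2)

    # Interactions: accumulated rainfall times current river level
    df["rain_river_interaction_3h"] = df["rainfall_3h_mm"] * level
    df["rain_river_interaction_24h"] = df["rainfall_24h_mm"] * level

    # Seasonal features taken from the timestamp
    df["hour"] = df.index.hour
    df["month"] = df.index.month

    # Rows without a full 72-hour history contain NaN and are dropped
    return df.dropna()

# Made-up readings: constant 1 mm/h rain and a steadily rising river
index = pd.date_range("2024-11-01 00:00", periods=80, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame(
    {
        "rainfall_1h_mm": [1.0] * 80,
        "river_water_level_m": [2.0 + 0.1 * i for i in range(80)],
    },
    index=index,
)
features = build_features(raw)
```

Because the longest window is 72 hours, the first 71 rows are dropped, so a live deployment would need at least three days of buffered sensor history before the first prediction.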
The most critical features driving the model's predictions are the Current River Water Level (45% importance) and the River Water Level 1 Hour Ago (40% importance).
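To inspect such a ranking yourself, one option is to pair the feature names with the model's importance scores and sort them. The helper `rank_importances` below is a hypothetical sketch. The commented call assumes the stored model exposes a scikit-learn style `feature_importances_` attribute, and the demo scores are placeholders: 0.45 and 0.40 echo the figures quoted above (mapped, as an assumption, to `river_water_level_m` and `river_level_lag_1h`), while 0.05 is invented.

```python
def rank_importances(feature_names, importances, top_n=5):
    """Pair names with scores and return the top_n pairs, highest score first."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have equal length")
    ranked = sorted(
        zip(feature_names, importances),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(name, float(score)) for name, score in ranked[:top_n]]

# With the loaded artifact (assumption: the model has `feature_importances_`):
#   rank_importances(expected_features, model.feature_importances_)

# Placeholder scores for illustration only
names = ["rainfall_24h_mm", "river_water_level_m", "river_level_lag_1h"]
scores = [0.05, 0.45, 0.40]
top = rank_importances(names, scores, top_n=2)
for name, score in top:
    print(f"{name}: {score:.0%}")
```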
How to Use
You can load and use the model in Python via joblib:
import joblib
import pandas as pd
import numpy as np
# 1. Load the model artifact
model_data = joblib.load("flash_flood_predictor.joblib")
model = model_data['model']
expected_features = model_data['features']
# 2. Prepare your feature dictionary (must be pre-engineered)
sample_features = {
    'rainfall_1h_mm': 15.0,
    'rainfall_3h_mm': 45.0,
    'rainfall_6h_mm': 60.0,
    'rainfall_24h_mm': 90.0,
    'cumulative_rainfall_3day_mm': 120.0,
    'river_water_level_m': 4.2,
    'river_change_1h': 0.3,
    'river_change_3h': 0.8,
    'river_change_6h': 1.2,
    'river_level_lag_1h': 3.9,
    'river_level_lag_2h': 3.6,
    'rainfall_1h_lag_1h': 20.0,
    'rainfall_1h_lag_2h': 10.0,
    'rain_river_interaction_3h': 45.0 * 4.2,
    'rain_river_interaction_24h': 90.0 * 4.2,
    'hour': 21,
    'month': 5
}
# 3. Convert to DataFrame and Predict
X_single = pd.DataFrame([sample_features])[expected_features]
prediction = model.predict(X_single)[0]
# Clamp to 0-100% probability
flood_probability = np.clip(prediction, 0.0, 100.0)
print(f"Predicted Flash Flood Probability: {flood_probability:.1f}%")
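The column selection in step 3 raises a pandas `KeyError` if any expected feature is missing from the dictionary. A small pre-check can give a clearer message before prediction. The helper `check_features` and the `station_id` key below are hypothetical and shown only as a sketch.

```python
def check_features(sample, expected_features):
    """Raise KeyError if any expected feature is absent; return unused extra keys."""
    missing = [name for name in expected_features if name not in sample]
    if missing:
        raise KeyError(f"Missing engineered features: {missing}")
    # Extra keys are harmless (they are dropped by the column selection),
    # but reporting them can reveal naming mistakes.
    return sorted(set(sample) - set(expected_features))

# Illustrative stand-ins for `expected_features` and `sample_features`
expected = ["rainfall_1h_mm", "river_water_level_m"]
sample = {"rainfall_1h_mm": 15.0, "river_water_level_m": 4.2, "station_id": "demo"}
unused = check_features(sample, expected)
print(f"Unused keys: {unused}")
```

In the snippet above, it would be called as `check_features(sample_features, expected_features)` just before building the DataFrame.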
Limitations
- Synthetic Data: This model was trained on synthetic data meant to closely mimic Malaysian hydrology. For production deployment, it must be retrained on real local telemetry.
- Geographic Specificity: The generated weather patterns assume a tropical monsoonal climate, so the model should not be expected to transfer to other climate regimes without retraining.