
ConvGRU-Ensemble

Ensemble precipitation nowcasting using Convolutional GRU networks

Pretrained model for Italy: IRENE (Italian Radar Ensemble Nowcasting Experiment)

CI License: BSD-2 Python 3.13+ HuggingFace


IT4LIA AI-Factory
Fondazione Bruno Kessler ItaliaMeteo

The model encodes past radar frames into multi-scale hidden states and decodes them into an ensemble of probabilistic forecasts by running the decoder multiple times with different noise inputs, trained with CRPS loss.
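The empirical CRPS of an ensemble can be written as an accuracy term minus a spread term. The sketch below uses the standard estimator on numpy arrays; the function name is illustrative, and the package's `losses.py` is the reference implementation.

```python
import numpy as np

def ensemble_crps(members: np.ndarray, obs: np.ndarray) -> float:
    """Empirical CRPS, averaged over grid points.

    members: (M, H, W) ensemble forecasts
    obs:     (H, W)    observed field
    """
    M = members.shape[0]
    # Accuracy term: mean absolute error of each member vs. the observation
    skill = np.abs(members - obs[None]).mean(axis=0)
    # Spread term: mean pairwise distance between ensemble members
    spread = np.abs(members[:, None] - members[None, :]).sum(axis=(0, 1)) / (2 * M**2)
    return float((skill - spread).mean())

# A perfect, zero-spread ensemble scores 0
print(ensemble_crps(np.zeros((5, 4, 4)), np.zeros((4, 4))))  # → 0.0
```

Minimizing this score rewards ensembles that are both accurate and appropriately spread, which is why each decoder run with fresh noise yields a distinct but plausible member.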


Quick Start

Load from HuggingFace Hub
```python
import numpy as np
from convgru_ensemble import RadarLightningModel

model = RadarLightningModel.from_pretrained("it4lia/irene")

past = np.load("past_radar.npy")  # rain rate in mm/h, shape (T_past, H, W)
forecasts = model.predict(past, forecast_steps=12, ensemble_size=10)
# forecasts.shape = (10, 12, H, W) — 10 members, 12 future steps, mm/h
```
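The leading ensemble axis lends itself to probabilistic post-processing, e.g. per-pixel exceedance probabilities. A small sketch, with a synthetic array standing in for real model output:

```python
import numpy as np

# Synthetic stand-in for model.predict output: (members, steps, H, W) in mm/h
forecasts = np.random.default_rng(0).gamma(0.5, 2.0, size=(10, 12, 64, 64))

ens_mean = forecasts.mean(axis=0)        # deterministic summary, (12, H, W)
p_rain = (forecasts > 1.0).mean(axis=0)  # per-pixel P(rain > 1 mm/h), (12, H, W)
print(ens_mean.shape, p_rain.shape)
```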
CLI Inference

```bash
convgru-ensemble predict \
    --input examples/sample_data.nc \
    --hub-repo it4lia/irene \
    --forecast-steps 12 \
    --ensemble-size 10 \
    --output predictions.nc
```
Serve via API
```bash
# With Docker
docker compose up

# Or directly
pip install convgru-ensemble[serve]
convgru-ensemble serve --hub-repo it4lia/irene --port 8000
```

Submit a forecast request:

```bash
# 20-minute forecast (4 steps × 5 min) with 5 ensemble members
curl -X POST "http://localhost:8000/predict?forecast_steps=4&ensemble_size=5" \
    -F "file=@examples/sample_data.nc" \
    -o predictions.nc

# Use the default settings (12 steps, 10 members)
curl -X POST http://localhost:8000/predict \
    -F "file=@examples/sample_data.nc" -o predictions.nc
```

Read the predictions:

```python
import xarray as xr

ds = xr.open_dataset("predictions.nc")
print(ds.precipitation_forecast.shape)
# (5, 4, 1400, 1200) — ensemble_member, forecast_step, y, x
```
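The `ensemble_member` dimension can be reduced directly with xarray, for instance into a mean forecast and a per-pixel spread. A sketch using a small synthetic dataset with the documented layout of `predictions.nc`:

```python
import numpy as np
import xarray as xr

# Synthetic dataset mimicking the server's output layout
ds = xr.Dataset({
    "precipitation_forecast": (
        ("ensemble_member", "forecast_step", "y", "x"),
        np.random.default_rng(0).gamma(0.5, 2.0, size=(5, 4, 8, 8)),
    )
})

fc = ds.precipitation_forecast
ens_mean = fc.mean("ensemble_member")    # deterministic summary
ens_spread = fc.std("ensemble_member")   # per-pixel forecast uncertainty
print(ens_mean.dims, ens_spread.shape)
```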
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/model/info` | GET | Model metadata and hyperparameters |
| `/predict` | POST | Upload NetCDF, get ensemble forecast as NetCDF |

/predict query parameters:

| Parameter | Default | Description |
|---|---|---|
| `variable` | `RR` | Name of the rain rate variable in the NetCDF |
| `forecast_steps` | 12 | Number of future 5-min steps (1–48, i.e. max 4 h) |
| `ensemble_size` | 10 | Number of ensemble members (1–10) |

The input NetCDF must contain a 3D variable (T, H, W) with rain rate in mm/h and at least 2 timesteps.
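For reference, a compliant input file can be assembled with xarray. The dimension names below are illustrative; what matters is the variable name, the 3D `(T, H, W)` shape, and having at least 2 timesteps.

```python
import numpy as np
import xarray as xr

# Hypothetical rain-rate field in mm/h: 4 past timesteps on a 128x128 grid
rr = np.random.default_rng(0).gamma(0.5, 2.0, size=(4, 128, 128)).astype("float32")

ds = xr.Dataset({"RR": (("time", "y", "x"), rr)})  # "RR" matches the default `variable`
ds.RR.attrs["units"] = "mm/h"
ds.to_netcdf("my_input.nc", format="NETCDF3_CLASSIC")  # readable by common backends
```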

Fine-tune on your data
```bash
pip install convgru-ensemble
# See "Training" section below
```

Setup

Requires Python >= 3.13. Uses uv for dependency management.

```bash
uv sync                    # core dependencies
uv sync --extra serve      # + FastAPI serving
```

Data Preparation

The training pipeline expects a Zarr dataset with a rain rate variable RR indexed by (time, x, y).

1. Filter valid datacubes

Scan the Zarr and find all space-time datacubes with fewer than n_nan NaN values:

```bash
cd importance_sampler
uv run python filter_nan.py path/to/dataset.zarr \
    --start_date 2021-01-01 --end_date 2025-12-11 \
    --Dt 24 --w 256 --h 256 \
    --step_T 3 --step_X 16 --step_Y 16 \
    --n_nan 10000 --n_workers 8
```
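The validity criterion amounts to a NaN count per datacube. Illustratively, with sizes mirroring the defaults shown above:

```python
import numpy as np

# One Dt x h x w datacube cut from the Zarr, with a small corrupted patch
cube = np.random.default_rng(0).gamma(0.5, 2.0, size=(24, 256, 256))
cube[0, :10, :10] = np.nan

n_nan = 10000                                # threshold from --n_nan
valid = int(np.isnan(cube).sum()) < n_nan    # only 100 NaNs here, so valid
print(valid)  # → True
```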
2. Importance sampling

Sample valid datacubes with higher probability for rainier events:

```bash
uv run python sample_valid_datacubes.py path/to/dataset.zarr valid_datacubes_*.csv \
    --q_min 1e-4 --m 0.1 --n_workers 8
```

A pre-sampled CSV is provided in importance_sampler/output/.
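The exact weighting scheme lives in `sample_valid_datacubes.py`; one plausible reading of the `--q_min`/`--m` parameters is a floor probability plus a term that grows with each datacube's mean rain rate, for example:

```python
import numpy as np

def sampling_probs(mean_rain, q_min=1e-4, m=0.1):
    """Illustrative weights only: a floor q_min plus a rain-dependent term,
    capped at 1, then normalized. The script's actual formula may differ."""
    q = np.minimum(1.0, q_min + m * np.asarray(mean_rain))
    return q / q.sum()

rng = np.random.default_rng(0)
mean_rain = rng.gamma(0.5, 2.0, size=1000)          # mm/h per valid datacube
p = sampling_probs(mean_rain)
chosen = rng.choice(mean_rain.size, size=100, replace=False, p=p)
```

Under any such scheme, rainier datacubes are drawn more often, which counteracts the strong dry bias of raw radar archives during training.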

Training

Training is configured via Fiddle. Run with defaults:

```bash
uv run python -m convgru_ensemble.train
```

Override parameters from the command line:

```bash
uv run python -m convgru_ensemble.train \
    --config config:experiment \
    --config set:model.num_blocks=5 \
    --config set:model.forecast_steps=12 \
    --config set:model.loss_class=crps \
    --config set:model.ensemble_size=2 \
    --config set:datamodule.batch_size=16 \
    --config set:trainer.max_epochs=100
```

Monitor with TensorBoard: `uv run tensorboard --logdir logs/`

| Parameter | Description | Default |
|---|---|---|
| `model.num_blocks` | Encoder/decoder depth | 5 |
| `model.forecast_steps` | Future steps to predict | 12 |
| `model.ensemble_size` | Ensemble members during training | 2 |
| `model.loss_class` | Loss function (`mse`, `mae`, `crps`, `afcrps`) | `crps` |
| `model.masked_loss` | Mask NaN regions in loss | `True` |
| `datamodule.steps` | Total timesteps per sample (past + future) | 18 |
| `datamodule.batch_size` | Batch size | 16 |

Architecture

```text
Input (B, T_past, 1, H, W)
    |
    v
+--------------------------+
|        Encoder           |  ConvGRU + PixelUnshuffle (x num_blocks)
|  Spatial dims halve at   |  Channels: 1 -> 4 -> 16 -> 64 -> 256 -> 1024
|  each block              |
+----------+---------------+
           | hidden states
           v
+--------------------------+
|        Decoder           |  ConvGRU + PixelShuffle (x num_blocks)
|  Noise input (x M runs)  |  Each run produces one ensemble member
|  for ensemble generation |
+----------+---------------+
           |
           v
Output (B, T_future, M, H, W)
```
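For intuition, a minimal ConvGRU cell in plain PyTorch. This is a hedged sketch: the package's `model.py` additionally handles the PixelShuffle/PixelUnshuffle rescaling, multi-scale hidden states, and noise injection.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """GRU update where the dense layers are replaced by convolutions,
    so the hidden state stays a spatial feature map."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        # Update gate z and reset gate r from the concatenated input and state
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        # Candidate state from the input and the reset-gated previous state
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_new

cell = ConvGRUCell(in_ch=1, hid_ch=8)
h = torch.zeros(2, 8, 32, 32)                # (B, hid_ch, H, W)
for frame in torch.randn(4, 2, 1, 32, 32):   # roll over T_past frames
    h = cell(frame, h)
print(h.shape)  # → torch.Size([2, 8, 32, 32])
```

Each encoder block pairs such a cell with `nn.PixelUnshuffle(2)`, which halves both spatial dimensions and multiplies channels by 4, matching the 1 -> 4 -> 16 -> 64 -> 256 -> 1024 progression in the diagram; the decoder mirrors this with `nn.PixelShuffle(2)`.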

Docker

```bash
docker build -t convgru-ensemble .

# Run with local checkpoint
docker run -p 8000:8000 -v ./checkpoints:/app/checkpoints \
    -e MODEL_CHECKPOINT=/app/checkpoints/model.ckpt convgru-ensemble

# Run with HuggingFace Hub
docker run -p 8000:8000 -e HF_REPO_ID=it4lia/irene convgru-ensemble
```

Project Structure

```text
ConvGRU-Ensemble/
+-- convgru_ensemble/          # Python package
|   +-- model.py               # ConvGRU encoder-decoder architecture
|   +-- losses.py              # CRPS, afCRPS, masked loss wrappers
|   +-- lightning_model.py     # PyTorch Lightning training module
|   +-- datamodule.py          # Dataset and data loading
|   +-- train.py               # Training entry point (Fiddle config)
|   +-- utils.py               # Rain rate <-> reflectivity conversions
|   +-- hub.py                 # HuggingFace Hub upload/download
|   +-- cli.py                 # CLI for inference and serving
|   +-- serve.py               # FastAPI inference server
+-- examples/                  # Sample data for testing
+-- importance_sampler/        # Data preparation scripts
+-- notebooks/                 # Example notebooks
+-- scripts/                   # Utility scripts (e.g., upload to Hub)
+-- tests/                     # Test suite
+-- Dockerfile                 # Container for serving API
+-- MODEL_CARD.md              # HuggingFace model card template
```

Acknowledgements

Developed at Fondazione Bruno Kessler (FBK), Trento, Italy, as part of the Italian AI-Factory (IT4LIA), an EU-funded initiative supporting AI adoption across SMEs, academia, and public/private sectors. This work showcases capabilities in the Earth (weather and climate) vertical domain.



License

BSD 2-Clause — see LICENSE.