Weather Forecasting Models — Tufts CS137

Deep learning models for 24-hour weather prediction at Tufts University (Jumbo Statue, Medford MA), trained on NOAA HRRR 3 km analysis data.

Six architectures are trained and compared: CNN Baseline, ResNet-18, ConvNeXt-Tiny, Multi-frame CNN, 3D CNN, and Vision Transformer (ViT).

Models

| Model | File | Params | Architecture | TMP RMSE (K) | Rain AUC |
| --- | --- | --- | --- | --- | --- |
| WeatherViT | vit/best.pt | 7.4M | 6-layer Transformer, 15×15 patches, 900 tokens | 4.06 | 0.776 |
| ResNet-18 | checkpoints/resnet18.pt | 11.2M | Modified torchvision ResNet-18 | 3.54 | 0.768 |
| CNN Baseline | checkpoints/cnn_baseline.pt | 11.3M | 6 ResBlocks, progressive downsample | 4.00 | 0.738 |

Full Test Results (2021)

| Model | TMP (K) | RH (%) | UGRD (m/s) | VGRD (m/s) | GUST (m/s) | APCP>2mm (mm) | Rain AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViT | 4.06 | 16.45 | 2.59 | 2.21 | 3.57 | 4.50 | 0.776 |
| ResNet-18 | 3.54 | 15.68 | 2.70 | 2.34 | 3.60 | 4.53 | 0.768 |
| CNN Baseline | 4.00 | 15.89 | 2.56 | 2.23 | 3.58 | 4.56 | 0.738 |
| ConvNeXt-Tiny | 3.66 | 15.85 | 2.54 | 2.17 | 3.65 | 4.55 | 0.692 |
| CNN 3D | 4.76 | 17.44 | 2.61 | 2.32 | 3.58 | 4.75 | 0.668 |
| Multi-frame CNN | 4.55 | 18.41 | 2.62 | 2.45 | 3.62 | 4.76 | 0.652 |
| Persistence | 4.86 | 23.01 | 3.73 | 2.89 | 4.87 | 4.62 | 0.506 |

Key findings:

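  • ViT achieves the best rain detection AUC (0.776), precipitation RMSE (4.50 mm), and wind gust RMSE (3.57 m/s)
  • ResNet-18 leads in temperature (3.54 K) and humidity (15.68%) accuracy
  • ConvNeXt-Tiny has the lowest U-wind (2.54 m/s) and V-wind (2.17 m/s) errors
  • Every model beats the persistence baseline on all variables except precipitation RMSE, where CNN 3D (4.75 mm) and Multi-frame CNN (4.76 mm) trail persistence (4.62 mm)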
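
For readers unfamiliar with the persistence baseline, the sketch below shows one common way to score it: the value observed now is used as the forecast for 24 hours later, and RMSE is computed against what actually happened. The helper names (`rmse`, `persistence_forecast`) and the synthetic temperature series are illustrative assumptions, not the evaluation code behind the table.

```python
import math

def rmse(pred, truth):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

def persistence_forecast(series, lead=24):
    """Pair each observation with the one `lead` steps later.

    Returns (pred, truth), where pred[i] = series[i] serves as the
    forecast for truth[i] = series[i + lead].
    """
    if lead <= 0 or lead >= len(series):
        raise ValueError("lead must be between 1 and len(series) - 1")
    return series[:-lead], series[lead:]

# Hypothetical hourly 2 m temperatures in K: a daily cycle plus a slow warming trend.
temps = [280.0 + 5.0 * math.sin(2 * math.pi * h / 24) + 0.1 * h for h in range(72)]

pred, truth = persistence_forecast(temps, lead=24)
baseline_rmse = rmse(pred, truth)  # error of "tomorrow looks like today"
```

Because the daily cycle repeats exactly in this toy series, the persistence error comes only from the trend; real data adds synoptic variability on top, which is why a learned model can improve on it.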

Input

  • Format: 42-channel spatial grid (450 × 449 pixels)
  • Resolution: 3 km (HRRR Lambert Conformal projection)
  • Region: US Northeast / New England (~1350 km × 1350 km)
  • Channels: Surface variables (temperature, humidity, wind, precipitation, radiation) + atmospheric variables at multiple pressure levels (CAPE, dew point, geopotential height, temperature, U/V wind, cloud cover, moisture)

Output

6 continuous values predicted 24 hours ahead at a single target point:

| Variable | Unit |
| --- | --- |
| 2m Temperature | K |
| 2m Relative Humidity | % |
| 10m U-Wind | m/s |
| 10m V-Wind | m/s |
| Surface Gust | m/s |
| 1hr Precipitation | mm |

Architecture Highlights

WeatherViT (new)

Input (B,42,450,449) → pad→450×450 → PatchEmbed(15×15, 900 patches)
  → [CLS]+PosEmbed → 6×TransformerBlock(8 heads, dim=256) → CLS → FC → (B,6)
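
The patch count in the diagram above follows from simple arithmetic, sketched here. The function name `vit_token_count` is hypothetical, and padding up to the next multiple of the patch size is an assumption consistent with the 449 → 450 step shown. Note that the 900 figure counts patch tokens; with the [CLS] token prepended, the sequence seen by the Transformer blocks would be 901 long.

```python
def vit_token_count(height, width, patch):
    """Return (padded_h, padded_w, n_patches, n_tokens) for a ViT input.

    Pads each side length up to the next multiple of `patch`, then counts
    non-overlapping patches plus one [CLS] token.
    """
    padded_h = -(-height // patch) * patch  # ceiling to a multiple of patch
    padded_w = -(-width // patch) * patch
    n_patches = (padded_h // patch) * (padded_w // patch)
    return padded_h, padded_w, n_patches, n_patches + 1

# 450 x 449 HRRR grid with 15 x 15 patches, as in the diagram.
padded_h, padded_w, n_patches, n_tokens = vit_token_count(450, 449, 15)
```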

CNN Baseline

Input (B,42,450,449) → Stem(42→64, 7×7, s=2) → 6×ResBlock → GAP → FC → (B,6)

ResNet-18

Input (B,42,450,449) → Modified torchvision ResNet-18 (42-ch input) → FC → (B,6)

Checkpoint Format

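{
    "model": state_dict,          # Model weights
    "norm_stats": {               # Z-score normalization statistics (values below are array shapes)
        "input_mean": (42, 1, 1), # one mean per input channel, broadcastable over H×W
        "input_std": (42, 1, 1),  # one std per input channel
        "target_mean": (6,),      # one mean per output variable
        "target_std": (6,),       # one std per output variable
    },
    "args": {...},                # Training hyperparameters
}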
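
Before running inference on a downloaded file, it can help to confirm that it matches the layout above. The following is a minimal sketch: `check_checkpoint` and `EXPECTED_NORM_SHAPES` are hypothetical names, and the check relies only on each statistic exposing a `.shape` attribute, as both torch tensors and NumPy arrays do.

```python
EXPECTED_NORM_SHAPES = {
    "input_mean": (42, 1, 1),
    "input_std": (42, 1, 1),
    "target_mean": (6,),
    "target_std": (6,),
}

def check_checkpoint(ckpt):
    """Return a list of problems found in a checkpoint dict (empty if none)."""
    problems = []
    for key in ("model", "norm_stats", "args"):
        if key not in ckpt:
            problems.append(f"missing top-level key: {key}")
    stats = ckpt.get("norm_stats", {})
    for name, expected in EXPECTED_NORM_SHAPES.items():
        if name not in stats:
            problems.append(f"missing norm_stats entry: {name}")
            continue
        # Works for anything with a .shape (tensor or array); () if absent.
        actual = tuple(getattr(stats[name], "shape", ()))
        if actual != expected:
            problems.append(f"{name}: expected shape {expected}, got {actual}")
    return problems
```

After loading a checkpoint with `torch.load`, calling `check_checkpoint(ckpt)` and getting an empty list back suggests the file has the expected structure; it does not validate the weights themselves.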

Usage

import torch
from models import create_model

# Load any model (cnn_baseline, resnet18, vit, convnext_tiny, cnn_3d, cnn_multi_frame)
ckpt = torch.load("vit/best.pt", map_location="cpu", weights_only=False)
model = create_model(ckpt["args"]["model"], n_input_channels=42, n_targets=6)
model.load_state_dict(ckpt["model"])
model.eval()

# Inference
x = torch.randn(1, 42, 450, 449)  # (batch, channels, height, width)
norm = ckpt["norm_stats"]
x = (x - norm["input_mean"]) / (norm["input_std"] + 1e-7)
with torch.no_grad():
    pred = model(x)  # (1, 6)
pred = pred * norm["target_std"] + norm["target_mean"]  # denormalize
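
The denormalized output is a bare vector of six numbers, so a small labeling step can make it easier to read. The sketch below assumes the output order matches the Output table (temperature, humidity, U-wind, V-wind, gust, precipitation); that order, the key names, and the `describe_forecast` helper are assumptions to verify against the training code. The derived wind direction is relative to the axes the U/V components are defined on, which for HRRR may be the model grid rather than true north.

```python
import math

# Assumed output order, taken from the Output table; verify before relying on it.
TARGET_NAMES = ["tmp_2m_K", "rh_2m_pct", "ugrd_10m_ms", "vgrd_10m_ms", "gust_ms", "apcp_1h_mm"]

def describe_forecast(values):
    """Label a 6-value forecast and derive a few convenience quantities."""
    if len(values) != len(TARGET_NAMES):
        raise ValueError(f"expected {len(TARGET_NAMES)} values, got {len(values)}")
    out = dict(zip(TARGET_NAMES, (float(v) for v in values)))
    u, v = out["ugrd_10m_ms"], out["vgrd_10m_ms"]
    out["tmp_2m_C"] = out["tmp_2m_K"] - 273.15
    out["wind_speed_ms"] = math.hypot(u, v)
    # Meteorological convention: direction the wind blows FROM,
    # in degrees clockwise from the positive-V axis.
    out["wind_from_deg"] = math.degrees(math.atan2(-u, -v)) % 360.0
    # A regression head is unconstrained, so clamp to physical ranges.
    out["rh_2m_pct"] = min(max(out["rh_2m_pct"], 0.0), 100.0)
    out["apcp_1h_mm"] = max(out["apcp_1h_mm"], 0.0)
    return out

# Illustrative values only; with the snippet above, pass pred[0].tolist() instead.
forecast = describe_forecast([283.15, 104.0, 3.0, 4.0, 7.5, -0.2])
```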

Training Data

HRRR (High-Resolution Rapid Refresh) — NOAA's 3 km hourly weather analysis.

| Split | Period | Samples |
| --- | --- | --- |
| Training | 2018–2019 | ~17,500 |
| Validation | 2020 | ~8,700 |
| Test | 2021 | ~8,700 |
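
The approximate sample counts are consistent with one sample per hour of each period, which the sketch below works out. This is an inference from the hourly cadence stated above, not a description of the actual data pipeline; the real counts may be slightly lower if some hours are missing or lack a valid 24-hour-ahead target. The name `hourly_samples` is hypothetical.

```python
import calendar

def hourly_samples(first_year, last_year):
    """Number of hourly timestamps in the inclusive year range."""
    days = sum(366 if calendar.isleap(y) else 365 for y in range(first_year, last_year + 1))
    return days * 24

split_hours = {
    "train": hourly_samples(2018, 2019),  # two non-leap years
    "val": hourly_samples(2020, 2020),    # leap year
    "test": hourly_samples(2021, 2021),
}
```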

Live Demo

Try the models in real time on live HRRR data: Tufts Weather Forecast Space

The demo fetches real-time HRRR analysis from NOAA, runs inference, and displays:

  • Current input field maps (temperature, precipitation, wind, humidity)
  • 24-hour forecast at the Jumbo Statue target point
