Weather Forecasting Models — Tufts CS137

Deep learning models for 24-hour weather prediction at Tufts University (Jumbo Statue, Medford, MA), trained on NOAA HRRR 3 km analysis data.

Six architectures are trained and compared: CNN Baseline, ResNet-18, ConvNeXt-Tiny, Multi-frame CNN, 3D CNN, and Vision Transformer (ViT).

Models

Model	File	Params	Architecture	TMP RMSE (K)	Rain AUC
WeatherViT	`vit/best.pt`	7.4M	6-layer Transformer, 15×15 patches, 900 tokens	4.06	0.776
ResNet-18	`checkpoints/resnet18.pt`	11.2M	Modified torchvision ResNet-18	3.54	0.768
CNN Baseline	`checkpoints/cnn_baseline.pt`	11.3M	6 ResBlocks, progressive downsample	4.00	0.738

Full Test Results (2021)

Model	TMP (K)	RH (%)	UGRD (m/s)	VGRD (m/s)	GUST (m/s)	APCP>2mm (mm)	Rain AUC
ViT	4.06	16.45	2.59	2.21	3.57	4.50	0.776
ResNet-18	3.54	15.68	2.70	2.34	3.60	4.53	0.768
CNN Baseline	4.00	15.89	2.56	2.23	3.58	4.56	0.738
ConvNeXt-Tiny	3.66	15.85	2.54	2.17	3.65	4.55	0.692
CNN 3D	4.76	17.44	2.61	2.32	3.58	4.75	0.668
Multi-frame CNN	4.55	18.41	2.62	2.45	3.62	4.76	0.652
Persistence	4.86	23.01	3.73	2.89	4.87	4.62	0.506
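
For readers who want to recompute numbers of this kind, here is a minimal sketch of the two metric types. It assumes the error columns are RMSE and that the AUC treats an observed 1-hour precipitation above 2 mm as the positive class, ranked by the predicted precipitation. The helper names `rmse` and `rain_auc` are hypothetical and do not come from this repository.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def rain_auc(pred_mm, obs_mm, threshold_mm=2.0):
    """ROC AUC via pairwise ranking (Mann-Whitney U statistic).

    Assumption: a sample is a rain event when observed precipitation
    exceeds threshold_mm, and the predicted amount is used as the score.
    """
    pos = [p for p, o in zip(pred_mm, obs_mm) if o > threshold_mm]
    neg = [p for p, o in zip(pred_mm, obs_mm) if o <= threshold_mm]
    if not pos or not neg:
        raise ValueError("need at least one rain and one no-rain sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # rain sample ranked above no-rain sample
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy values for illustration only, not taken from the test set
pred = [0.1, 3.0, 0.5, 4.0]
obs = [0.0, 5.0, 2.5, 0.0]
print(rmse(pred, obs), rain_auc(pred, obs))
```

The pairwise loop is quadratic in the number of samples, which is acceptable at the scale of one test year but would normally be replaced by a sort-based implementation.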

Key findings:

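ViT achieves the best rain detection AUC (0.776), precipitation RMSE (4.50 mm), and wind gust error (3.57 m/s)
ResNet-18 leads in temperature (3.54 K) and humidity (15.68%) accuracy
ConvNeXt-Tiny has the lowest U-wind (2.54 m/s) and V-wind (2.17 m/s) errors
All models beat the persistence baseline on temperature, humidity, wind, gust, and rain AUC; on precipitation RMSE, CNN 3D (4.75) and Multi-frame CNN (4.76) fall behind persistence (4.62)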
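
The comparison against persistence can be made explicit with a skill score, 1 − RMSE_model / RMSE_persistence, where positive values mean the model beats the baseline. The sketch below copies the numbers from the results table and assumes all six error columns are RMSE. The names `skill_scores` and `best_model_per_column` are illustrative, not part of this repository.

```python
# Error values copied from the 2021 test table (lower is better).
COLUMNS = ["TMP", "RH", "UGRD", "VGRD", "GUST", "APCP"]
RMSE = {
    "ViT":             [4.06, 16.45, 2.59, 2.21, 3.57, 4.50],
    "ResNet-18":       [3.54, 15.68, 2.70, 2.34, 3.60, 4.53],
    "CNN Baseline":    [4.00, 15.89, 2.56, 2.23, 3.58, 4.56],
    "ConvNeXt-Tiny":   [3.66, 15.85, 2.54, 2.17, 3.65, 4.55],
    "CNN 3D":          [4.76, 17.44, 2.61, 2.32, 3.58, 4.75],
    "Multi-frame CNN": [4.55, 18.41, 2.62, 2.45, 3.62, 4.76],
}
PERSISTENCE = [4.86, 23.01, 3.73, 2.89, 4.87, 4.62]

def skill_scores(model_rmse, baseline_rmse):
    """Skill per column: 1 - model/baseline. Positive means better than baseline."""
    return [1.0 - m / b for m, b in zip(model_rmse, baseline_rmse)]

def best_model_per_column(table):
    """Name of the model with the lowest error in each column."""
    return {
        col: min(table, key=lambda name: table[name][i])
        for i, col in enumerate(COLUMNS)
    }

skills = {name: skill_scores(vals, PERSISTENCE) for name, vals in RMSE.items()}
best = best_model_per_column(RMSE)

for name, row in skills.items():
    print(name, [round(s, 3) for s in row])
print(best)
```

A negative entry in `skills` flags a column where a model is worse than persistence, which in this table happens only in the precipitation column.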

Input

Format: 42-channel spatial grid (450 × 449 pixels)
Resolution: 3 km (HRRR Lambert Conformal projection)
Region: US Northeast / New England (~1350 km × 1350 km)
Channels: Surface variables (temperature, humidity, wind, precipitation, radiation) + atmospheric variables at multiple pressure levels (CAPE, dew point, geopotential height, temperature, U/V wind, cloud cover, moisture)

Output

6 continuous values predicted 24 hours ahead at a single target point:

Variable	Unit
2m Temperature	K
2m Relative Humidity	%
10m U-Wind	m/s
10m V-Wind	m/s
Surface Gust	m/s
1hr Precipitation	mm
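
Since wind is predicted as separate U and V components, a common post-processing step is converting them to speed and direction. The sketch below assumes the six outputs arrive in the order of the table above, which the source does not confirm, and uses the meteorological convention of reporting the direction the wind blows from. `TARGET_NAMES`, `label_prediction`, and `wind_speed_and_direction` are hypothetical names.

```python
import math

# Assumed output order: same as the table above (not confirmed by the source).
TARGET_NAMES = ["tmp_2m_K", "rh_2m_pct", "ugrd_10m_ms",
                "vgrd_10m_ms", "gust_ms", "apcp_1h_mm"]

def label_prediction(values):
    """Pair the six denormalized outputs with readable names."""
    if len(values) != len(TARGET_NAMES):
        raise ValueError(f"expected {len(TARGET_NAMES)} values, got {len(values)}")
    return dict(zip(TARGET_NAMES, values))

def wind_speed_and_direction(u, v):
    """Return (speed in m/s, direction in degrees).

    Direction is where the wind comes FROM, measured clockwise from north:
    0 = northerly, 90 = easterly, 180 = southerly, 270 = westerly.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

# Made-up forecast values for illustration only
forecast = label_prediction([285.2, 64.0, 3.0, -4.0, 7.5, 0.0])
speed, direction = wind_speed_and_direction(
    forecast["ugrd_10m_ms"], forecast["vgrd_10m_ms"]
)
print(f"{speed:.1f} m/s from {direction:.0f} degrees")
```

HRRR wind components are defined relative to the model grid, so if the targets were taken directly from the grid, the angle may need a rotation to true north before being reported as a compass direction.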

Architecture Highlights

WeatherViT (new)

Input (B,42,450,449) → pad→450×450 → PatchEmbed(15×15, 900 patches)
  → [CLS]+PosEmbed → 6×TransformerBlock(8 heads, dim=256) → CLS → FC → (B,6)
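
The token arithmetic behind this diagram can be checked in a few lines. The sketch assumes the padding rounds each spatial dimension up to the next multiple of the patch size and that the [CLS] token is prepended to the patch tokens; `vit_token_count` is an illustrative helper, not a function from this repository.

```python
import math

def vit_token_count(height, width, patch):
    """Return (padded size, number of patches, sequence length incl. CLS)."""
    padded_h = math.ceil(height / patch) * patch
    padded_w = math.ceil(width / patch) * patch
    n_patches = (padded_h // patch) * (padded_w // patch)
    return (padded_h, padded_w), n_patches, n_patches + 1

padded, n_patches, seq_len = vit_token_count(450, 449, 15)
values_per_patch = 42 * 15 * 15   # channels x patch height x patch width
head_dim = 256 // 8               # embedding dim split across 8 heads

print(padded, n_patches, seq_len)   # (450, 450) 900 901
print(values_per_patch, head_dim)   # 9450 32
```

Each 15×15 patch therefore carries 9,450 input values that are projected down to a 256-dimensional token, and self-attention operates over 901 tokens per sample.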

CNN Baseline

Input (B,42,450,449) → Stem(42→64, 7×7, s=2) → 6×ResBlock → GAP → FC → (B,6)

ResNet-18

Input (B,42,450,449) → Modified torchvision ResNet-18 (42-ch input) → FC → (B,6)

Checkpoint Format

{
    "model": state_dict,          # Model weights
    "norm_stats": {               # Z-score normalization statistics
        "input_mean": (42, 1, 1),
        "input_std": (42, 1, 1),
        "target_mean": (6,),
        "target_std": (6,),
    },
    "args": {...},                # Training hyperparameters
}
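
Before running inference, it can help to confirm that a loaded checkpoint matches this layout. The following is a minimal sketch, assuming the statistics are stored as objects exposing a `.shape` attribute (PyTorch tensors or NumPy arrays both qualify). `check_checkpoint` and `EXPECTED_NORM_SHAPES` are hypothetical names.

```python
# Shapes taken from the checkpoint layout described above.
EXPECTED_NORM_SHAPES = {
    "input_mean": (42, 1, 1),
    "input_std": (42, 1, 1),
    "target_mean": (6,),
    "target_std": (6,),
}

def check_checkpoint(ckpt):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for key in ("model", "norm_stats", "args"):
        if key not in ckpt:
            problems.append(f"missing top-level key: {key}")
    norm = ckpt.get("norm_stats", {})
    for name, expected in EXPECTED_NORM_SHAPES.items():
        if name not in norm:
            problems.append(f"missing norm_stats entry: {name}")
            continue
        # torch.Size and NumPy shapes both convert cleanly to a tuple
        shape = tuple(getattr(norm[name], "shape", ()))
        if shape != expected:
            problems.append(f"{name}: expected shape {expected}, got {shape}")
    return problems
```

Calling `check_checkpoint(ckpt)` right after `torch.load` and printing any returned messages gives an early, readable failure instead of a broadcasting error during normalization.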

Usage

import torch
from models import create_model

# Load any model (cnn_baseline, resnet18, vit, convnext_tiny, cnn_3d, cnn_multi_frame)
ckpt = torch.load("vit/best.pt", map_location="cpu", weights_only=False)
model = create_model(ckpt["args"]["model"], n_input_channels=42, n_targets=6)
model.load_state_dict(ckpt["model"])
model.eval()

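# Inference
x = torch.randn(1, 42, 450, 449)  # placeholder input: (batch, channels, height, width)
# as_tensor accepts the stats whether they were saved as tensors or NumPy arrays
norm = {k: torch.as_tensor(v, dtype=torch.float32) for k, v in ckpt["norm_stats"].items()}
x = (x - norm["input_mean"]) / (norm["input_std"] + 1e-7)  # per-channel z-score
with torch.no_grad():
    pred = model(x)  # (1, 6), in normalized units
pred = pred * norm["target_std"] + norm["target_mean"]  # denormalize to physical units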

Training Data

HRRR (High-Resolution Rapid Refresh) — NOAA's 3 km hourly weather analysis.

Split	Period	Samples
Training	2018–2019	~17,500
Validation	2020	~8,700
Test	2021	~8,700
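
The sample counts are close to the number of hours in each period, which suggests one sample per hourly analysis. The sketch below checks that arithmetic and shows a year-based split. Hourly sampling and assigning each sample by the year of its input timestamp are both assumptions, and `hours_in_years` and `split_for` are hypothetical helpers.

```python
import calendar
from datetime import datetime

# Year ranges from the table above
SPLIT_YEARS = {"train": (2018, 2019), "val": (2020,), "test": (2021,)}

def hours_in_years(years):
    """Upper bound on hourly samples: every hour of every listed year."""
    return sum((366 if calendar.isleap(y) else 365) * 24 for y in years)

def split_for(timestamp):
    """Name of the split whose years contain the timestamp, or None."""
    for name, years in SPLIT_YEARS.items():
        if timestamp.year in years:
            return name
    return None

for name, years in SPLIT_YEARS.items():
    print(name, hours_in_years(years))
# train 17520
# val 8784
# test 8760

print(split_for(datetime(2020, 7, 4, 12)))   # val
```

The reported counts (~17,500, ~8,700, ~8,700) sit slightly below these upper bounds, which would be consistent with a small number of missing hours or samples dropped near the period boundaries.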

Live Demo

Try the models in real time with live HRRR data: Tufts Weather Forecast Space

The demo fetches real-time HRRR analysis from NOAA, runs inference, and displays:

Current input field maps (temperature, precipitation, wind, humidity)
24-hour forecast at the Jumbo Statue target point
