TrueSea

WW3 point-forecast corrector (model output statistics).

A LightGBM model that corrects NOAA WAVEWATCH III (WW3) significant wave height (Hs) at point outputs. Input: the WW3 state at a station-hour, consisting of total Hs, 10 m wind, the top three swell partitions as height/period/direction, station lat/lon, and cyclic month and hour. Directions are encoded as sin/cos pairs. Partition heights are rescaled so that the square root of the sum of their squares equals total Hs, which keeps the features consistent between hindcast PART files and operational bulletins. Output: corrected significant wave height.
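
The partition rescaling and direction encoding described above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the function names are hypothetical, and returning all-zero heights unchanged is an assumption about how missing partitions are handled.

```python
import numpy as np

def normalize_partitions(total_hs, part_hs):
    """Rescale partition heights so sqrt(sum(h_i**2)) equals total_hs."""
    part_hs = np.asarray(part_hs, dtype=float)
    rss = np.sqrt(np.sum(part_hs ** 2))
    if rss == 0.0:
        # Assumption: with no partitions present there is nothing to rescale.
        return part_hs
    return part_hs * (total_hs / rss)

def encode_direction(deg):
    """Encode a direction in degrees as a (sin, cos) pair."""
    rad = np.deg2rad(deg)
    return np.sin(rad), np.cos(rad)

# Synthetic example: three partition heights rescaled to a total Hs of 2.0 m.
scaled = normalize_partitions(2.0, [1.5, 0.9, 0.4])
print(scaled, np.sqrt(np.sum(scaled ** 2)))

# 359 and 1 degrees are neighbours on the circle and stay close after encoding.
print(encode_direction(359.0), encode_direction(1.0))
```

The sin/cos pair removes the artificial jump between 359 and 0 degrees that a raw angle would present to a tree model.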

Trained on 3.9M station-hours (181 NDBC stations, 2015-08 to 2019-05) pairing the multi_1 production hindcast point output with NDBC observed wave height. Early stopping uses a validation slice from the training period.

Evaluation

Protocol	Test set	Raw WW3 RMSE	MOS RMSE	RMSE gain	Raw bias	MOS bias
Time split (train 2015-16, val 2017)	2018-19, all stations	0.430 m	0.399 m	+7%	-0.104 m	+0.002 m
Station split	25 held-out stations, all years	0.381 m	0.332 m	+13%	-0.113 m	-0.002 m
Strict split	held-out stations x 2018-19	0.442 m	0.424 m	+4%	-0.115 m	-0.013 m
Live GFS-wave (June 2026, station 44025, lead < 72 h)	2,393 forecast hours	0.242 m	0.241 m	0%	-0.075 m	-0.074 m
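
The gain column is consistent with the relative RMSE reduction, 1 - MOS RMSE / raw RMSE. A minimal sketch of the three metrics follows. The helper names are hypothetical, and the sign convention for bias (prediction minus observation) is an assumption, since the table does not state it.

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error of predictions against observations."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def bias(pred, obs):
    """Mean error, assumed here to be prediction minus observation."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.mean(pred - obs))

def rmse_gain(raw_rmse, mos_rmse):
    """Fractional RMSE reduction of the corrected forecast over the raw one."""
    return 1.0 - mos_rmse / raw_rmse

# Reproduce the gain of the time-split row from its two RMSE values.
print(f"{rmse_gain(0.430, 0.399):+.0%}")  # prints +7%
```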

Broken down by quantile of observed wave height, gains are largest between the 50th and 99th percentiles (+8% to +17% on the station split). The live row applies the hindcast-trained model to operational GFS-wave bulletins with spectra-file winds. GFS-wave at this station and season runs near-unbiased, which leaves little correction headroom in that window.
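
A breakdown of gain by observed quantile could be computed along these lines. This is a sketch under stated assumptions: the bin edges (0, 50, 90, 99, 100) and the function name are illustrative and are not the bins behind the figures quoted above, and the data are synthetic.

```python
import numpy as np

def gain_by_quantile(obs, raw, mos, edges=(0, 50, 90, 99, 100)):
    """RMSE gain of mos over raw within bins of observed-value percentiles."""
    obs, raw, mos = (np.asarray(a, dtype=float) for a in (obs, raw, mos))
    cuts = np.percentile(obs, edges)
    out = {}
    for i, (lo, hi) in enumerate(zip(cuts[:-1], cuts[1:])):
        last = i == len(cuts) - 2
        # Half-open bins, except the last one, which includes its upper edge.
        mask = (obs >= lo) & ((obs <= hi) if last else (obs < hi))
        if not mask.any():
            continue
        raw_rmse = np.sqrt(np.mean((raw[mask] - obs[mask]) ** 2))
        mos_rmse = np.sqrt(np.mean((mos[mask] - obs[mask]) ** 2))
        out[(edges[i], edges[i + 1])] = float(1.0 - mos_rmse / raw_rmse)
    return out

# Synthetic data: the raw forecast is 10% low, the corrected one 1% low.
obs = np.arange(1.0, 101.0)
gains = gain_by_quantile(obs, 0.9 * obs, 0.99 * obs)
for (lo, hi), g in gains.items():
    print(f"{lo}-{hi}th percentile: {g:+.0%}")
```

Binning on the observed value rather than the forecast value keeps the bins identical for the raw and corrected forecasts, so the two RMSEs in each bin are computed over the same rows.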

Files

mos_hs.safetensors — the ensemble as flat node arrays (gbdt-flat-v1): feature (int32, -1 at leaves), threshold (f64, x <= t goes left), left/right (int32, tree-local child indices), value (f64 leaf values), tree_offset (int64 root of each tree). Feature names are in the file metadata. No LightGBM dependency needed:

import numpy as np
from safetensors.numpy import load_file

t = load_file("mos_hs.safetensors")

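def predict(x):  # x: array of the 22 features in metadata order
    s = 0.0
    for off in t["tree_offset"]:
        n = int(off)  # global index of this tree's root node
        while t["feature"][n] >= 0:  # feature == -1 marks a leaf
            # x <= threshold goes left; left/right hold tree-local indices
            child = t["left"][n] if x[t["feature"][n]] <= t["threshold"][n] else t["right"][n]
            n = int(off) + int(child)  # tree-local index -> global index
        s += t["value"][n]  # add this tree's leaf value
    return s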
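
To make the gbdt-flat-v1 layout concrete, the sketch below builds a toy two-tree ensemble by hand and evaluates it for a batch of rows with NumPy. The toy arrays, the `-1` child indices at leaves, and the name `predict_batch` are assumptions for illustration and are not taken from the released file.

```python
import numpy as np

# Toy ensemble. Tree 0 occupies nodes 0-2, tree 1 occupies nodes 3-7.
toy = {
    "feature":     np.array([0, -1, -1, 1, 0, -1, -1, -1], dtype=np.int32),
    "threshold":   np.array([1.0, 0, 0, 2.0, 0.0, 0, 0, 0], dtype=np.float64),
    "left":        np.array([1, -1, -1, 1, 3, -1, -1, -1], dtype=np.int32),
    "right":       np.array([2, -1, -1, 2, 4, -1, -1, -1], dtype=np.int32),
    "value":       np.array([0, 0.5, -0.5, 0, 0, 0.3, 0.1, 0.2], dtype=np.float64),
    "tree_offset": np.array([0, 3], dtype=np.int64),
}

def predict_batch(t, X):
    """Sum leaf values over all trees for each row of X (n_rows x n_features)."""
    X = np.asarray(X, dtype=float)
    rows = np.arange(X.shape[0])
    total = np.zeros(X.shape[0])
    for off in t["tree_offset"]:
        off = int(off)
        n = np.full(X.shape[0], off, dtype=np.int64)  # every row starts at the root
        while True:
            feat = t["feature"][n]
            active = feat >= 0  # rows that have not reached a leaf yet
            if not active.any():
                break
            idx = n[active]
            go_left = X[rows[active], feat[active]] <= t["threshold"][idx]
            child = np.where(go_left, t["left"][idx], t["right"][idx])
            n[active] = off + child  # tree-local index -> global index
        total += t["value"][n]
    return total

print(predict_batch(toy, [[0.5, 1.0], [2.0, 3.0], [-1.0, 0.0]]))
```

In this sketch a NaN input compares false against every threshold and therefore goes right. Whether that matches the original model's missing-value handling is not stated in this card.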

mos_hs.txt — the same model in native LightGBM format: lightgbm.Booster(model_file="mos_hs.txt").

Both files encode identical trees (verified bit-exact at conversion).

Feature order: m_hs, m_wspd, wdir_sin, wdir_cos, p1_hs, p1_tp, p1_dir_sin, p1_dir_cos, p2_hs, p2_tp, p2_dir_sin, p2_dir_cos, p3_hs, p3_tp, p3_dir_sin, p3_dir_cos, lat, lon, month_sin, month_cos, hour_sin, hour_cos (partition heights normalized as above).
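
Assembling one input row in this order might look like the sketch below. The helper names and example values are hypothetical. The cyclic phase (month / 12, hour / 24), the direction and longitude conventions, and the units are all assumptions not confirmed by this card, so check them against the training data before relying on predictions.

```python
import numpy as np

FEATURES = [
    "m_hs", "m_wspd", "wdir_sin", "wdir_cos",
    "p1_hs", "p1_tp", "p1_dir_sin", "p1_dir_cos",
    "p2_hs", "p2_tp", "p2_dir_sin", "p2_dir_cos",
    "p3_hs", "p3_tp", "p3_dir_sin", "p3_dir_cos",
    "lat", "lon", "month_sin", "month_cos", "hour_sin", "hour_cos",
]

def cyc(value, period):
    """Map a periodic quantity to a (sin, cos) pair. The phase is an assumption."""
    ang = 2.0 * np.pi * value / period
    return np.sin(ang), np.cos(ang)

def build_features(m_hs, m_wspd, wdir_deg, parts, lat, lon, month, hour):
    """parts: three (hs, tp, dir_deg) tuples with heights already normalized."""
    row = {"m_hs": m_hs, "m_wspd": m_wspd, "lat": lat, "lon": lon}
    row["wdir_sin"], row["wdir_cos"] = cyc(wdir_deg, 360.0)
    for i, (hs, tp, d) in enumerate(parts, start=1):
        row[f"p{i}_hs"], row[f"p{i}_tp"] = hs, tp
        row[f"p{i}_dir_sin"], row[f"p{i}_dir_cos"] = cyc(d, 360.0)
    row["month_sin"], row["month_cos"] = cyc(month, 12.0)
    row["hour_sin"], row["hour_cos"] = cyc(hour, 24.0)
    # Emit values in the documented order so they line up with the trees.
    return np.array([row[name] for name in FEATURES], dtype=float)

# Illustrative values only; partition heights satisfy sqrt(4 + 4 + 1) = 3.0.
x = build_features(3.0, 9.5, 220.0,
                   [(2.0, 11.0, 140.0), (2.0, 8.0, 200.0), (1.0, 5.0, 250.0)],
                   lat=40.0, lon=-73.0, month=6, hour=12)
print(dict(zip(FEATURES, np.round(x, 3))))
```

Building the row through a name-keyed dictionary and then indexing by `FEATURES` keeps the ordering in one place, which helps avoid silently swapped columns.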

Training data: https://huggingface.co/datasets/phanerozoic/ww3-ndbc-pairs

Raw archive: https://huggingface.co/datasets/phanerozoic/noaa-ww3-multi1-points
