Commit 31156e0 (verified) by csubich · 1 parent: ec56f9e

Use correct formatting for inline math in readme

Files changed (1): README.md (+7 −7)
```diff
@@ -26,9 +26,9 @@ These model weights are available under the [Canada Open Government license](htt
 The model predicts the following meteorological variables on a ¼° latitude/longitude grid (with poles):
 
 * At elevation: temperature, geopotential, u (zonal) component of wind, v (meridional) component of wind, vertical velocity, specific humidity
-* At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation^(†)
+* At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation[†]
 
-^(† — This variable is incorrect. Please see the 'erratum' section.)
+[†] — This variable is incorrect. Please see the 'erratum' section.
 
 The atmospheric variables are predicted at the 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa pressure levels. For points that lie below the surface, extrapolated values are given (i.e. they are not masked).
 
@@ -79,9 +79,9 @@ The pre-training step closely followed the training curriculum of Lam et al.:
 
 Stage | Batches | Forecast Length | Learning Rate
 |:-:|:-:|:-:|:-:|
-1 (Warmup) | 1000 | 1 step (6 h) | $0 \to 10^{-3}$ (linear)
-2 | 299000 | 1 step (6 h) | $10^{-3} \to 3 \cdot 10^{-7}$ (cosine)
-3 | 1000 each | 2–12 steps (12–72 h) | $3 \cdot 10^{-7}$ (constant)
+1 (Warmup) | 1000 | 1 step (6 h) | \\(0 \to 10^{-3}\\) (linear)
+2 | 299000 | 1 step (6 h) | \\(10^{-3} \to 3 \cdot 10^{-7}\\) (cosine)
+3 | 1000 each | 2–12 steps (12–72 h) | \\(3 \cdot 10^{-7}\\) (constant)
 
 #### Fine-tuning
 
@@ -89,14 +89,14 @@ Stage | Batches | Forecast Length | Learning Rate
 
 Stage | Batches | Forecast Length | Learning Rate
 |:-:|:-:|:-:|:-:|
-Fine tune | 5000 | 12 steps (72 h) | $3 \cdot 10^{-7}$ (constant)
+Fine tune | 5000 | 12 steps (72 h) | \\(3 \cdot 10^{-7}\\) (constant)
 
 In both cases, the batch size was 32 forecasts, and the training data was sampled with replacement. On average, each training forecast (initialization date) was seen about 184 times in the pre-training stage and 4.5 times in the fine-tuning stage.
 
 
 #### Optimizer
 
-As in Lam et al., the training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with momentum parameters $\beta_1 = 0.9$ and $\beta_2 = 0.95$ and weight decay of $0.1$ on the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
+As in Lam et al., the training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with momentum parameters \\(\beta_1 = 0.9\\) and \\(\beta_2 = 0.95\\) and weight decay of \\(0.1\\) on the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
 
 ## Validation
 
```
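The three-stage pre-training schedule tabulated in the diff above (linear warmup, cosine decay, then a constant floor) can be sketched in plain Python. This is a sketch, not the repository's training code: the function name, the 0-indexing of batches, and the exact endpoint handling of each stage are my assumptions; the batch counts and rates come from the table.

```python
import math

# Values from the pre-training table in the README.
WARMUP = 1_000        # stage 1: linear warmup, 0 -> 1e-3
COSINE = 299_000      # stage 2: cosine decay, 1e-3 -> 3e-7
PEAK = 1e-3
FLOOR = 3e-7

def learning_rate(batch: int) -> float:
    """Learning rate for a given (0-indexed) pre-training batch."""
    if batch < WARMUP:
        # Stage 1: linear warmup from 0 up to the peak rate.
        return PEAK * batch / WARMUP
    if batch < WARMUP + COSINE:
        # Stage 2: cosine decay from the peak down to the floor.
        t = (batch - WARMUP) / COSINE  # progress through stage 2, in [0, 1)
        return FLOOR + 0.5 * (PEAK - FLOOR) * (1.0 + math.cos(math.pi * t))
    # Stage 3 (and fine-tuning): held constant at the floor.
    return FLOOR
```

The same `FLOOR` rate carries through stage 3 and the fine-tuning table, which is why fine-tuning needs no scheduler of its own.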
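The optimizer paragraph's key detail is that AdamW's weight decay is *decoupled*: it is subtracted from the parameter directly rather than folded into the gradient as in plain L2 regularization. A minimal scalar sketch of one AdamW step with the quoted hyperparameters, assuming the standard Loshchilov & Hutter update rule (the function name, `EPS`, and the scalar simplification are mine; real training applies this elementwise to weight matrices):

```python
import math

# Hyperparameters quoted in the README's Optimizer section.
BETA1, BETA2 = 0.9, 0.95
WEIGHT_DECAY = 0.1
EPS = 1e-8  # conventional numerical-stability term (assumed, not stated)

def adamw_step(theta, grad, m, v, t, lr):
    """One AdamW update on a scalar parameter; returns (theta, m, v)."""
    m = BETA1 * m + (1 - BETA1) * grad         # first-moment EMA
    v = BETA2 * v + (1 - BETA2) * grad * grad  # second-moment EMA
    m_hat = m / (1 - BETA1 ** t)               # bias correction, t >= 1
    v_hat = v / (1 - BETA2 ** t)
    # Decoupled weight decay: shrink the parameter itself, scaled by lr,
    # independently of the adaptive gradient term.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + EPS) + WEIGHT_DECAY * theta)
    return theta, m, v
```

Note the decay acts on the parameter value, so only weight matrices (not biases or norm parameters) are typically registered with `WEIGHT_DECAY > 0`, matching the README's "on the weight matrices" qualifier.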