Commit 31156e0 (verified) by csubich · 1 parent: ec56f9e

Use correct formatting for inline math in readme

Files changed (1): README.md (+7 −7)
```diff
@@ -26,9 +26,9 @@ These model weights are available under the [Canada Open Government license](htt
 The model predicts the following meteorological variables on a ¼° latitude/longitude grid (with poles):
 
 * At elevation: temperature, geopotential, u (zonal) component of wind, v (meridional) component of wind, vertical velocity, specific humidity
-* At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation^(†)
+* At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation[†]
 
-^(† — This variable is incorrect. Please see the 'erratum' section.)
+[†] — This variable is incorrect. Please see the 'erratum' section.
 
 The atmospheric variables are predicted at the 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa pressure levels. For points that lie below the surface, extrapolated values are given (i.e. they are not masked).
 
@@ -79,9 +79,9 @@ The pre-training step closely followed the training curriculum of Lam et al.:
 
 Stage | Batches | Forecast Length | Learning Rate
 |:-:|:-:|:-:|:-:|
-1 (Warmup) | 1000 | 1 step (6 h) | $0 \to 10^{-3}$ (linear)
-2 | 299000 | 1 step (6 h) | $10^{-3} \to 3 \cdot 10^{-7}$ (cosine)
-3 | 1000 each | 2–12 steps (12–72 h) | $3 \cdot 10^{-7}$ (constant)
+1 (Warmup) | 1000 | 1 step (6 h) | \\(0 \to 10^{-3}\\) (linear)
+2 | 299000 | 1 step (6 h) | \\(10^{-3} \to 3 \cdot 10^{-7}\\) (cosine)
+3 | 1000 each | 2–12 steps (12–72 h) | \\(3 \cdot 10^{-7}\\) (constant)
 
 #### Fine-tuning
 
@@ -89,14 +89,14 @@ Stage | Batches | Forecast Length | Learning Rate
 
 Stage | Batches | Forecast Length | Learning Rate
 |:-:|:-:|:-:|:-:|
-Fine tune | 5000 | 12 steps (72 h) | $3 \cdot 10^{-7}$ (constant)
+Fine tune | 5000 | 12 steps (72 h) | \\(3 \cdot 10^{-7}\\) (constant)
 
 In both cases, the batch size was 32 forecasts, and the training data was sampled with replacement. On average, each training forecast (initialization date) was seen about 184 times in the pre-training stage and 4.5 times in the fine-tuning stage.
 
 
 #### Optimizer
 
-As in Lam et al., the training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with momentum parameters $\beta_1 = 0.9$ and $\beta_2 = 0.95$ and weight decay of $0.1$ on the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
+As in Lam et al., the training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with momentum parameters \\(\beta_1 = 0.9\\) and \\(\beta_2 = 0.95\\) and weight decay of \\(0.1\\) on the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
 
 ## Validation
 
```
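The three-stage pre-training schedule tabulated in the diff above (linear warmup, cosine decay, then a constant floor) can be sketched in plain Python. This is a sketch, not the repository's training code: the function name, the 0-indexing of batches, and the exact endpoint handling of each stage are my assumptions; the batch counts and rates come from the table.

```python
import math

# Values from the pre-training table in the README.
WARMUP = 1_000        # stage 1: linear warmup, 0 -> 1e-3
COSINE = 299_000      # stage 2: cosine decay, 1e-3 -> 3e-7
PEAK = 1e-3
FLOOR = 3e-7

def learning_rate(batch: int) -> float:
    """Learning rate for a given (0-indexed) pre-training batch."""
    if batch < WARMUP:
        # Stage 1: linear warmup from 0 up to the peak rate.
        return PEAK * batch / WARMUP
    if batch < WARMUP + COSINE:
        # Stage 2: cosine decay from the peak down to the floor.
        t = (batch - WARMUP) / COSINE  # progress through stage 2, in [0, 1)
        return FLOOR + 0.5 * (PEAK - FLOOR) * (1.0 + math.cos(math.pi * t))
    # Stage 3 (and fine-tuning): held constant at the floor.
    return FLOOR
```

The same `FLOOR` rate carries through stage 3 and the fine-tuning table, which is why fine-tuning needs no scheduler of its own.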
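The optimizer paragraph's key detail is that AdamW's weight decay is *decoupled*: it is subtracted from the parameter directly rather than folded into the gradient as in plain L2 regularization. A minimal scalar sketch of one AdamW step with the quoted hyperparameters, assuming the standard Loshchilov & Hutter update rule (the function name, `EPS`, and the scalar simplification are mine; real training applies this elementwise to weight matrices):

```python
import math

# Hyperparameters quoted in the README's Optimizer section.
BETA1, BETA2 = 0.9, 0.95
WEIGHT_DECAY = 0.1
EPS = 1e-8  # conventional numerical-stability term (assumed, not stated)

def adamw_step(theta, grad, m, v, t, lr):
    """One AdamW update on a scalar parameter; returns (theta, m, v)."""
    m = BETA1 * m + (1 - BETA1) * grad         # first-moment EMA
    v = BETA2 * v + (1 - BETA2) * grad * grad  # second-moment EMA
    m_hat = m / (1 - BETA1 ** t)               # bias correction, t >= 1
    v_hat = v / (1 - BETA2 ** t)
    # Decoupled weight decay: shrink the parameter itself, scaled by lr,
    # independently of the adaptive gradient term.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + EPS) + WEIGHT_DECAY * theta)
    return theta, m, v
```

Note the decay acts on the parameter value, so only weight matrices (not biases or norm parameters) are typically registered with `WEIGHT_DECAY > 0`, matching the README's "on the weight matrices" qualifier.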