# Form of a Batch
You must feed data to the model in the form of an `aurora.Batch`.
We now explain the exact form of `aurora.Batch`.
## Overall Structure
Batches contain four things:
1. some surface-level variables,
2. some static variables,
3. some atmospheric variables all at the same collection of pressure levels, and
4. metadata describing these variables: latitudes, longitudes,
the pressure levels of the atmospheric variables, and the time of the data.
All variables in a batch are unnormalised.
Normalisation happens internally in the model.
Before we explain the four components in detail, here is an example with randomly generated data:
```python
from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)
```
## `Batch.surf_vars`
`Batch.surf_vars` is a dictionary mapping names of surface-level variables to the numerical values
of the variables.
The surface-level variables must be of the form `(b, t, h, w)` where `b` is the batch size,
`t` the history dimension, `h` the number of latitudes, and `w` the number of longitudes.
All Aurora models produce the prediction for the next step from the current _and_ previous step.
For every variable `x` in `surf_vars`, `x[:, 1, :, :]` must correspond to the current step,
and `x[:, 0, :, :]` must correspond to the previous step, i.e. the step immediately before the current one.
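
As a minimal sketch of this layout, two snapshots of one variable can be stacked along a new history dimension. The tensors `prev_2t` and `curr_2t` are hypothetical placeholders filled with random values, not real data:

```python
import torch

h, w = 17, 32

# Hypothetical snapshots of one surface-level variable, each of shape (h, w).
prev_2t = torch.randn(h, w)  # previous step
curr_2t = torch.randn(h, w)  # current step

# Stack along a new history dimension (previous first, current last),
# then add a batch dimension of size one: (t, h, w) -> (b, t, h, w).
two_t = torch.stack([prev_2t, curr_2t], dim=0)[None]

print(two_t.shape)  # torch.Size([1, 2, 17, 32])
```

The resulting tensor could then be used as the value for `"2t"` in `surf_vars`.
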
The following surface-level variables are allowed:
| Name | Description |
| - | - |
| `2t` | Two-meter temperature in `K` |
| `10u` | Ten-meter eastward wind speed in `m/s` |
| `10v` | Ten-meter southward wind speed in `m/s` |
| `msl` | Mean sea-level pressure in `Pa` |
For [Aurora 0.4° Air Pollution](aurora-air-pollution), the following surface-level variables are
also allowed:
| Name | Description |
| - | - |
| `pm1` | Particulate matter less than `1 um` in `kg/m^3` |
| `pm2p5` | Particulate matter less than `2.5 um` in `kg/m^3` |
| `pm10` | Particulate matter less than `10 um` in `kg/m^3` |
| `tcco` | Total column carbon monoxide in `kg/m^2` |
| `tc_no` | Total column nitrogen monoxide in `kg/m^2` |
| `tcno2` | Total column nitrogen dioxide in `kg/m^2` |
| `tcso2` | Total column sulphur dioxide in `kg/m^2` |
| `gtco3` | Total column ozone in `kg/m^2` |
For [Aurora 0.25° Wave](aurora-wave), the following surface-level variables are also allowed:
| Name | Description |
| - | - |
| `swh` | Significant wave height of the total wave in `m` |
| `mwd` | Mean wave direction of the total wave in `degrees` |
| `mwp` | Mean wave period of the total wave in `s` |
| `pp1d` | Peak wave period of the total wave in `s` |
| `shww` | Significant wave height of the wind wave component in `m` |
| `mdww` | Mean wave direction of the wind wave component in `degrees` |
| `mpww` | Mean wave period of the wind wave component in `s` |
| `shts` | Significant wave height of the total swell component in `m` |
| `mdts` | Mean wave direction of the total swell component in `degrees` |
| `mpts` | Mean wave period of the total swell component in `s` |
| `swh1` | Significant wave height of the first swell component in `m` |
| `mwd1` | Mean wave direction of the first swell component in `degrees` |
| `mwp1` | Mean wave period of the first swell component in `s` |
| `swh2` | Significant wave height of the second swell component in `m` |
| `mwd2` | Mean wave direction of the second swell component in `degrees` |
| `mwp2` | Mean wave period of the second swell component in `s` |
| `wind` | Ten-meter neutral wind speed in `m/s` |
| `10u_wind` | Ten-meter eastward neutral wind speed in `m/s` |
| `10v_wind` | Ten-meter southward neutral wind speed in `m/s` |
## `Batch.static_vars`
`Batch.static_vars` is a dictionary mapping names of static variables to the
numerical values of the variables.
The static variables must be of the form `(h, w)` where `h` is the number of latitudes
and `w` the number of longitudes.
The following static variables are allowed:
| Name | Description |
| - | - |
| `lsm` | [Land-sea mask](https://codes.ecmwf.int/grib/param-db/172) |
| `slt` | [Soil type](https://codes.ecmwf.int/grib/param-db/43) |
| `z` | Surface-level geopotential in `m^2/s^2` |
[Aurora 0.4° Air Pollution](aurora-air-pollution)
and [Aurora 0.25° Wave](aurora-wave) require additional static variables, which are not
easy to produce yourself.
You need to obtain these from the HuggingFace repository;
see the descriptions of the respective models.
## `Batch.atmos_vars`
`Batch.atmos_vars` is a dictionary mapping names of atmospheric variables to the
numerical values of the variables.
The atmospheric variables must be of the form `(b, t, c, h, w)` where `b` is the batch size,
`t` the history dimension, `c` the number of pressure levels, `h` the number of latitudes,
and `w` the number of longitudes.
All atmospheric variables must contain the same collection of pressure levels in the same order.
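
A quick consistency check before building a batch might look as follows. This is only a sketch: `check_atmos_vars` is a hypothetical helper, not part of the `aurora` package, and it assumes that all atmospheric variables share one grid and therefore one shape.

```python
import torch


def check_atmos_vars(atmos_vars, atmos_levels):
    """Raise ValueError unless every variable is (b, t, c, h, w) with c == len(atmos_levels)."""
    shapes = {name: tuple(x.shape) for name, x in atmos_vars.items()}
    for name, shape in shapes.items():
        if len(shape) != 5:
            raise ValueError(f"{name}: expected shape (b, t, c, h, w), got {shape}")
        if shape[2] != len(atmos_levels):
            raise ValueError(
                f"{name}: has {shape[2]} pressure levels, expected {len(atmos_levels)}"
            )
    # Assumption: all variables live on the same grid, so all shapes must agree.
    if len(set(shapes.values())) > 1:
        raise ValueError(f"atmospheric variables have differing shapes: {shapes}")


levels = (100, 250, 500, 850)
atmos_vars = {
    k: torch.randn(1, 2, len(levels), 17, 32) for k in ("z", "u", "v", "t", "q")
}
check_atmos_vars(atmos_vars, levels)  # passes silently
```

A shape check cannot detect pressure levels that are stored in a different order; the ordering has to be ensured when the data is assembled.
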
The following atmospheric variables are allowed:
| Name | Description |
| - | - |
| `t` | Temperature in `K` |
| `u` | Eastward wind speed in `m/s` |
| `v` | Southward wind speed in `m/s` |
| `q` | Specific humidity in `kg/kg` |
| `z` | Geopotential in `m^2/s^2` |
For [Aurora 0.4° Air Pollution](aurora-air-pollution), the following atmospheric variables are
also allowed:
| Name | Description |
| - | - |
| `co` | Carbon monoxide in `kg/kg` |
| `no` | Nitrogen monoxide in `kg/kg` |
| `no2` | Nitrogen dioxide in `kg/kg` |
| `so2` | Sulphur dioxide in `kg/kg` |
| `go3` | Ozone in `kg/kg` |
## `Batch.metadata`
`Batch.metadata` must be a `Metadata`, which contains the following fields:
* `Metadata.lat` is the vector of latitudes.
The latitudes must be _decreasing_.
The latitudes can either include both endpoints, like `linspace(90, -90, 721)`,
or not include the south pole, like `linspace(90, -90, 721)[:-1]`.
For curvilinear grids, this can also be a matrix, in which case the foregoing conditions
apply to every _column_.
* `Metadata.lon` is the vector of longitudes.
The longitudes must be _increasing_.
The longitudes must be in the range `[0, 360)`, so they can include zero and cannot include 360.
For curvilinear grids, this can also be a matrix, in which case the foregoing conditions
apply to every _row_.
* `Metadata.atmos_levels` is a `tuple` of the pressure levels of the atmospheric variables in hPa.
Note that these levels must correspond exactly to the order of the pressure levels in the atmospheric variables.
Note also that `Metadata.atmos_levels` should be a `tuple`, not a `list`.
* `Metadata.time` is a `tuple` with, for each batch element, a `datetime.datetime` representing the time of the data.
If the batch size is one, then this will be a one-element `tuple`, e.g. `(datetime(2024, 1, 1, 12, 0),)`.
Since all Aurora models require variables for the current _and_ previous step,
`Metadata.time` corresponds to the time of the _current_ step.
Specifically, `Metadata.time[i]` corresponds to the time of `Batch.surf_vars[i, -1]`.
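
The conditions on vector-valued latitudes and longitudes can be checked with a few lines of plain Python. The helper `check_lat_lon` below is hypothetical and not part of the `aurora` package; it reads "decreasing" and "increasing" as strict, which is an assumption, and it does not handle the matrix case for curvilinear grids.

```python
def check_lat_lon(lat, lon):
    """Check the ordering and range conditions for 1-D latitudes and longitudes."""
    # Converting to floats lets the check accept lists as well as 1-D tensors.
    lat = [float(x) for x in lat]
    lon = [float(x) for x in lon]
    if any(a <= b for a, b in zip(lat, lat[1:])):
        raise ValueError("latitudes must be strictly decreasing")
    if any(a >= b for a, b in zip(lon, lon[1:])):
        raise ValueError("longitudes must be strictly increasing")
    if not all(0 <= x < 360 for x in lon):
        raise ValueError("longitudes must be in the range [0, 360)")


n_lat, n_lon = 17, 32
lat = [90 - 180 * i / (n_lat - 1) for i in range(n_lat)]  # 90 down to -90
lon = [360 * j / n_lon for j in range(n_lon)]  # 0 up to, but excluding, 360
check_lat_lon(lat, lon)  # passes silently
```

Longitudes given in `[-180, 180)` fail this check. Mapping them into `[0, 360)` also requires reordering the data along the longitude dimension so that the longitudes remain increasing.
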
## Model Output
The output of `aurora.forward(batch)` will again be a `Batch`.
This batch is of exactly the same form, with only one difference:
the history dimension will have size one.
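
Because the output has a history dimension of size one, it cannot be fed back into the model as is. One way to form the next input, sketched here for a single variable, is to drop the oldest step of the previous input and append the prediction. The helper `next_history` is a hypothetical name, not part of the `aurora` API, and random tensors stand in for real data:

```python
import torch


def next_history(inputs, prediction):
    """Drop the oldest step of `inputs` and append `prediction` along dimension 1."""
    return torch.cat([inputs[:, 1:], prediction], dim=1)


inputs = torch.randn(1, 2, 17, 32)      # (b, t, h, w) with t = 2
prediction = torch.randn(1, 1, 17, 32)  # history dimension of size one

new_inputs = next_history(inputs, prediction)
print(new_inputs.shape)  # torch.Size([1, 2, 17, 32])
```

The same call works for atmospheric variables of shape `(b, t, c, h, w)`, since their history dimension is also dimension one. In the new batch, `Metadata.time` would need to be the time of the prediction, which is now the current step.
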