# Form of a Batch

You must feed data to the model in the form of an `aurora.Batch`.
This page explains the exact form of `aurora.Batch`.

## Overall Structure

Batches contain four things:

1. some surface-level variables,
2. some static variables,
3. some atmospheric variables all at the same collection of pressure levels, and
4. metadata describing these variables: latitudes, longitudes, the pressure levels of the atmospheric variables, and the time of the data.

All variables in a batch are unnormalised.
Normalisation happens internally in the model.

Before we explain the four components in detail, here is an example with randomly generated data:

```python
from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    # Surface-level variables of the form (b, t, h, w).
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    # Static variables of the form (h, w).
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    # Atmospheric variables of the form (b, t, c, h, w), here with c = 4 pressure levels.
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)
```

## `Batch.surf_vars`

`Batch.surf_vars` is a dictionary mapping names of surface-level variables to the numerical values of the variables.
The surface-level variables must be of the form `(b, t, h, w)` where `b` is the batch size, `t` the history dimension, `h` the number of latitudes, and `w` the number of longitudes.
All Aurora models produce the prediction for the next step from the current _and_ previous step.
For every variable, `surf_vars[:, 1, :, :]` must correspond to the current step, and `surf_vars[:, 0, :, :]` must correspond to the previous step, that is, the step immediately before the current one.

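
The ordering of the history dimension described above can be illustrated with a minimal sketch. It assumes you already hold the previous and current step of a variable as two separate `(b, h, w)` tensors; the helper name `stack_history` is hypothetical and not part of `aurora`.

```python
import torch


def stack_history(previous: torch.Tensor, current: torch.Tensor) -> torch.Tensor:
    """Stack two (b, h, w) fields into one (b, t, h, w) tensor with t = 2."""
    if previous.shape != current.shape:
        raise ValueError("previous and current steps must have the same shape")
    # Index 0 of the history dimension holds the previous step, index 1 the current step.
    return torch.stack((previous, current), dim=1)


# Randomly generated stand-ins for two consecutive two-meter temperature fields.
previous_2t = torch.randn(1, 17, 32)
current_2t = torch.randn(1, 17, 32)

surf_2t = stack_history(previous_2t, current_2t)
```

The result has shape `(1, 2, 17, 32)` and could be used as the value for `"2t"` in `surf_vars`.
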
The following surface-level variables are allowed:

| Name | Description |
| - | - |
| `2t` | Two-meter temperature in `K` |
| `10u` | Ten-meter eastward wind speed in `m/s` |
| `10v` | Ten-meter southward wind speed in `m/s` |
| `msl` | Mean sea-level pressure in `Pa` |

For [Aurora 0.4° Air Pollution](aurora-air-pollution), the following surface-level variables are also allowed:

| Name | Description |
| - | - |
| `pm1` | Particulate matter less than `1 um` in `kg/m^3` |
| `pm2p5` | Particulate matter less than `2.5 um` in `kg/m^3` |
| `pm10` | Particulate matter less than `10 um` in `kg/m^3` |
| `tcco` | Total column carbon monoxide in `kg/m^2` |
| `tc_no` | Total column nitrogen monoxide in `kg/m^2` |
| `tcno2` | Total column nitrogen dioxide in `kg/m^2` |
| `tcso2` | Total column sulphur dioxide in `kg/m^2` |
| `gtco3` | Total column ozone in `kg/m^2` |

For [Aurora 0.25° Wave](aurora-wave), the following surface-level variables are also allowed:

| Name | Description |
| - | - |
| `swh` | Significant wave height of the total wave in `m` |
| `mwd` | Mean wave direction of the total wave in `degrees` |
| `mwp` | Mean wave period of the total wave in `s` |
| `pp1d` | Peak wave period of the total wave in `s` |
| `shww` | Significant wave height of the wind wave component in `m` |
| `mdww` | Mean wave direction of the wind wave component in `degrees` |
| `mpww` | Mean wave period of the wind wave component in `s` |
| `shts` | Significant wave height of the total swell component in `m` |
| `mdts` | Mean wave direction of the total swell component in `degrees` |
| `mpts` | Mean wave period of the total swell component in `s` |
| `swh1` | Significant wave height of the first swell component in `m` |
| `mwd1` | Mean wave direction of the first swell component in `degrees` |
| `mwp1` | Mean wave period of the first swell component in `s` |
| `swh2` | Significant wave height of the second swell component in `m` |
| `mwd2` | Mean wave direction of the second swell component in `degrees` |
| `mwp2` | Mean wave period of the second swell component in `s` |
| `wind` | Ten-meter neutral wind speed in `m/s` |
| `10u_wind` | Ten-meter eastward neutral wind speed in `m/s` |
| `10v_wind` | Ten-meter southward neutral wind speed in `m/s` |

## `Batch.static_vars`

`Batch.static_vars` is a dictionary mapping names of static variables to the numerical values of the variables.
The static variables must be of the form `(h, w)` where `h` is the number of latitudes and `w` the number of longitudes.

The following static variables are allowed:

| Name | Description |
| - | - |
| `lsm` | [Land-sea mask](https://codes.ecmwf.int/grib/param-db/172) |
| `slt` | [Soil type](https://codes.ecmwf.int/grib/param-db/43) |
| `z` | Surface-level geopotential in `m^2/s^2` |

[Aurora 0.4° Air Pollution](aurora-air-pollution) and [Aurora 0.25° Wave](aurora-wave) require additional static variables, but these are not easy to obtain yourself.
You need to obtain these from the HuggingFace repository.
See the description of the models.

## `Batch.atmos_vars`

`Batch.atmos_vars` is a dictionary mapping names of atmospheric variables to the numerical values of the variables.
The atmospheric variables must be of the form `(b, t, c, h, w)` where `b` is the batch size, `t` the history dimension, `c` the number of pressure levels, `h` the number of latitudes, and `w` the number of longitudes.
All atmospheric variables must contain the same collection of pressure levels in the same order.

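
One way to guarantee that every atmospheric variable uses the same pressure levels in the same order is to build each tensor from a single shared tuple of levels. The following is a minimal sketch under the assumption that your data is stored as one `(b, t, h, w)` field per pressure level; the helper `stack_levels` and the per-level dictionaries are hypothetical and not part of `aurora`.

```python
import torch


def stack_levels(
    fields_by_level: dict[int, torch.Tensor],
    levels: tuple[int, ...],
) -> torch.Tensor:
    """Stack per-level (b, t, h, w) fields into (b, t, c, h, w), ordered as `levels`."""
    missing = [level for level in levels if level not in fields_by_level]
    if missing:
        raise KeyError(f"missing pressure levels: {missing}")
    # The level dimension `c` is inserted at position 2, after batch and history.
    return torch.stack([fields_by_level[level] for level in levels], dim=2)


# The single source of truth for the level order, in hPa.
levels = (100, 250, 500, 850)

# Stand-in data stored in arbitrary level order; each field is filled with its level value.
temperature = {lev: torch.full((1, 2, 17, 32), float(lev)) for lev in (850, 100, 500, 250)}
wind_u = {lev: torch.full((1, 2, 17, 32), float(lev)) for lev in (500, 850, 250, 100)}

atmos_vars = {
    "t": stack_levels(temperature, levels),
    "u": stack_levels(wind_u, levels),
}
```

Passing the same `levels` tuple as `atmos_levels` in the metadata keeps the tensors and the metadata consistent.
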
The following atmospheric variables are allowed:

| Name | Description |
| - | - |
| `t` | Temperature in `K` |
| `u` | Eastward wind speed in `m/s` |
| `v` | Southward wind speed in `m/s` |
| `q` | Specific humidity in `kg/kg` |
| `z` | Geopotential in `m^2/s^2` |

For [Aurora 0.4° Air Pollution](aurora-air-pollution), the following atmospheric variables are also allowed:

| Name | Description |
| - | - |
| `co` | Carbon monoxide in `kg/kg` |
| `no` | Nitrogen monoxide in `kg/kg` |
| `no2` | Nitrogen dioxide in `kg/kg` |
| `so2` | Sulphur dioxide in `kg/kg` |
| `go3` | Ozone in `kg/kg` |

## `Batch.metadata`

`Batch.metadata` must be a `Metadata`, which contains the following fields:

* `Metadata.lat` is the vector of latitudes.
  The latitudes must be _decreasing_.
  The latitudes can either include both endpoints, like `linspace(90, -90, 721)`, or not include the south pole, like `linspace(90, -90, 721)[:-1]`.
  For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every _column_.

* `Metadata.lon` is the vector of longitudes.
  The longitudes must be _increasing_.
  The longitudes must be in the range `[0, 360)`, so they can include zero and cannot include 360.
  For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every _row_.

* `Metadata.atmos_levels` is a `tuple` of the pressure levels of the atmospheric variables in hPa.
  Note that these levels must correspond exactly, and in the same order, to the pressure-level dimension `c` of the atmospheric variables.
  Note also that `Metadata.atmos_levels` should be a `tuple`, not a `list`.

* `Metadata.time` is a `tuple` with, for each batch element, a `datetime.datetime` representing the time of the data.
  If the batch size is one, then this will be a one-element `tuple`, e.g. `(datetime(2024, 1, 1, 12, 0),)`.
  Since all Aurora models require variables for the current _and_ previous step, `Metadata.time` corresponds to the time of the _current_ step.
  Specifically, `Metadata.time[i]` corresponds to the time of `Batch.surf_vars[i, -1]`.

## Model Output

The output of `aurora.forward(batch)` will again be a `Batch`.
This batch is of exactly the same form, with only one difference: the history dimension will have size one.
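
Because the output has a history dimension of size one while the input needs two steps, feeding a prediction back into the model requires combining it with the most recent input step. The sketch below shows one way this could be done on plain tensors. It is an assumption for illustration, not a description of how `aurora` chains predictions internally; the helper `append_prediction` is hypothetical, and the matching `Metadata.time` would also need to be advanced, which is left out here.

```python
import torch


def append_prediction(history: torch.Tensor, prediction: torch.Tensor) -> torch.Tensor:
    """Drop the oldest step of `history` and append `prediction` along dimension 1."""
    if prediction.shape[1] != 1:
        raise ValueError("the prediction must have a history dimension of size one")
    # history[:, 1:] keeps the current step, which becomes the new previous step.
    return torch.cat((history[:, 1:], prediction), dim=1)


# Stand-ins: an input variable of the form (b, t, h, w) with t = 2,
# and a model output for the same variable with t = 1.
history = torch.randn(1, 2, 17, 32)
prediction = torch.randn(1, 1, 17, 32)

next_history = append_prediction(history, prediction)
```

The same slicing works for atmospheric variables of the form `(b, t, c, h, w)`, since the history dimension is also at position 1 there.
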