Form of a Batch

You must feed data to the model in the form of an aurora.Batch. This page describes the exact form of aurora.Batch.

Overall Structure

Batches contain four things:

  1. some surface-level variables,
  2. some static variables,
  3. some atmospheric variables all at the same collection of pressure levels, and
  4. metadata describing these variables: latitudes, longitudes, the pressure levels of the atmospheric variables, and the time of the data.

All variables in a batch are unnormalised. Normalisation happens internally in the model.

Before we explain the four components in detail, here is an example with randomly generated data:

from datetime import datetime

import torch

from aurora import Batch, Metadata

batch = Batch(
    surf_vars={k: torch.randn(1, 2, 17, 32) for k in ("2t", "10u", "10v", "msl")},
    static_vars={k: torch.randn(17, 32) for k in ("lsm", "z", "slt")},
    atmos_vars={k: torch.randn(1, 2, 4, 17, 32) for k in ("z", "u", "v", "t", "q")},
    metadata=Metadata(
        lat=torch.linspace(90, -90, 17),
        lon=torch.linspace(0, 360, 32 + 1)[:-1],
        time=(datetime(2020, 6, 1, 12, 0),),
        atmos_levels=(100, 250, 500, 850),
    ),
)

Batch.surf_vars

Batch.surf_vars is a dictionary mapping names of surface-level variables to the numerical values of the variables. The surface-level variables must be of the form (b, t, h, w) where b is the batch size, t the history dimension, h the number of latitudes, and w the number of longitudes.

All Aurora models produce the prediction for the next step from the current and previous steps. For every surface-level variable, index 1 along the history dimension, surf_vars[name][:, 1, :, :], must hold the current step, and index 0, surf_vars[name][:, 0, :, :], must hold the previous step, i.e. the step immediately before the current one.
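As a minimal sketch of that ordering, the helper below picks the previous and current step out of two timestamped snapshots. The name order_history, the {datetime: data} mapping, and the example timestamps are hypothetical and only serve the illustration. Strings stand in for tensors so that the snippet runs without PyTorch.

```python
from datetime import datetime


def order_history(snapshots):
    """Return (previous, current) from a mapping of exactly two timestamps to data.

    `snapshots` is a hypothetical structure of the form {datetime: data}.
    The earlier timestamp belongs at history index 0, the later one at index 1.
    """
    if len(snapshots) != 2:
        raise ValueError("expected exactly two time steps")
    earlier, later = sorted(snapshots)
    return snapshots[earlier], snapshots[later]


# Strings stand in for tensors here; the insertion order is deliberately wrong.
fields = {
    datetime(2020, 6, 1, 12, 0): "field at 12:00",
    datetime(2020, 6, 1, 6, 0): "field at 06:00",
}
previous, current = order_history(fields)
```

With real tensors of shape (b, h, w), the two results could then be stacked along a new dimension 1, for example with torch.stack((previous, current), dim=1), which gives the (b, t, h, w) layout with t = 2.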

The following surface-level variables are allowed:

| Name | Description |
| --- | --- |
| 2t | Two-meter temperature in K |
| 10u | Ten-meter eastward wind speed in m/s |
| 10v | Ten-meter southward wind speed in m/s |
| msl | Mean sea-level pressure in Pa |

For Aurora 0.4° Air Pollution, the following surface-level variables are also allowed:

| Name | Description |
| --- | --- |
| pm1 | Particulate matter less than 1 um in kg/m^3 |
| pm2p5 | Particulate matter less than 2.5 um in kg/m^3 |
| pm10 | Particulate matter less than 10 um in kg/m^3 |
| tcco | Total column carbon monoxide in kg/m^2 |
| tc_no | Total column nitrogen monoxide in kg/m^2 |
| tcno2 | Total column nitrogen dioxide in kg/m^2 |
| tcso2 | Total column sulphur dioxide in kg/m^2 |
| gtco3 | Total column ozone in kg/m^2 |

For Aurora 0.25° Wave, the following surface-level variables are also allowed:

| Name | Description |
| --- | --- |
| swh | Significant wave height of the total wave in m |
| mwd | Mean wave direction of the total wave in degrees |
| mwp | Mean wave period of the total wave in s |
| pp1d | Peak wave period of the total wave in s |
| shww | Significant wave height of the wind wave component in m |
| mdww | Mean wave direction of the wind wave component in degrees |
| mpww | Mean wave period of the wind wave component in s |
| shts | Significant wave height of the total swell component in m |
| mdts | Mean wave direction of the total swell component in degrees |
| mpts | Mean wave period of the total swell component in s |
| swh1 | Significant wave height of the first swell component in m |
| mwd1 | Mean wave direction of the first swell component in degrees |
| mwp1 | Mean wave period of the first swell component in s |
| swh2 | Significant wave height of the second swell component in m |
| mwd2 | Mean wave direction of the second swell component in degrees |
| mwp2 | Mean wave period of the second swell component in s |
| wind | Ten-meter neutral wind speed in m/s |
| 10u_wind | Ten-meter eastward neutral wind speed in m/s |
| 10v_wind | Ten-meter southward neutral wind speed in m/s |

Batch.static_vars

Batch.static_vars is a dictionary mapping names of static variables to the numerical values of the variables. The static variables must be of the form (h, w) where h is the number of latitudes and w the number of longitudes.
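A quick way to catch a transposed or mis-sized static field is to compare its shape with the lengths of the latitude and longitude vectors. The helper below is a hypothetical sketch, not part of the aurora package. It works on plain shape tuples, which for tensors could be obtained with tuple(x.shape).

```python
def mismatched_static_vars(static_shapes, n_lat, n_lon):
    """Return the sorted names of static variables whose shape is not (n_lat, n_lon).

    `static_shapes` maps variable names to shape tuples (hypothetical input format).
    """
    expected = (n_lat, n_lon)
    return sorted(
        name for name, shape in static_shapes.items() if tuple(shape) != expected
    )


# "slt" is deliberately transposed to show what the check reports.
shapes = {"lsm": (17, 32), "z": (17, 32), "slt": (32, 17)}
bad = mismatched_static_vars(shapes, n_lat=17, n_lon=32)
```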

The following static variables are allowed:

| Name | Description |
| --- | --- |
| lsm | Land-sea mask |
| slt | Soil type |
| z | Surface-level geopotential in m^2/s^2 |

Aurora 0.4° Air Pollution and Aurora 0.25° Wave require additional static variables. These are not easy to compute yourself, so you need to obtain them from the Hugging Face repository. See the descriptions of these models.

Batch.atmos_vars

Batch.atmos_vars is a dictionary mapping names of atmospheric variables to the numerical values of the variables. The atmospheric variables must be of the form (b, t, c, h, w) where b is the batch size, t the history dimension, c the number of pressure levels, h the number of latitudes, and w the number of longitudes. All atmospheric variables must contain the same collection of pressure levels in the same order.
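The shape requirements above can be checked mechanically before building a batch. The following is a minimal sketch under the assumption that shapes are passed as plain tuples; check_atmos_shapes is a hypothetical helper, not an aurora function. It can only verify the number of pressure levels. Whether the levels are in the same order in every variable has to be ensured when the data is loaded.

```python
def check_atmos_shapes(atmos_shapes, atmos_levels):
    """Return a list of problems found in the shapes of atmospheric variables.

    `atmos_shapes` maps variable names to shape tuples (b, t, c, h, w).
    `atmos_levels` is the tuple of pressure levels from the metadata.
    """
    problems = []
    reference = None
    for name, shape in atmos_shapes.items():
        shape = tuple(shape)
        if len(shape) != 5:
            problems.append(f"{name}: expected 5 dimensions (b, t, c, h, w), got {len(shape)}")
            continue
        if shape[2] != len(atmos_levels):
            problems.append(f"{name}: {shape[2]} levels, but metadata lists {len(atmos_levels)}")
        if reference is None:
            reference = shape  # first valid 5-D shape becomes the reference
        elif shape != reference:
            problems.append(f"{name}: shape {shape} differs from {reference}")
    return problems


levels = (100, 250, 500, 850)
shapes = {
    "t": (1, 2, 4, 17, 32),
    "u": (1, 2, 4, 17, 32),
    "q": (1, 2, 3, 17, 32),  # one pressure level is missing
}
problems = check_atmos_shapes(shapes, levels)
```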

The following atmospheric variables are allowed:

| Name | Description |
| --- | --- |
| t | Temperature in K |
| u | Eastward wind speed in m/s |
| v | Southward wind speed in m/s |
| q | Specific humidity in kg/kg |
| z | Geopotential in m^2/s^2 |

For Aurora 0.4° Air Pollution, the following atmospheric variables are also allowed:

| Name | Description |
| --- | --- |
| co | Carbon monoxide in kg/kg |
| no | Nitrogen monoxide in kg/kg |
| no2 | Nitrogen dioxide in kg/kg |
| so2 | Sulphur dioxide in kg/kg |
| go3 | Ozone in kg/kg |

Batch.metadata

Batch.metadata must be a Metadata, which contains the following fields:

  • Metadata.lat is the vector of latitudes. The latitudes must be decreasing. The latitudes can either include both endpoints, like linspace(90, -90, 721), or not include the south pole, like linspace(90, -90, 721)[:-1]. For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every column.
  • Metadata.lon is the vector of longitudes. The longitudes must be increasing. The longitudes must be in the range [0, 360), so they can include zero and cannot include 360. For curvilinear grids, this can also be a matrix, in which case the foregoing conditions apply to every row.
  • Metadata.atmos_levels is a tuple of the pressure levels of the atmospheric variables in hPa. Note that these levels must exactly correspond to the order of the pressure levels in the atmospheric variables: entry i of the tuple describes index i along the c dimension. Note also that Metadata.atmos_levels should be a tuple, not a list.
  • Metadata.time is a tuple with, for each batch element, a datetime.datetime representing the time of the data. If the batch size is one, then this will be a one-element tuple, e.g. (datetime(2024, 1, 1, 12, 0),). Since all Aurora models require variables for the current and previous step, Metadata.time corresponds to the time of the current step. Specifically, Metadata.time[i] corresponds to the time of Batch.surf_vars[name][i, -1] for every surface-level variable name.
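The conditions in this list can be turned into a small pre-flight check. The sketch below covers only the vector case (no curvilinear grids) and uses plain Python lists. check_metadata and wrap_longitudes are hypothetical helpers, not part of aurora, and reading "decreasing" and "increasing" as strict is an assumption.

```python
from datetime import datetime


def wrap_longitudes(lons):
    """Map longitudes in degrees, e.g. from [-180, 180), into [0, 360)."""
    return [lon % 360 for lon in lons]


def check_metadata(lat, lon, atmos_levels, time):
    """Return a list of violated conditions (vector lat/lon only)."""
    problems = []
    if any(a <= b for a, b in zip(lat, lat[1:])):
        problems.append("lat must be decreasing")
    if any(a >= b for a, b in zip(lon, lon[1:])):
        problems.append("lon must be increasing")
    if any(not 0 <= x < 360 for x in lon):
        problems.append("lon must lie in [0, 360)")
    if not isinstance(atmos_levels, tuple):
        problems.append("atmos_levels must be a tuple")
    if not (isinstance(time, tuple) and all(isinstance(t, datetime) for t in time)):
        problems.append("time must be a tuple of datetime objects")
    return problems


lat = [90 - 11.25 * i for i in range(17)]  # 90 down to -90
lon = [11.25 * i for i in range(32)]       # 0 up to 348.75
time = (datetime(2020, 6, 1, 12, 0),)

ok = check_metadata(lat, lon, (100, 250, 500, 850), time)
# Reversed latitudes and a list instead of a tuple are both reported.
bad = check_metadata(lat[::-1], lon, [100, 250, 500, 850], time)

# Longitudes given in [-180, 180) need wrapping and then re-sorting.
wrapped = wrap_longitudes([-180, -90, 0, 90])
order = sorted(range(len(wrapped)), key=wrapped.__getitem__)
```

After wrapping, the longitudes are generally no longer increasing, so the permutation in order would also have to be applied to the longitude axis of every variable.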

Model Output

The output of aurora.forward(batch) will again be a Batch. This batch is of exactly the same form, with only one difference: the history dimension will have size one.
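To make the difference concrete, the sketch below derives the shapes one would expect in the prediction from the shapes of the input. expected_output_shapes is a hypothetical helper that works on plain shape tuples. It applies to surface-level and atmospheric variables only, since static variables have no history dimension.

```python
def expected_output_shapes(input_shapes):
    """Map input shapes (b, t, ...) to the shapes expected in the prediction.

    Only the history dimension (index 1) changes: it becomes 1.
    """
    return {
        name: (shape[0], 1) + tuple(shape[2:])
        for name, shape in input_shapes.items()
    }


surf_out = expected_output_shapes({"2t": (1, 2, 17, 32)})
atmos_out = expected_output_shapes({"t": (1, 2, 4, 17, 32)})
```

These tuples could then be compared against, for example, tuple(pred.surf_vars["2t"].shape), where pred is the returned Batch.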