---
license: apache-2.0
---
# NVIDIA Model Card

## EDM-Chaos Overview

cBottle

## Description:

cBottle is a diffusion model that generates atmospheric states at kilometer resolution using a cascaded diffusion architecture.

This model is for research and development only.

## License/Terms of Use:

Use of this model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-and-model-evaluation-license/).

## Deployment Geography:

Global

## Use Case:

Researchers and developers in the field of climate modeling and Earth system science would use this model to generate random images of the Earth's atmosphere, leveraging the techniques described in the paper 'Climate in a Bottle: A generative foundation model for the kilometer-scale atmosphere'.

## Release Date:

NGC: 05/12/2025

## Model Architecture:

### Architecture Type:

Convolutional Neural Network (CNN)

### Network Architecture:

Song-UNet on HEALPix geometry

- cBottle-3d has 150M parameters.
- cBottle-video has 282M parameters.
- cBottle-SR has 330M parameters.
- cBottle-3d-tc adds only two convolutional layers to the lowest level of the UNet in cBottle-3d and likewise has 150M parameters.

## Input:

### Input Type(s):

- Tensor (dataset label, one-hot encoded)
- Tensor (day of year)
- Tensor (second of day)
- Tensor (monthly mean sea surface temperature, SST)
- Tensor (Tropical Cyclone (TC) location map, at HPX8 resolution, if using TC guidance)


### Input Format(s):

- PyTorch Tensor
- PyTorch Tensor
- PyTorch Tensor
- PyTorch Tensor
- PyTorch Tensor

### Input Parameters:

- Tensor: 2D (batch, time_window)
- Tensor: 2D (batch, time_window)
- Tensor: 4D (batch, num_in_channels, time_window, cell)
- Tensor: 4D (batch, 1024)
- Tensor: 4D (batch, 1024)


### Other Properties Related to Input:

- Dataset label: ERA5 = 1, ICON = 0
- Day of year: in days (0-365)
- Second of day: in seconds (0-86399)
- Monthly mean SST input: on the HEALPix 64 (HPX64) grid
- TC location map: on the HPX8 grid
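
As an illustration of the value ranges above, the following minimal sketch derives the two time inputs from a timestamp and builds a one-hot dataset label. The helper names `time_inputs` and `one_hot`, the zero-based day-of-year convention, the use of UTC, and the label width passed as `num_classes` are assumptions made for this example; they are not taken from the cBottle code.

```python
from datetime import datetime, timezone

# Index convention stated in this card: ERA5 = 1, ICON = 0.
DATASET_INDEX = {"ICON": 0, "ERA5": 1}


def time_inputs(when: datetime) -> tuple[int, int]:
    """Return (day_of_year, second_of_day) for a timestamp.

    Assumption: times are UTC and day_of_year is zero-based, so that
    1 January maps to 0 and 31 December of a leap year maps to 365.
    """
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive times as UTC
    when = when.astimezone(timezone.utc)
    day_of_year = when.timetuple().tm_yday - 1
    second_of_day = when.hour * 3600 + when.minute * 60 + when.second
    return day_of_year, second_of_day


def one_hot(index: int, num_classes: int) -> list[float]:
    """Return a one-hot vector; num_classes is a hypothetical label width."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


if __name__ == "__main__":
    print(time_inputs(datetime(2020, 6, 1, 12, 0, 0, tzinfo=timezone.utc)))
    print(one_hot(DATASET_INDEX["ERA5"], num_classes=2))
```

In practice these values would be converted to PyTorch tensors with the batch and time-window dimensions listed under Input Parameters.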



## Output:

### Output Type(s):

- Tensor

### Output Format:

- PyTorch Tensor

### Output Parameters:

- Four dimensional (batch, channel, time window, cell)

### Other Properties Related to Output:

- Coarse model outputs to HPX 64 grid
- SR model outputs to HPX 1024 grid.
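
For orientation, the size of the `cell` dimension on each grid follows from the standard HEALPix relation of 12 × Nside² pixels. The sketch below only evaluates that relation; the helper name `healpix_npix` is introduced here for illustration, and the power-of-two check reflects the grids named in this card rather than a requirement stated in it.

```python
def healpix_npix(nside: int) -> int:
    """Number of pixels (cells) on a HEALPix grid with the given Nside."""
    # The grids in this card (HPX8, HPX64, HPX1024) all use power-of-two Nside.
    if nside < 1 or nside & (nside - 1):
        raise ValueError("nside must be a positive power of two")
    return 12 * nside * nside


if __name__ == "__main__":
    for nside in (8, 64, 1024):
        print(f"HPX{nside}: {healpix_npix(nside)} cells")
    # HPX8: 768 cells
    # HPX64: 49152 cells
    # HPX1024: 12582912 cells
```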

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration (Required For NVIDIA Models Only):

### Runtime Engine(s):

PyTorch

### Supported Hardware Microarchitecture Compatibility:

- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
- NVIDIA Turing

### Supported Operating System(s):

- Linux

## Model Version(s):

- **Model version**: v1.2

## Training, Testing, and Evaluation Datasets:

#### Total Number of Datasets:

4 Datasets:

- ICON Cycle 3
- input4MIPs
- ERA5
- IBTrACS

##### Dataset Partition:

##### Training:

- ICON: 2020-01-20 03:00:00 through 2024-03-06 12:00:00 (inclusive)
- ERA5/input4MIPs/IBTrACS: Years 1980--2017 (inclusive)

##### Validation:

- ICON: 2020-01-20 03:00:00 through 2024-03-06 12:00:00 (inclusive)
- ERA5/input4MIPs/IBTrACS: Year 2018

##### Evaluation

- ICON: no independent data withheld for evaluation
- ERA5/input4MIPs/IBTrACS: all other years, that is, before 1980 and after 2018.
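
The year-based partition for the ERA5-aligned datasets can be written down as a small lookup. This is only a sketch of the rule as listed above; the function name `era5_partition` is hypothetical and does not come from the cBottle code.

```python
def era5_partition(year: int) -> str:
    """Map a calendar year to the partition listed in this card."""
    if 1980 <= year <= 2017:
        return "training"
    if year == 2018:
        return "validation"
    # All remaining years (before 1980 or after 2018) are held out.
    return "evaluation"


if __name__ == "__main__":
    for year in (1979, 1980, 2017, 2018, 2019):
        print(year, era5_partition(year))
```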

#### Data Processing Description:


**ERA5** For pre-processing, we retrieved hourly ERA5 data for 1980--2018 (inclusive) via the data lake at the National Energy Research Scientific Computing Center (NERSC) at 0.25 degree resolution on the lat-lon grid. The 3D atmospheric states are available on pressure levels. We then regridded this data to the HEALPix grid with Nside=256 using bilinear interpolation. For training the coarse-resolution models, we coarsened ERA5 by pooling to Nside=64 and converted it to zarr format.
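
The pooling step from Nside=256 to Nside=64 can be illustrated as follows. This is a minimal sketch that assumes the data is stored in HEALPix NEST ordering, where the children of each coarse pixel are contiguous, and that the pooling is a plain average; the function name `coarsen_nested` is hypothetical and the actual pre-processing code may differ.

```python
def coarsen_nested(values: list[float], nside_in: int, nside_out: int) -> list[float]:
    """Average-pool a NEST-ordered HEALPix map from nside_in to nside_out."""
    if nside_in % nside_out:
        raise ValueError("nside_in must be a multiple of nside_out")
    if len(values) != 12 * nside_in * nside_in:
        raise ValueError("values does not match a HEALPix map of nside_in")
    # Each coarse pixel covers (nside_in / nside_out)^2 contiguous fine pixels.
    children = (nside_in // nside_out) ** 2
    return [
        sum(values[start:start + children]) / children
        for start in range(0, len(values), children)
    ]


if __name__ == "__main__":
    fine = [float(i) for i in range(12 * 4 * 4)]  # small Nside=4 example map
    coarse = coarsen_nested(fine, nside_in=4, nside_out=2)
    print(len(coarse), coarse[:3])
```

With array libraries the same operation is a reshape to `(-1, children)` followed by a mean over the last axis; going from Nside=256 to Nside=64 averages 16 fine pixels per coarse pixel.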

**ICON** We obtained O(PB) of ICON data in zarr format from MPI-M. This dataset was stored on the HEALPix grid with Nside=1024. Prior to our handling of the data, it was interpolated from the native icosahedral grid using nearest neighbors. Five years of this data are available at 3-hour (3D) and 30-minute (2D) resolution in time. Conveniently, this dataset featured pre-coarsened data (again using average pooling) at our coarse resolution of Nside=64. We interpolated this coarse data to fixed pressure levels using linear interpolation in the vertical direction. To fill in values at pressure levels that lie below the surface, we extrapolate temperature and geopotential using hydrostatic balance and an assumed constant temperature lapse rate of 6.5 K/km. For all other variables, we use constant extrapolation down from the surface. This procedure approximately reproduces ERA5’s undocumented procedure for filling in below-surface levels.
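
One way to realize the below-surface extrapolation is sketched here. Combining hydrostatic balance with a constant lapse rate Γ gives T(p) = T_s (p/p_s)^α with α = RΓ/g, and integrating the hydrostatic relation gives Φ(p) = Φ_s − (R T_s/α)[(p/p_s)^α − 1]. The function name, the constants, and the choice to anchor the profile at the surface temperature are assumptions of this example, not a description of the exact cBottle pre-processing.

```python
import math

R_DRY = 287.05       # gas constant of dry air, J kg^-1 K^-1 (assumed value)
GRAVITY = 9.80665    # gravitational acceleration, m s^-2 (assumed value)
LAPSE_RATE = 6.5e-3  # constant lapse rate from the text, K m^-1


def extrapolate_below_surface(
    p: float, p_sfc: float, t_sfc: float, phi_sfc: float
) -> tuple[float, float]:
    """Return (temperature, geopotential) at pressure p >= p_sfc.

    Pressures share one unit (for example Pa); temperature is in K and
    geopotential in m^2 s^-2.
    """
    alpha = R_DRY * LAPSE_RATE / GRAVITY
    ratio = math.pow(p / p_sfc, alpha)
    temperature = t_sfc * ratio
    geopotential = phi_sfc - (R_DRY * t_sfc / alpha) * (ratio - 1.0)
    return temperature, geopotential


if __name__ == "__main__":
    # Fill the 1000 hPa level beneath a surface at 900 hPa.
    t, phi = extrapolate_below_surface(100000.0, 90000.0, 280.0, 10000.0)
    print(f"T = {t:.2f} K, Phi = {phi:.1f} m^2 s^-2")
```

Below the surface the extrapolated temperature is warmer than the surface value and the geopotential is lower, as expected for a level that sits beneath the terrain.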

#### Public Datasets:

- ERA5
- ICON
- input4MIPs (tosbcs)
- IBTrACS

#### Training, Testing, and Evaluation Datasets:

The following datasets were used for Training, Testing, and Evaluation:

#### Link:

- ERA5: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
- ICON Cycle 3: https://www.wdc-climate.de/ui/entry?acronym=nextGEMS_cyc3
- input4MIPs: https://pcmdi.llnl.gov/mips/input4MIPs/
- IBTrACS: https://www.ncei.noaa.gov/products/international-best-track-archive

#### Data Collection Method by dataset

Automatic/sensors

#### Labeling Method by dataset

Automatic/sensors

#### Properties (Quantity, Dataset Descriptions, Sensor(s)):

**ERA5**

ERA5 is a commonly used dataset that represents the best guess of the global atmospheric state at a coarse resolution of 25 km. Briefly, it is created by running a data assimilation algorithm that incorporates observations into the numerical model of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, it is not available at a high enough resolution in space or time to resolve mesoscale motions at the kilometer scale, in particular deep convection. The ICON simulation fills this gap since it is available at a global 5 km resolution, but it is not paired with reality. See the datasets table in the accompanying paper for a detailed overview of the datasets. We retrieved data from 1980--2018.


**ICON Cycle 3**
The ICON data is a free-running simulation with the ICON atmospheric model coupled to a dynamic ocean and land. The atmospheric component solves the nonhydrostatic fluid mechanics equations on a global icosahedral mesh with a resolution of around 5 km. Unlike ERA5, the model explicitly resolves certain convective motions, so a convection parameterization is not used. Even beyond the simple increase in resolution, we therefore expect this to increase the fidelity of the precipitation and cloud fields in this dataset relative to ERA5, where these processes are parameterized. Furthermore, because the ocean and land are dynamically coupled to the atmosphere, we expect this run to obey conservation of heat, momentum, and moisture between these components, unlike in ERA5, where the observations (rather than conservation laws) are king.

#### Testing Dataset:

Same details as the Training Dataset. This period was reserved for tuning the model hyperparameters.

- ICON: 2020-01-20 03:00:00 through 2024-03-06 12:00:00 (inclusive)
- ERA5/input4MIPs: Year 2018


#### Evaluation Dataset:

- ICON: no independent data withheld for evaluation
- ERA5/input4MIPs: all other years, that is, before 1980 and after 2018.

## Inference:

### Engine:

PyTorch

### Test Hardware:

- A100, H100


## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).