ngeneva committed on
Commit
58cf2e4
·
1 Parent(s): 5444b64

Adding model card

README.md ADDED
@@ -0,0 +1,212 @@
1
+
2
+ # Earth-2 Checkpoints: HealDA
3
+
4
+ HealDA is a global ML-based data assimilation (DA) model that maps a short window of satellite and conventional observations to a 1° atmospheric state on the Hierarchical Equal Area isoLatitude Pixelation (HEALPix) grid. The resulting analyses serve as plug-and-play initial conditions for off-the-shelf ML forecast models.
5
+
6
+ This model is ready for commercial/non-commercial use.
7
+
8
+ ### License/Terms of Use:
9
+
10
+ **Governing Terms**: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
11
+
12
+ ### Deployment Geography:
13
+
14
+ Global
15
+
16
+ ### Use Case:
17
+
18
+ Industry, academic, and government research teams interested in data assimilation and medium-range weather forecasting.
19
+
20
+ ### Release Date:
21
+
22
+ Hugging Face: 3/16/2026 [URL](https://huggingface.co/nvidia/healda)
23
+
24
+ ## Reference:
25
+
26
+ **Papers**:
27
+
28
+ - [HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts](https://arxiv.org/abs/2601.17636)
29
+
30
+ **Code**:
31
+
32
+ - [PhysicsNeMo](https://github.com/NVIDIA/physicsnemo)
33
+ - [Earth2Studio](https://github.com/NVIDIA/earth2studio)
34
+
35
+ ## Model Architecture
36
+
37
+ **Architecture Type:** Custom Observation Encoder + HPX Vision Transformer (ViT) backbone <br>
38
+ **Network Architecture:** DiT-L adapted to HEALPix grid with patch-based encoding/decoding and global self-attention across the sphere. Used as a deterministic regression model.
39
+
40
+ - Observation Encoder: sensor-specific point-cloud embedders with scatter-reduce aggregation onto HPX64 grid
41
+ - HPX ViT backbone: 330M parameters, 24 transformer blocks, embedding dimension 1024 <br>
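The scatter-reduce aggregation named in the encoder bullet above can be illustrated with a minimal NumPy sketch. It assumes a mean reduction, precomputed HEALPix pixel indices for each observation, and a hypothetical helper name `scatter_mean`; the model card does not state which reduction the checkpoint uses, and the actual model is implemented in PyTorch.

```python
import numpy as np

def scatter_mean(pixel_index, features, npix):
    """Average per-observation feature vectors into grid cells.

    pixel_index: (n_obs,) integer cell index for each observation (assumed precomputed)
    features:    (n_obs, n_feat) embedded observation features
    npix:        number of grid cells (49,152 for HPX64)
    """
    sums = np.zeros((npix, features.shape[1]), dtype=float)
    # Unbuffered add, so repeated indices accumulate instead of overwriting.
    np.add.at(sums, pixel_index, features)
    counts = np.bincount(pixel_index, minlength=npix)
    mean = np.zeros_like(sums)
    hit = counts > 0
    # Cells with no observations stay at zero (an assumption of this sketch).
    mean[hit] = sums[hit] / counts[hit, None]
    return mean, counts

# Three observations, two of which fall into cell 0 of a toy 4-cell grid.
idx = np.array([0, 0, 2])
feats = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
grid, counts = scatter_mean(idx, feats, npix=4)
```

Returning the per-cell counts alongside the means lets a downstream layer distinguish an empty cell from a cell whose observations average to zero.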
42
+
43
+ ## Input:
44
+
45
+ **Input Type(s):**
46
+
47
+ - Tensor (satellite and conventional observations as point-cloud data)
48
+ - Tensor (static conditioning fields: orography, land-sea mask)
49
+ - Tensor (day of year)
50
+ - Tensor (second of day) <br>
51
+
52
+ **Input Format(s):** PyTorch Tensors <br>
53
+ **Input Parameters:**
54
+
55
+ - Observations: variable-length point cloud (~10M scalar observations per 24-hour window)
56
+ - Static conditioning: 4D (batch, channels, time, Npix)
57
+ - Day of year: 2D (batch, time)
58
+ - Second of day: 2D (batch, time)
59
+ - Current checkpoint uses time=1 <br>
60
+
61
+ **Other Properties Related to Input:**
62
+
63
+ - Observation window: 24 hours [t-21h, t+3h] around the target analysis time
64
+ - Microwave sounders: AMSU-A, AMSU-B, ATMS, MHS aboard NOAA-15–20, Metop-A–C, and Suomi-NPP
65
+ - Conventional in-situ observations: surface stations, aircraft, radiosondes, buoys (surface pressure, temperature, humidity, u/v winds)
66
+ - GNSS Radio Occultation: bending angle and derived temperature/humidity profiles
67
+ - Satellite-derived winds: scatterometer (ASCAT) and atmospheric motion vectors (AMVs)
68
+ - Observational data is sourced from the [NOAA UFS Replay](https://psl.noaa.gov/data/ufs_replay/) archive
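As a small illustration of the 24-hour window [t-21h, t+3h] listed above, the following sketch computes the window bounds for a given analysis time and selects the observation timestamps inside it. The function names are hypothetical, and treating both bounds as inclusive is an assumption of this sketch rather than something the model card specifies.

```python
from datetime import datetime, timedelta

def observation_window(analysis_time):
    """Return (start, end) of the 24-hour window [t-21h, t+3h]."""
    start = analysis_time - timedelta(hours=21)
    end = analysis_time + timedelta(hours=3)
    return start, end

def select_observations(timestamps, analysis_time):
    """Keep timestamps inside the window (inclusive bounds assumed)."""
    start, end = observation_window(analysis_time)
    return [ts for ts in timestamps if start <= ts <= end]

t = datetime(2022, 1, 2, 0)
start, end = observation_window(t)
kept = select_observations(
    [datetime(2022, 1, 1, 2), datetime(2022, 1, 1, 12), datetime(2022, 1, 2, 4)], t
)
```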
69
+
70
+ ## Output:
71
+
72
+ **Output Type(s):** Tensor (74-channel atmospheric state) <br>
73
+ **Output Format:** PyTorch Tensor <br>
74
+ **Output Parameters:** 4D (batch, channels, time, Npix) <br>
75
+ **Other Properties Related to Output:**
76
+
77
+ - Output grid: HEALPix HPX64 (Nside=64), 49,152 pixels, ~1° (~100 km) resolution
78
+ - Output is regridded to 0.25° (721x1440) for downstream forecast model initialization
79
+ - Output state variables: `u10m`, `v10m`, `u100m`, `v100m`, `t2m`, `msl`, `tcwv`, `sst`, `sic`,
80
+ `u50`, `u100`, `u150`, `u200`, `u250`, `u300`, `u400`, `u500`, `u600`, `u700`, `u850`,
81
+ `u925`, `u1000`, `v50`, `v100`, `v150`, `v200`, `v250`, `v300`, `v400`, `v500`, `v600`, `v700`, `v850`,
82
+ `v925`, `v1000`, `z50`, `z100`, `z150`, `z200`, `z250`, `z300`, `z400`, `z500`, `z600`, `z700`, `z850`,
83
+ `z925`, `z1000`, `t50`, `t100`, `t150`, `t200`, `t250`, `t300`, `t400`, `t500`, `t600`, `t700`, `t850`,
84
+ `t925`, `t1000`, `q50`, `q100`, `q150`, `q200`, `q250`, `q300`, `q400`, `q500`, `q600`, `q700`, `q850`,
85
+ `q925`, `q1000`
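The channel count and pixel count quoted above can be cross-checked with a short sketch. It rebuilds the variable list in the order given (9 surface fields, then `u`, `v`, `z`, `t`, `q` on 13 pressure levels) and uses the standard HEALPix relation Npix = 12 · Nside². The helper names are illustrative and not part of any released API.

```python
SURFACE = ["u10m", "v10m", "u100m", "v100m", "t2m", "msl", "tcwv", "sst", "sic"]
LEVELS = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]
PRESSURE_VARS = ["u", "v", "z", "t", "q"]

def channel_names():
    """Surface fields first, then each pressure-level variable over all levels."""
    names = list(SURFACE)
    for var in PRESSURE_VARS:
        names.extend(f"{var}{level}" for level in LEVELS)
    return names

def healpix_npix(nside):
    """Number of pixels on a HEALPix grid with the given Nside."""
    return 12 * nside * nside

names = channel_names()
print(len(names), healpix_npix(64))
```

With 9 surface fields and 5 × 13 pressure-level fields, the list has 74 entries, matching the 74-channel output tensor, and Nside=64 gives the 49,152 pixels quoted for HPX64.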
86
+
87
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
88
+
89
+ ## Software Integration
90
+
91
+ **Runtime Engine(s):** PyTorch <br>
92
+ **Supported Hardware Microarchitecture Compatibility:** <br>
93
+ * NVIDIA Ampere <br>
94
+ * NVIDIA Blackwell <br>
95
+ * NVIDIA Hopper <br>
96
+
97
+ **Supported Operating System(s):**
98
+ * Linux <br>
99
+
100
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
101
+
102
+ ## Model Version(s):
103
+
104
+ **Model Version:** v1 <br>
105
+
106
+ # Training, Testing, and Evaluation Datasets:
107
+
108
+ ## Training Dataset:
109
+
110
+ **Link:** [ERA5](https://cds.climate.copernicus.eu/) <br>
111
+
112
+ *Data Collection Method by dataset:* <br>
113
+ * Automatic/Sensors <br>
114
+
115
+ *Labeling Method by dataset:* <br>
116
+ * Automatic/Sensors <br>
117
+
118
+ **Properties:**
119
+ 6-hourly ERA5 reanalysis data for the period 2000–2021, used as the supervised
120
+ training target. ERA5 provides hourly estimates of various atmospheric, land, and
121
+ oceanic climate variables. The data covers the Earth on a 30 km grid and resolves the
122
+ atmosphere at 137 levels. <br>
123
+
124
+ **Link:** [UFS Replay](https://psl.noaa.gov/data/ufs_replay/) <br>
125
+
126
+ *Data Collection Method by dataset:* <br>
127
+ * Automatic/Sensors <br>
128
+
129
+ *Labeling Method by dataset:* <br>
130
+ * Automatic/Sensors <br>
131
+
132
+ **Properties:**
133
+ Observational data from the NOAA UFS Replay archive for the period 2000–2021, used
134
+ as model input. The archive contains a wide range of satellite and conventional observations
135
+ used by NOAA operational forecast systems, thinned to 1° spatial resolution by the NOAA GSI. HealDA uses a
136
+ subset of these observations, including microwave sounder radiances (AMSU-A, AMSU-B,
137
+ ATMS, MHS), conventional in-situ measurements (surface stations, aircraft, radiosondes,
138
+ buoys), GNSS radio occultation, and satellite-derived wind products (scatterometer and
139
+ atmospheric motion vectors). <br>
140
+
141
+ #### Data Processing Description:
142
+
143
+ **ERA5:** Data at 0.25° resolution on the latitude-longitude grid is regridded to the HEALPix grid at level 8 (Nside=256) using bilinear interpolation, then coarsened by block-averaging to level 6 (Nside=64).
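The coarsening step from Nside=256 to Nside=64 can be sketched as follows, assuming the map is stored in HEALPix NESTED ordering, where the children of a coarse pixel occupy a contiguous block of indices. The ordering and the function name `coarsen_nested` are assumptions of this sketch; the model card does not say how the block average is implemented.

```python
import numpy as np

def coarsen_nested(field, nside_in, nside_out):
    """Block-average a NESTED-ordered HEALPix map to a coarser Nside.

    field: array whose last axis has length 12 * nside_in**2.
    """
    ratio = nside_in // nside_out
    if nside_in % nside_out != 0 or ratio & (ratio - 1):
        raise ValueError("nside_in must be a power-of-two multiple of nside_out")
    if field.shape[-1] != 12 * nside_in**2:
        raise ValueError("last axis does not match nside_in")
    # In NESTED ordering each coarse pixel owns ratio**2 consecutive fine pixels.
    children = ratio * ratio
    return field.reshape(*field.shape[:-1], -1, children).mean(axis=-1)

fine = np.arange(12 * 256**2, dtype=float)
coarse = coarsen_nested(fine, nside_in=256, nside_out=64)
```

Going from level 8 to level 6 averages 16 fine pixels into each coarse pixel. In RING ordering the children are not contiguous, so the map would need to be reordered before this reshape applies.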
144
+
145
+ **UFS Replay:** Raw observation data from the UFS Replay archive (netCDF format) is converted to Parquet format for efficient data loading during training.
146
+
147
+ ## Testing Dataset:
148
+
149
+ **Link:** [ERA5](https://cds.climate.copernicus.eu/) <br>
150
+
151
+ *Data Collection Method by dataset:* <br>
152
+ * Automatic/Sensors <br>
153
+
154
+ *Labeling Method by dataset:* <br>
155
+ * Automatic/Sensors <br>
156
+
157
+ **Properties:**
158
+ ERA5 reanalysis data for the year 2022, used as the verification reference for analysis
159
+ and forecast evaluation. <br>
160
+
161
+ **Link:** [UFS Replay](https://psl.noaa.gov/data/ufs_replay/) <br>
162
+
163
+ *Data Collection Method by dataset:* <br>
164
+ * Automatic/Sensors <br>
165
+
166
+ *Labeling Method by dataset:* <br>
167
+ * Automatic/Sensors <br>
168
+
169
+ **Properties:**
170
+ Observational data from the NOAA UFS Replay archive for the year 2022, used as model
171
+ input during testing. Same observation types as the training dataset. <br>
172
+
173
+ ## Evaluation Dataset:
174
+
175
+ **Link:** [ERA5](https://cds.climate.copernicus.eu/) <br>
176
+
177
+ *Data Collection Method by dataset:* <br>
178
+ * Automatic/Sensors <br>
179
+
180
+ *Labeling Method by dataset:* <br>
181
+ * Automatic/Sensors <br>
182
+
183
+ **Properties:**
184
+ ERA5 reanalysis data for the year 2022. All verification is conducted on the
185
+ HPX64 grid. <br>
186
+
187
+ **Link:** [UFS Replay](https://psl.noaa.gov/data/ufs_replay/) <br>
188
+
189
+ *Data Collection Method by dataset:* <br>
190
+ * Automatic/Sensors <br>
191
+
192
+ *Labeling Method by dataset:* <br>
193
+ * Automatic/Sensors <br>
194
+
195
+ **Properties:**
196
+ Observational data from the NOAA UFS Replay archive for the year 2022, used as model
197
+ input during evaluation. <br>
198
+
199
+ ## Inference:
200
+
201
+ **Acceleration Engine:** PyTorch <br>
202
+ **Test Hardware:**
203
+ * H100 <br>
204
+
205
+
206
+ ## Ethical Considerations:
207
+
208
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
209
+
210
+ For more detailed information on ethical considerations for this model, please see the [Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards](https://huggingface.co/nvidia/healda).
211
+
212
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
config.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "name": "healda"
3
+ }
model-card/bias.md ADDED
@@ -0,0 +1,4 @@
1
+ Field | Response
2
+ :---------------------------------------------------------------------------------------------------|:---------------
3
+ Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | Not Applicable
4
+ Measures taken to mitigate against unwanted bias: | Not Applicable
model-card/explainability.md ADDED
@@ -0,0 +1,13 @@
1
+ Field | Response
2
+ :------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
3
+ Intended Task/Domain: | Global Weather Data Assimilation
4
+ Model Type: | Vision Transformer
5
+ Intended User: | Weather and climate ML researchers and developers implementing global data assimilation pipelines.
6
+ Output: | Global variables: `u10m`, `v10m`, `u100m`, `v100m`, `t2m`, `msl`, `tcwv`, `sst`, `sic`, `u50`, `u100`, `u150`, `u200`, `u250`, `u300`, `u400`, `u500`, `u600`, `u700`, `u850`, `u925`, `u1000`, `v50`, `v100`, `v150`, `v200`, `v250`, `v300`, `v400`, `v500`, `v600`, `v700`, `v850`, `v925`, `v1000`, `z50`, `z100`, `z150`, `z200`, `z250`, `z300`, `z400`, `z500`, `z600`, `z700`, `z850`, `z925`, `z1000`, `t50`, `t100`, `t150`, `t200`, `t250`, `t300`, `t400`, `t500`, `t600`, `t700`, `t850`, `t925`, `t1000`, `q50`, `q100`, `q150`, `q200`, `q250`, `q300`, `q400`, `q500`, `q600`, `q700`, `q850`, `q925`, `q1000`. |
7
+ Describe how the model works: | Sensor-specific point cloud embedders perform scatter-reduce aggregation onto the HPX64 grid, and a ViT backbone then refines the aggregated features into a global output field. |
8
+ Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
9
+ Technical Limitations & Mitigation: | The model may perform poorly for systems that are not similar to those in the training data, namely for rare weather phenomena or weather behavior outside of the 2000–2021 training dataset. There is no mechanism to enforce physical consistency for predictions.
10
+ Verified to have met prescribed NVIDIA quality standards: | Yes |
11
+ Performance Metrics: | Accuracy, Throughput and Latency
12
+ Potential Known Risks: | This model may incorrectly predict weather states and phenomena.
13
+ Licensing: | [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
model-card/privacy.md ADDED
@@ -0,0 +1,11 @@
1
+ Field | Response
2
+ :----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
3
+ Generatable or reverse engineerable personal data? | No
4
+ Personal data used to create this model? | None Known
5
+ Was consent obtained for any personal data used? | Not Applicable
6
+ How often is dataset reviewed? | During dataset creation, model training, evaluation and before release
7
+ Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No
8
+ Is there provenance for all datasets used in training? | Yes
9
+ Does data labeling (annotation, metadata) comply with privacy laws? | Yes
10
+ Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data.
11
+ Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
model-card/safety.md ADDED
@@ -0,0 +1,6 @@
1
+ Field | Response
2
+ :---------------------------------------------------|:----------------------------------
3
+ Model Application Field(s): | Global Weather Data Assimilation
4
+ Describe the life critical impact (if present). | Not Applicable
5
+ Use Case Restrictions: | Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
6
+ Model and dataset restrictions: | The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions on dataset access are enforced during training, and dataset license constraints are adhered to.