Adding model card

Files changed (6)

README.md +212 -0
config.json +3 -0
model-card/bias.md +4 -0
model-card/explainability.md +13 -0
model-card/privacy.md +11 -0
model-card/safety.md +6 -0

README.md ADDED

	@@ -0,0 +1,212 @@

+# Earth-2 Checkpoints: HealDA
+HealDA is a global ML-based data assimilation (DA) model that maps a short window of satellite and conventional observations to a 1° atmospheric state on the Hierarchical Equal Area isoLatitude Pixelation (HEALPix) grid. The resulting analyses serve as plug-and-play initial conditions for off-the-shelf ML forecast models.
+This model is ready for commercial/non-commercial use.
+### License/Terms of Use:
+**Governing Terms**: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
+### Deployment Geography:
+Global
+### Use Case:
+Industry, academic, and government research teams interested in data assimilation and medium-range weather forecasting.
+### Release Date:
+Hugging Face: 3/16/2026 [URL](https://huggingface.co/nvidia/healda)
+## Reference:
+**Papers**:
+- [HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts](https://arxiv.org/abs/2601.17636)
+**Code**:
+- [PhysicsNeMo](https://github.com/NVIDIA/physicsnemo)
+- [Earth2Studio](https://github.com/NVIDIA/earth2studio)
+## Model Architecture
+**Architecture Type:** Custom Observation Encoder + HPX Vision Transformer (ViT) backbone <br>
+**Network Architecture:** DiT-L adapted to the HEALPix grid with patch-based encoding/decoding and global self-attention across the sphere. The network is used as a deterministic regression model.
+- Observation Encoder: sensor-specific point-cloud embedders with scatter-reduce aggregation onto HPX64 grid
+- HPX ViT backbone: 330M parameters, 24 transformer blocks, embedding dimension 1024 <br>
+## Input:
+**Input Type(s):**
+- Tensor (satellite and conventional observations as point-cloud data)
+- Tensor (static conditioning fields: orography, land-sea mask)
+- Tensor (day of year)
+- Tensor (second of day) <br>
+**Input Format(s):** PyTorch Tensors <br>
+**Input Parameters:**
+- Observations: variable-length point cloud (~10M scalar observations per 24-hour window)
+- Static conditioning: 4D (batch, channels, time, Npix)
+- Day of year: 2D (batch, time)
+- Second of day: 2D (batch, time)
+- Current checkpoint uses time=1 <br>
+**Other Properties Related to Input:**
+- Observation window: 24 hours [t-21h, t+3h] around the target analysis time
+- Microwave sounders: AMSU-A, AMSU-B, ATMS, MHS aboard NOAA-15–20, Metop-A–C, and Suomi-NPP
+- Conventional in-situ observations: surface stations, aircraft, radiosondes, buoys (surface pressure, temperature, humidity, u/v winds)
+- GNSS Radio Occultation: bending angle and derived temperature/humidity profiles
+- Satellite-derived winds: scatterometer (ASCAT) and atmospheric motion vectors (AMVs)
+- Observational data is sourced from the [NOAA UFS Replay](https://psl.noaa.gov/data/ufs_replay/) archive
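The observation window and the two time inputs can be derived from the target analysis time. The following is a minimal sketch using only the Python standard library. The function names `observation_window` and `time_features` are hypothetical and not part of the released code, and the 1-based day-of-year convention is an assumption, since the card does not state which convention the checkpoint expects.

```python
from datetime import datetime, timedelta, timezone

# Window offsets taken from the card: [t-21h, t+3h] around the analysis time.
WINDOW_BEFORE = timedelta(hours=21)
WINDOW_AFTER = timedelta(hours=3)


def observation_window(analysis_time: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the 24-hour observation window."""
    return analysis_time - WINDOW_BEFORE, analysis_time + WINDOW_AFTER


def time_features(analysis_time: datetime) -> tuple[int, int]:
    """Return (day_of_year, second_of_day).

    Assumption: day of year is 1-based (January 1 -> 1); the checkpoint's
    actual convention is not documented in this card.
    """
    day_of_year = analysis_time.timetuple().tm_yday
    second_of_day = (
        analysis_time.hour * 3600
        + analysis_time.minute * 60
        + analysis_time.second
    )
    return day_of_year, second_of_day


# Example: analysis valid at 2022-01-02 00:00 UTC.
t = datetime(2022, 1, 2, 0, 0, tzinfo=timezone.utc)
start, end = observation_window(t)
print(start.isoformat(), end.isoformat())
print(time_features(t))
```

Only observations whose timestamps fall between `start` and `end` would be passed to the observation encoder for this analysis time.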
+## Output:
+**Output Type(s):** Tensor (74-channel atmospheric state) <br>
+**Output Format:** PyTorch Tensor <br>
+**Output Parameters:** 4D (batch, channels, time, Npix) <br>
+**Other Properties Related to Output:**
+- Output grid: HEALPix HPX64 (Nside=64), 49,152 pixels, ~1° (~100 km) resolution
+- Output is regridded to 0.25° (721x1440) for downstream forecast model initialization
+- Output state variables: `u10m`, `v10m`, `u100m`, `v100m`, `t2m`, `msl`, `tcwv`, `sst`, `sic`,
+`u50`, `u100`, `u150`, `u200`, `u250`, `u300`, `u400`, `u500`, `u600`, `u700`, `u850`,
+`u925`, `u1000`, `v50`, `v100`, `v150`, `v200`, `v250`, `v300`, `v400`, `v500`, `v600`, `v700`, `v850`,
+`v925`, `v1000`, `z50`, `z100`, `z150`, `z200`, `z250`, `z300`, `z400`, `z500`, `z600`, `z700`, `z850`,
+`z925`, `z1000`, `t50`, `t100`, `t150`, `t200`, `t250`, `t300`, `t400`, `t500`, `t600`, `t700`, `t850`,
+`t925`, `t1000`, `q50`, `q100`, `q150`, `q200`, `q250`, `q300`, `q400`, `q500`, `q600`, `q700`, `q850`,
+`q925`, `q1000`
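The figures above can be cross-checked with a few lines of Python. This is an illustrative sketch, not code from the release: the helper names are hypothetical, the pixel count uses the standard HEALPix relation Npix = 12 · Nside², and the channel order simply follows the list in this card.

```python
import math

# Variable names and pressure levels exactly as listed in this card.
SURFACE_VARS = ["u10m", "v10m", "u100m", "v100m", "t2m", "msl", "tcwv", "sst", "sic"]
PRESSURE_VARS = ["u", "v", "z", "t", "q"]
PRESSURE_LEVELS = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]


def healpix_npix(nside: int) -> int:
    """Number of pixels on a HEALPix grid: 12 * nside^2."""
    return 12 * nside * nside


def approx_resolution_deg(nside: int) -> float:
    """Square root of the (equal) pixel area, expressed in degrees."""
    pixel_area_sr = 4.0 * math.pi / healpix_npix(nside)
    return math.degrees(math.sqrt(pixel_area_sr))


def output_channels() -> list[str]:
    """Surface variables first, then each pressure-level variable over all levels."""
    names = list(SURFACE_VARS)
    for var in PRESSURE_VARS:
        names.extend(f"{var}{level}" for level in PRESSURE_LEVELS)
    return names


channels = output_channels()
print(len(channels), healpix_npix(64), round(approx_resolution_deg(64), 2))
```

A lookup such as `channels.index("z500")` can then be used to select a single field along the channel axis of the output tensor, assuming the checkpoint orders its channels as listed here.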
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
+## Software Integration
+**Runtime Engine(s):** PyTorch <br>
+**Supported Hardware Microarchitecture Compatibility:** <br>
+* NVIDIA Ampere <br>
+* NVIDIA Blackwell <br>
+* NVIDIA Hopper <br>
+**Supported Operating System(s):**
+* Linux <br>
+The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
+## Model Version(s):
+**Model Version:** v1 <br>
+# Training, Testing, and Evaluation Datasets:
+## Training Dataset:
+**Link:** [ERA5](https://cds.climate.copernicus.eu/) <br>
+*Data Collection Method by dataset:* <br>
+* Automatic/Sensors <br>
+*Labeling Method by dataset:* <br>
+* Automatic/Sensors <br>
+**Properties:**
+ERA5 reanalysis data for the period 2000–2021, sampled every 6 hours and used as the
+supervised training target. ERA5 itself provides hourly estimates of atmospheric, land,
+and oceanic climate variables. The data cover the Earth on a 30 km grid and resolve the
+atmosphere at 137 levels. <br>
+**Link:** [UFS Replay](https://psl.noaa.gov/data/ufs_replay/) <br>
+*Data Collection Method by dataset:* <br>
+* Automatic/Sensors <br>
+*Labeling Method by dataset:* <br>
+* Automatic/Sensors <br>
+**Properties:**
+Observational data from the NOAA UFS Replay archive for the period 2000–2021, used
+as model input. The archive contains a wide range of satellite and conventional observations
+used by NOAA operational forecast systems, thinned to 1° spatial resolution by the NOAA GSI. HealDA uses a
+subset of these observations, including microwave sounder radiances (AMSU-A, AMSU-B,
+ATMS, MHS), conventional in-situ measurements (surface stations, aircraft, radiosondes,
+buoys), GNSS radio occultation, and satellite-derived wind products (scatterometer and
+atmospheric motion vectors). <br>
+#### Data Processing Description:
+**ERA5:** ERA5 data at 0.25° resolution on the lat-lon grid is regridded using bilinear interpolation to the HEALPix grid at level 8 (Nside=256), then coarsened by block-averaging to level 6 (Nside=64). <br>
+**UFS Replay:** Raw observation data from the UFS Replay archive (netCDF format) is converted to Parquet format for efficient data loading during training. <br>
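The block-averaging step from level 8 to level 6 can be illustrated as follows. This is a sketch under stated assumptions, not the project's preprocessing code: it assumes the field is stored in HEALPix NESTED ordering, where the children of coarse pixel `p` occupy the contiguous fine indices `p * block` to `p * block + block - 1`, and the function name `coarsen_nested` is hypothetical. Plain Python lists are used to keep the example dependency-free; an array library would normally be used at full resolution.

```python
def coarsen_nested(values: list[float], nside_in: int, nside_out: int) -> list[float]:
    """Block-average a NESTED-ordered HEALPix field from nside_in to nside_out.

    Assumption: NESTED ordering, so every coarse pixel maps to one contiguous
    block of (nside_in / nside_out)^2 fine pixels. HEALPix pixels have equal
    area, so an unweighted mean is appropriate.
    """
    if nside_in % nside_out != 0:
        raise ValueError("nside_in must be a multiple of nside_out")
    ratio = nside_in // nside_out
    if ratio & (ratio - 1):
        raise ValueError("nside ratio must be a power of two")
    if len(values) != 12 * nside_in * nside_in:
        raise ValueError("values length does not match nside_in")

    block = ratio * ratio  # 16 fine pixels per coarse pixel for 256 -> 64
    return [
        sum(values[i:i + block]) / block
        for i in range(0, len(values), block)
    ]


# Small demonstration: Nside=4 (192 pixels) down to Nside=1 (12 pixels).
fine = [float(i) for i in range(12 * 4 * 4)]
coarse = coarsen_nested(fine, nside_in=4, nside_out=1)
print(len(coarse), coarse[0], coarse[-1])
```

For the resolutions named in this card, `coarsen_nested(field, 256, 64)` would reduce 786,432 pixels to 49,152. If the data were stored in RING ordering instead, the pixels would need to be reordered to NESTED before this reshaping-style average is valid.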
+## Testing Dataset:
+**Link:** [ERA5](https://cds.climate.copernicus.eu/) <br>
+*Data Collection Method by dataset:* <br>
+* Automatic/Sensors <br>
+*Labeling Method by dataset:* <br>
+* Automatic/Sensors <br>
+**Properties:**
+ERA5 reanalysis data for the year 2022, used as the verification reference for analysis
+and forecast evaluation. <br>
+**Link:** [UFS Replay](https://psl.noaa.gov/data/ufs_replay/) <br>
+*Data Collection Method by dataset:* <br>
+* Automatic/Sensors <br>
+*Labeling Method by dataset:* <br>
+* Automatic/Sensors <br>
+**Properties:**
+Observational data from the NOAA UFS Replay archive for the year 2022, used as model
+input during testing. Same observation types as the training dataset. <br>
+## Evaluation Dataset:
+**Link:** [ERA5](https://cds.climate.copernicus.eu/) <br>
+*Data Collection Method by dataset:* <br>
+* Automatic/Sensors <br>
+*Labeling Method by dataset:* <br>
+* Automatic/Sensors <br>
+**Properties:**
+ERA5 reanalysis data for the year 2022. All verification is conducted on the
+HPX64 grid. <br>
+**Link:** [UFS Replay](https://psl.noaa.gov/data/ufs_replay/) <br>
+*Data Collection Method by dataset:* <br>
+* Automatic/Sensors <br>
+*Labeling Method by dataset:* <br>
+* Automatic/Sensors <br>
+**Properties:**
+Observational data from the NOAA UFS Replay archive for the year 2022, used as model
+input during evaluation. <br>
+## Inference:
+**Acceleration Engine:** PyTorch <br>
+**Test Hardware:**
+* H100 <br>
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+For more detailed information on ethical considerations for this model, please see the [Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards](https://huggingface.co/nvidia/healda).
+Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

config.json ADDED

	@@ -0,0 +1,3 @@

+{
+    "name": "healda"
+}

model-card/bias.md ADDED

	@@ -0,0 +1,4 @@

+Field                                                                                               |  Response
+:---------------------------------------------------------------------------------------------------|:---------------
+Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing:  |  Not Applicable
+Measures taken to mitigate against unwanted bias:                                                   |  Not Applicable

model-card/explainability.md ADDED

	@@ -0,0 +1,13 @@

+Field                                                                                                  |  Response
+:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
+Intended Task/Domain:                                                                         |  Global Weather Data Assimilation
+Model Type:                                                                                            |  Vision Transformer
+Intended User:                                                                                         |  Weather and climate ML researchers and developers implementing global data assimilation pipelines.
+Output:                                                                                                |  Global variables: `u10m`, `v10m`, `u100m`, `v100m`, `t2m`, `msl`, `tcwv`, `sst`, `sic`, `u50`, `u100`, `u150`, `u200`, `u250`, `u300`, `u400`, `u500`, `u600`, `u700`, `u850`, `u925`, `u1000`, `v50`, `v100`, `v150`, `v200`, `v250`, `v300`, `v400`, `v500`, `v600`, `v700`, `v850`, `v925`, `v1000`, `z50`, `z100`, `z150`, `z200`, `z250`, `z300`, `z400`, `z500`, `z600`, `z700`, `z850`, `z925`, `z1000`, `t50`, `t100`, `t150`, `t200`, `t250`, `t300`, `t400`, `t500`, `t600`, `t700`, `t850`, `t925`, `t1000`, `q50`, `q100`, `q150`, `q200`, `q250`, `q300`, `q400`, `q500`, `q600`, `q700`, `q850`, `q925`, `q1000`. |
+Describe how the model works:                                                                          |  Sensor-specific point cloud embedders perform scatter-reduce aggregation onto the HPX64 grid, and a ViT backbone then refines the aggregated features into a global output field. |
+Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:  |  Not Applicable
+Technical Limitations & Mitigation:                                                                                 | The model may perform poorly for weather systems unlike those in the training data, such as rare weather phenomena or weather behavior outside the 2000–2021 training period. There is no mechanism to enforce physical consistency of the predictions.
+Verified to have met prescribed NVIDIA quality standards: | Yes |
+Performance Metrics:                                                                                   |  Accuracy, Throughput and Latency
+Potential Known Risks:                                                                                 |  This model may incorrectly predict weather states and phenomena.
+Licensing:                                                                                             |  [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

model-card/privacy.md ADDED

	@@ -0,0 +1,11 @@

+Field                                                                                                                              |  Response
+:----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
+Generatable or reverse engineerable personal data?                                                     |  No
+Personal data used to create this model?                                                                                       |  None Known
+Was consent obtained for any personal data used?                                                                                             |  Not Applicable
+How often is dataset reviewed?                                                                                                     |  During dataset creation, model training, evaluation and before release
+Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? |  No
+Is there provenance for all datasets used in training?                                                                                |  Yes
+Does data labeling (annotation, metadata) comply with privacy laws?                                                                |  Yes
+Is data compliant with data subject requests for data correction or removal, if such a request was made?                           |  No, not possible with externally-sourced data.
+Applicable Privacy Policy        | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

model-card/safety.md ADDED

	@@ -0,0 +1,6 @@

+Field                                               |  Response
+:---------------------------------------------------|:----------------------------------
+Model Application Field(s):                               |  Global Weather Data Assimilation
+Describe the life critical impact (if present).   | Not Applicable
+Use Case Restrictions:                              |  Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
+Model and dataset restrictions:            |  The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.