---
{}
---
# Earth-2 Checkpoints: CorrDiff-CMIP6-ERA5
## Description
The Corrector Diffusion (CorrDiff) CMIP6-ERA5 model performs spatio-temporal downscaling
of global climate data comprising several surface, atmospheric, land ice, and
sea ice variables from the Coupled Model Intercomparison Project Phase 6
(CMIP6) to the ECMWF Reanalysis v5 (ERA5).
The CMIP6 source data consists of daily variables on multiple regular and
curvilinear grids that are interpolated onto a common 300-km resolution global
climate grid. The model downscales this input to hourly data at 25-km
resolution. CorrDiff CMIP6-ERA5 enables the prediction of high-fidelity, stochastic climate
phenomena over the globe from low-fidelity input data, a task that would otherwise require
expensive global numerical simulations.
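
As a quick consistency check on these figures, you can derive the grid spacing from the grid shapes given later in this card: [64, 128] for the input and [721, 1440] for the output. The sketch below is illustrative arithmetic, not part of the model code. It assumes a spherical Earth with a mean radius of 6371 km and equally spaced longitudes, and it measures spacing along the equator, where cells are widest. For these shapes it gives about 313 km and 27.8 km, in line with the nominal 300-km and 25-km resolutions. That is a refinement of 11.25x per horizontal axis and 24x in time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def lon_spacing_deg(n_lon: int) -> float:
    """Longitude spacing in degrees of a global grid with n_lon equally spaced columns."""
    return 360.0 / n_lon


def deg_to_km_at_equator(deg: float) -> float:
    """Arc length in km spanned by `deg` degrees along the equator."""
    return deg * 2.0 * math.pi * EARTH_RADIUS_KM / 360.0


in_deg = lon_spacing_deg(128)    # input grid [64, 128]
out_deg = lon_spacing_deg(1440)  # output grid [721, 1440]
in_km = deg_to_km_at_equator(in_deg)
out_km = deg_to_km_at_equator(out_deg)
spatial_factor = in_km / out_km  # refinement per horizontal axis
temporal_factor = 24             # daily input -> hourly output

print(f"input:  {in_deg:.4f} deg ~ {in_km:.0f} km")
print(f"output: {out_deg:.4f} deg ~ {out_km:.1f} km")
print(f"refinement: {spatial_factor:.2f}x per axis, {temporal_factor}x in time")
```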
CorrDiff CMIP6-ERA5 is a generative spatio-temporal downscaling model trained over
the globe. For details on the CMIP6 grids, see the
[CMIP6 website](https://wcrp-cmip.org/cmip-phases/cmip6/).
This model is ready for commercial/non-commercial use.
## License/Terms of Use:
Governing Terms: Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
## Deployment Geography:
Global.
## Use Case:
Climate scientists accelerating climate prediction with AI,
financial institutions and insurance companies for climate risk management,
utility companies for energy planning, and public policy makers for
decision-making.
## Reference(s)
* [CorrDiff Paper](https://arxiv.org/pdf/2309.15214) <br>
* [Coupled Model Intercomparison Project Phase 6](https://wcrp-cmip.org/cmip-phases/cmip6/) <br>
* [ECMWF Reanalysis v5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) <br>
## Codebase
* [Earth2Studio](https://github.com/NVIDIA/earth2studio) <br>
* [PhysicsNeMo](https://github.com/NVIDIA/physicsnemo) <br>
## Model Architecture
**Architecture Type:** U-Net<br>
**Network Architecture:** Corrector Diffusion U-Net with 158M parameters<br>
## Computational Load
**Cumulative Compute:** 2.3E15 FLOP <br>
**Estimated Energy and Emissions for Model Training:** 4.48266 tCO2e <br>
## Input
**Input Type(s):**
* Tensor (74 Surface, Atmospheric, and Oceanic Variables from the previous,
current, and next day + land-sea mask + elevation + solar zenith angle +
distance to the ocean coastline + sine and cosine of the latitude and
longitude) <br>
* Input data hour of the day in 24-hour format <br>
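
The card lists the solar zenith angle and the hour of the day as inputs, but it does not describe how the zenith-angle field is computed. For orientation, the sketch below implements a common textbook approximation of the zenith angle's cosine. It uses Cooper's declination formula and ignores the equation of time and atmospheric refraction, so it is an illustrative stand-in, not the model's own preprocessing. The function name and arguments are hypothetical.

```python
import math


def cos_solar_zenith(lat_deg: float, lon_deg: float, day_of_year: int, hour_utc: float) -> float:
    """Approximate cosine of the solar zenith angle (illustrative only).

    lat_deg: latitude in degrees, north positive.
    lon_deg: longitude in degrees east (0 to 360 or -180 to 180 both work).
    day_of_year: 1 to 366.
    hour_utc: hour of the day in 24-hour format, UTC.
    """
    if not 0 <= hour_utc < 24:
        raise ValueError("hour_utc must be in [0, 24)")
    # Solar declination in radians (Cooper's approximation).
    decl = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0)))
    # Local solar time: 15 degrees of longitude per hour, equation of time ignored.
    solar_time = (hour_utc + lon_deg / 15.0) % 24.0
    hour_angle = math.radians(15.0 * (solar_time - 12.0))
    lat = math.radians(lat_deg)
    return math.sin(lat) * math.sin(decl) + math.cos(lat) * math.cos(decl) * math.cos(hour_angle)


# Daily cycle at (0 N, 0 E) on day 81 for each integer input hour 0-23.
cycle = [cos_solar_zenith(0.0, 0.0, 81, h) for h in range(24)]
```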
**Input Format(s):** PyTorch Tensor <br>
**Input Parameters:**
* Four Dimensional (4D) (batch, variable, latitude, longitude) <br>
* Integer (Hour of the day in 24-hour format) <br>
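
The card does not specify how the previous, current, and next day, or the static fields, are ordered along the variable axis. The NumPy sketch below shows one plausible layout. It assumes the three daily snapshots are concatenated along the variable axis and the static fields are appended after them. The helper names, channel order, and toy sizes are hypothetical. PyTorch offers the same operations as `torch.cat` and `torch.stack`. The hour of the day is passed separately as an integer and is not part of this tensor.

```python
import numpy as np

N_LAT, N_LON = 64, 128  # input grid shape from this model card


def stack_daily_window(daily: np.ndarray, t: int) -> np.ndarray:
    """Concatenate days t-1, t and t+1 along the variable axis.

    daily: array of shape (time, variable, lat, lon) with one entry per day.
    Returns an array of shape (3 * variable, lat, lon).
    """
    if not 1 <= t < daily.shape[0] - 1:
        raise IndexError("day t needs both a previous and a next day")
    return np.concatenate([daily[t - 1], daily[t], daily[t + 1]], axis=0)


def make_batch(daily: np.ndarray, days: list[int], static: np.ndarray) -> np.ndarray:
    """Build a 4D (batch, variable, lat, lon) input from dynamic and static fields.

    static: array of shape (n_static, lat, lon), for example a land-sea mask
    and elevation; it is appended after the three daily snapshots.
    """
    samples = [np.concatenate([stack_daily_window(daily, t), static], axis=0) for t in days]
    return np.stack(samples, axis=0)


# Toy example: 5 days of 4 variables plus 2 static fields on the input grid.
rng = np.random.default_rng(0)
daily = rng.standard_normal((5, 4, N_LAT, N_LON)).astype(np.float32)
static = np.zeros((2, N_LAT, N_LON), dtype=np.float32)
x = make_batch(daily, days=[1, 2, 3], static=static)
print(x.shape)  # (3, 14, 64, 128)
```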
**Other Properties Related to Input:**
* 2.8 degree latitude-longitude grid over the globe
* Input spatial resolution: [64, 128]
* Input temporal resolution: 24 hours
* Latitude Coordinates: [90, 87.2, 84.4, ..., -84.4, -87.2, -90]
* Longitude Coordinates: [0, 2.8, 5.6, ..., 354.4, 357.2, 360]
* Input weather variables: va10, vas, prc, ua10, ta850, rls,
tasmin, wap850, hursmax, ua850, ua50, va850, q10,
rlut, va1000, pr, zg1000, sfcWindmax, hurs, ta50, rsus,
sfcWind, wap10, ta500, ua100, hus1000, zg500,
hus250, ua500, ua1000, hursmin, ta700, va250,
hus700, hus100, ua700, wap100, zg100, ta100,
va500, tas, ua250, wap1000, zg700, va100, rlds,
tasmax, va700, clt, rsds, zg100, ta1000, zg850, uas,
wap700, snc, zg50, wap50, zg250, psl, hus50,
hus850, hus500, siconc, ts<br>
For variable name information, review the Lexicon at [Earth2Studio](https://github.com/NVIDIA/earth2studio/).
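
The input includes the sine and cosine of latitude and longitude. Encoding longitude this way keeps it periodic: 0 and 360 degrees map to the same values, so the grid has no artificial seam. The NumPy sketch below builds these four channels. The channel order and the exact coordinate arrays (64 latitudes from 90 to -90 and 128 equally spaced longitudes starting at 0) are assumptions, not taken from the released preprocessing code.

```python
import numpy as np


def positional_channels(lat_deg: np.ndarray, lon_deg: np.ndarray) -> np.ndarray:
    """Return sin/cos of latitude and longitude as a (4, n_lat, n_lon) array.

    Channel order (sin lat, cos lat, sin lon, cos lon) is an assumption.
    """
    lat = np.deg2rad(lat_deg)[:, None]  # shape (n_lat, 1)
    lon = np.deg2rad(lon_deg)[None, :]  # shape (1, n_lon)
    shape = (lat_deg.size, lon_deg.size)
    return np.stack([
        np.broadcast_to(np.sin(lat), shape),
        np.broadcast_to(np.cos(lat), shape),
        np.broadcast_to(np.sin(lon), shape),
        np.broadcast_to(np.cos(lon), shape),
    ], axis=0)


# Assumed coordinates for the [64, 128] input grid.
lat = np.linspace(90.0, -90.0, 64)
lon = np.linspace(0.0, 360.0, 128, endpoint=False)
pos = positional_channels(lat, lon)
print(pos.shape)  # (4, 64, 128)
```

`np.stack` copies its inputs, so the returned array is writable even though `np.broadcast_to` returns read-only views.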
## Output
**Output Type(s):** Tensor (75 Surface, Atmospheric, and Oceanic Variables) <br>
**Output Format:** PyTorch Tensor <br>
**Output Parameters:** 5D (batch, samples, variable, latitude, longitude)<br>
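
Because the model is generative, the samples axis holds several stochastic realizations for the same input. A common summary is the per-gridpoint ensemble mean and spread. The NumPy sketch below computes both, assuming the axis order stated above. The function name and the toy sizes in the example are hypothetical. The spread is the sample standard deviation (`ddof=1`), which needs at least two samples.

```python
import numpy as np


def ensemble_stats(output: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean and spread over the samples axis of a (batch, samples, variable, lat, lon) array.

    Returns two arrays of shape (batch, variable, lat, lon).
    """
    if output.ndim != 5:
        raise ValueError("expected shape (batch, samples, variable, lat, lon)")
    if output.shape[1] < 2:
        raise ValueError("need at least two samples to estimate the spread")
    mean = output.mean(axis=1)
    spread = output.std(axis=1, ddof=1)  # sample standard deviation
    return mean, spread


# Toy example: 1 input, 4 samples, 3 variables on a small 8 x 16 grid.
rng = np.random.default_rng(0)
toy = rng.standard_normal((1, 4, 3, 8, 16))
mean, spread = ensemble_stats(toy)
print(mean.shape, spread.shape)  # (1, 3, 8, 16) (1, 3, 8, 16)
```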
**Other Properties Related to Output:**
* 0.25 degree latitude-longitude grid over the globe
* Output spatial resolution: [721, 1440]
* Output temporal resolution: 1 hour
* Output weather variables: u10m, v10m, u100m, v100m, t2m, sp, msl, tcwv, u50, u100,
u150, u200, u250, u300, u400, u500, u600, u700, u850, u925, u1000, v50, v100, v150,
v200, v250, v300, v400, v500, v600, v700, v850, v925, v1000, z50, z100, z150, z200,
z250, z300, z400, z500, z600, z700, z850, z925, z1000, t50, t100, t150, t200, t250,
t300, t400, t500, t600, t700, t850, t925, t1000, q50, q100, q150, q200, q250, q300,
q400, q500, q600, q700, q850, q925, q1000, sst, d2m<br>
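
The 75 output channels follow a regular pattern: eight single-level fields, five variables on 13 pressure levels (in hPa), and two trailing fields. The Python sketch below rebuilds the list in the order given above and defines a 0.25 degree output grid. It assumes that the output tensor's variable axis follows this order, that latitudes run from north to south, and that longitudes start at 0. The card does not state any of these explicitly. The names are illustrative.

```python
SURFACE = ["u10m", "v10m", "u100m", "v100m", "t2m", "sp", "msl", "tcwv"]
PRESSURE_VARS = ["u", "v", "z", "t", "q"]
LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]
TRAILING = ["sst", "d2m"]

# Pressure-level names are variable + level, e.g. "t850" is temperature at 850 hPa.
OUTPUT_VARIABLES = (
    SURFACE
    + [f"{var}{level}" for var in PRESSURE_VARS for level in LEVELS_HPA]
    + TRAILING
)
assert len(OUTPUT_VARIABLES) == 75

# Assumed 0.25 degree output grid: 721 latitudes (90 to -90), 1440 longitudes (0 to 359.75).
OUTPUT_LAT = [90.0 - 0.25 * i for i in range(721)]
OUTPUT_LON = [0.25 * j for j in range(1440)]


def channel_index(name: str) -> int:
    """Position of a named variable along the (assumed) output variable axis."""
    return OUTPUT_VARIABLES.index(name)
```

Looking up a channel by name, for example `channel_index("t2m")`, avoids hard-coding integer indices.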
## Software Integration
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated
systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software
frameworks (e.g., CUDA libraries), the model achieves faster training and
inference times compared to CPU-only solutions.
**Runtime Engine(s):**
* PyTorch >= 2.4.0 <br>
* PhysicsNeMo >= 1.2.0 <br>
**Supported Hardware Microarchitecture Compatibility:** <br>
* NVIDIA Ampere <br>
* NVIDIA Blackwell <br>
* NVIDIA Hopper <br>
* NVIDIA Turing <br>
**Supported Operating System(s):**
* Linux <br>
## Model Version(s)
**Model version:** v1 <br>
# Training, Testing, and Evaluation Datasets:
The integration of foundation and fine-tuned models into AI systems requires
additional testing using use-case-specific data to ensure safe and effective
deployment. Following the V-model methodology, iterative testing and validation
at both unit and system levels are essential to mitigate risks, meet technical
and functional requirements, and ensure compliance with safety and ethical
standards before deployment.
## Training Dataset
**Link:** [CMIP6](https://wcrp-cmip.org/cmip-phases/cmip6/) <br>
**Data Collection Method by dataset** <br>
* Automatic/Sensors <br>
**Labeling Method by dataset** <br>
* Automatic/Sensors <br>
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
CMIP6 data for the year ranges 1981-1989, 1991-1999, 2001-2009, and 2011-2016. CMIP6 is a
climate dataset with global coverage of the Earth's atmosphere, ocean, and land. <br>
**Link:** [ERA5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) <br>
**Data Collection Method by dataset** <br>
* Automatic/Sensors <br>
**Labeling Method by dataset** <br>
* Automatic/Sensors <br>
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
ERA5 data for the year ranges 1981-1989, 1991-1999, 2001-2009, and 2011-2016.
ERA5 is a global hourly reanalysis that blends historical observations with a
consistent modern weather model to produce gridded estimates of past
atmospheric conditions. <br>
## Testing Dataset
**Link:** [CMIP6](https://wcrp-cmip.org/cmip-phases/cmip6/) <br>
**Data Collection Method by dataset** <br>
* Automatic/Sensors <br>
**Labeling Method by dataset** <br>
* Automatic/Sensors <br>
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
CMIP6 data for the years 1980, 1990, and 2000. CMIP6 is a
climate dataset with global coverage of the Earth's atmosphere, ocean, and
land. <br>
**Link:** [ERA5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) <br>
**Data Collection Method by dataset** <br>
* Automatic/Sensors <br>
**Labeling Method by dataset** <br>
* Automatic/Sensors <br>
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
ERA5 data for the years 1980, 1990, and 2000. ERA5 is a global
hourly reanalysis that blends historical observations with a consistent modern
weather model to produce gridded estimates of past atmospheric conditions. <br>
## Evaluation Dataset
**Link:** [CMIP6](https://wcrp-cmip.org/cmip-phases/cmip6/) <br>
**Data Collection Method by dataset** <br>
* Automatic/Sensors <br>
**Labeling Method by dataset** <br>
* Automatic/Sensors <br>
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
CMIP6 data for the year 2010. CMIP6 is a
climate dataset with global coverage of the Earth's atmosphere, ocean, and
land. <br>
**Link:** [ERA5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) <br>
**Data Collection Method by dataset** <br>
* Automatic/Sensors <br>
**Labeling Method by dataset** <br>
* Automatic/Sensors <br>
**Properties (Quantity, Dataset Descriptions, Sensor(s)):**
ERA5 data for the year 2010. ERA5 is a global
hourly reanalysis that blends historical observations with a consistent modern
weather model to produce gridded estimates of past atmospheric conditions. <br>
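
Read as inclusive ranges, the training, testing, and evaluation years above partition 1980 through 2016 with no overlap. The years 1980, 1990, and 2000 are held out for testing, 2010 for evaluation, and the remaining 33 years are used for training. The Python sketch below encodes this partition, for example to check which split a given year falls into. The dictionary layout and function name are illustrative and not taken from the training code.

```python
from typing import Optional

# Inclusive year ranges from this model card, expressed as Python ranges
# (the range stop is exclusive, hence one past the last year).
SPLITS = {
    "train": [range(1981, 1990), range(1991, 2000), range(2001, 2010), range(2011, 2017)],
    "test": [range(1980, 1981), range(1990, 1991), range(2000, 2001)],
    "eval": [range(2010, 2011)],
}


def split_for_year(year: int) -> Optional[str]:
    """Return the split a year belongs to, or None if the card does not list it."""
    for name, ranges in SPLITS.items():
        if any(year in r for r in ranges):
            return name
    return None


counts = {name: sum(len(r) for r in ranges) for name, ranges in SPLITS.items()}
print(counts)  # {'train': 33, 'test': 3, 'eval': 1}
```

Keeping the split in a single table makes it straightforward to assert that no year leaks between training and held-out data.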
# Inference:
**Engine:** [PyTorch](https://github.com/pytorch/pytorch) <br>
**Test Hardware:**
* A100 <br>
* H100 <br>
* L40S <br>
* RTX6000 <br>
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established
policies and practices to enable development for a wide array of AI applications.
When downloaded or used in accordance with our terms of service, developers should
work with their internal model team to ensure this model meets requirements for the
relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the
Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report model quality, risk, security vulnerabilities or NVIDIA AI
Concerns
[here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).