mespinosami/copgen-vaes

@@ -1,5 +1,9 @@
 ---
 license: cc-by-4.0
 tags:
 - earth-observation
 - remote-sensing
@@ -7,9 +11,6 @@ tags:
 - copernicus
 - sentinel
 - multimodal
-library_name: cop-gen
-datasets:
-- Major-TOM/COP-GEN-Benchmark
 ---
 ![copgen-banner-github](https://cdn-uploads.huggingface.co/production/uploads/63ea69a55c837d9968ebecc0/JHy5rYg3WF3y4T01ik2IB.png)
@@ -21,7 +22,7 @@ datasets:
 [![Website](https://img.shields.io/badge/🌐-Website-grey)](https://miquel-espinosa.github.io/cop-gen/)
 [![HF Collection](https://img.shields.io/badge/🤗-Collection-yellow)](https://huggingface.co/collections/mespinosami/copgen)
-This repository contains the suite of modality-specific KL-regularised VAEs used in COP-GEN. Each VAE encodes a distinct Copernicus modality (or band group) into a shared latent space at 8 latent channels. These are prerequisites for both COP-GEN inference and training — the diffusion backbone operates on the latents produced by these encoders.
 ## Model Details
@@ -50,14 +51,7 @@ This repository contains the suite of modality-specific KL-regularised VAEs used
 ## How to Get Started
-Download all VAEs into the expected directory:
-```bash
-git clone https://huggingface.co/mespinosami/copgen-vaes ./models/vae
-rm -rf ./models/vae/.git ./models/vae/.gitattributes
-```
-Each VAE checkpoint is stored as `model-<step>-ema.pt` alongside its config file. The EMA weights have already been extracted into the correct format for inference. To use a VAE directly:
 ```python
 from libs.vae import load_vae
@@ -71,25 +65,10 @@ latents = vae.encode(image_tensor)   # (B, 8, H/f, W/f)
 recon   = vae.decode(latents)
 ```
-See the [GitHub README](https://github.com/miquel-espinosa/COP-GEN) for full encoding instructions for each modality.
 ## Training Details
 Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics (included in the config files). Sentinel-2 data uses a fixed scale factor of 1/1000. Training uses the `accelerate` launcher and supports single- and multi-GPU setups.
-```bash
-# Example: train the S2L2A RGB+NIR VAE
-accelerate launch --num_processes 1 train_vae.py \
-    --cfg configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml \
-    --data_dir ./data/majorTOM/edinburgh/Core-S2L2A
-```
-After training, extract the EMA weights before use with COP-GEN:
-```bash
-python3 scripts/extract_ema_convert_model.py models/vae/<modality>/<config>/model-*.pt
-```
 ## Relationship to COP-GEN
 These VAEs are used in two ways:

 ---
+datasets:
+- Major-TOM/COP-GEN-Benchmark
+library_name: cop-gen
 license: cc-by-4.0
+pipeline_tag: image-to-image
 tags:
 - earth-observation
 - remote-sensing
 - copernicus
 - sentinel
 - multimodal
 ---
 ![copgen-banner-github](https://cdn-uploads.huggingface.co/production/uploads/63ea69a55c837d9968ebecc0/JHy5rYg3WF3y4T01ik2IB.png)
 [![Website](https://img.shields.io/badge/🌐-Website-grey)](https://miquel-espinosa.github.io/cop-gen/)
 [![HF Collection](https://img.shields.io/badge/🤗-Collection-yellow)](https://huggingface.co/collections/mespinosami/copgen)
+This repository contains the suite of modality-specific KL-regularised VAEs used in [COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data](https://arxiv.org/abs/2603.03239). Each VAE encodes a distinct Copernicus modality (or band group) into a shared latent space with 8 latent channels. They are prerequisites for both COP-GEN inference and training, because the diffusion backbone operates on the latents these encoders produce.
 ## Model Details
 ## How to Get Started
+To use a VAE directly for encoding or decoding, you can use the loading logic from the official repository:
 ```python
 from libs.vae import load_vae
 recon   = vae.decode(latents)
 ```
 ## Training Details
 Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics (included in the config files). Sentinel-2 data uses a fixed scale factor of 1/1000. Training uses the `accelerate` launcher and supports single- and multi-GPU setups.
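
The normalisation described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names and the example statistics are hypothetical, and how the Sentinel-2 scale factor is combined with the min-max step is not stated here, so the two are shown separately.

```python
def minmax_to_unit_range(x, lo, hi):
    """Map values from [lo, hi] linearly onto [-1, 1].

    Uses arithmetic only, so it also broadcasts over array or tensor inputs.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def unit_range_to_minmax(y, lo, hi):
    """Inverse mapping: take values in [-1, 1] back to [lo, hi]."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


def scale_sentinel2(x, divisor=1000.0):
    """Apply the fixed 1/1000 scale factor mentioned for Sentinel-2 data."""
    return x / divisor


# Hypothetical per-modality statistics; the real values live in the config files.
LO, HI = -25.0, 5.0

normalised = minmax_to_unit_range(-10.0, LO, HI)
restored = unit_range_to_minmax(normalised, LO, HI)
scaled = scale_sentinel2(1000.0)
```

The inverse mapping is what a decoder output would need in order to return to physical units, assuming the same statistics were used at encoding time.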
 ## Relationship to COP-GEN
 These VAEs are used in two ways: