Add pipeline tag and improve metadata
#1
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,5 +1,9 @@
|
|
| 1 |
---
|
| 2 |
license: cc-by-4.0
|
|
|
|
| 3 |
tags:
|
| 4 |
- earth-observation
|
| 5 |
- remote-sensing
|
|
@@ -7,9 +11,6 @@ tags:
|
|
| 7 |
- copernicus
|
| 8 |
- sentinel
|
| 9 |
- multimodal
|
| 10 |
-
library_name: cop-gen
|
| 11 |
-
datasets:
|
| 12 |
-
- Major-TOM/COP-GEN-Benchmark
|
| 13 |
---
|
| 14 |
|
| 15 |

|
|
@@ -21,7 +22,7 @@ datasets:
|
|
| 21 |
[](https://miquel-espinosa.github.io/cop-gen/)
|
| 22 |
[](https://huggingface.co/collections/mespinosami/copgen)
|
| 23 |
|
| 24 |
-
This repository contains the suite of modality-specific KL-regularised VAEs used in COP-GEN. Each VAE encodes a distinct Copernicus modality (or band group) into a shared latent space at 8 latent channels. These are prerequisites for both COP-GEN inference and training: the diffusion backbone operates on the latents produced by these encoders.
|
| 25 |
|
| 26 |
## Model Details
|
| 27 |
|
|
@@ -50,14 +51,7 @@ This repository contains the suite of modality-specific KL-regularised VAEs used
|
|
| 50 |
|
| 51 |
## How to Get Started
|
| 52 |
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
```bash
|
| 56 |
-
git clone https://huggingface.co/mespinosami/copgen-vaes ./models/vae
|
| 57 |
-
rm -rf ./models/vae/.git ./models/vae/.gitattributes
|
| 58 |
-
```
|
| 59 |
-
|
| 60 |
-
Each VAE checkpoint is stored as `model-<step>-ema.pt` alongside its config file. The EMA weights have already been extracted into the correct format for inference. To use a VAE directly:
|
| 61 |
|
| 62 |
```python
|
| 63 |
from libs.vae import load_vae
|
|
@@ -71,25 +65,10 @@ latents = vae.encode(image_tensor) # (B, 8, H/f, W/f)
|
|
| 71 |
recon = vae.decode(latents)
|
| 72 |
```
|
| 73 |
|
| 74 |
-
See the [GitHub README](https://github.com/miquel-espinosa/COP-GEN) for full encoding instructions for each modality.
|
| 75 |
-
|
| 76 |
## Training Details
|
| 77 |
|
| 78 |
Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics (included in the config files). Sentinel-2 data uses a fixed scale factor of 1/1000. Training uses the `accelerate` launcher and supports single- and multi-GPU setups.
|
| 79 |
|
| 80 |
-
```bash
|
| 81 |
-
# Example: train the S2L2A RGB+NIR VAE
|
| 82 |
-
accelerate launch --num_processes 1 train_vae.py \
|
| 83 |
-
--cfg configs/vae/final/S2L2A/copgen_ae_kl_192x192_S2L2A_B4_3_2_8_latent_8.yaml \
|
| 84 |
-
--data_dir ./data/majorTOM/edinburgh/Core-S2L2A
|
| 85 |
-
```
|
| 86 |
-
|
| 87 |
-
After training, extract the EMA weights before use with COP-GEN:
|
| 88 |
-
|
| 89 |
-
```bash
|
| 90 |
-
python3 scripts/extract_ema_convert_model.py models/vae/<modality>/<config>/model-*.pt
|
| 91 |
-
```
|
| 92 |
-
|
| 93 |
## Relationship to COP-GEN
|
| 94 |
|
| 95 |
These VAEs are used in two ways:
|
|
|
|
| 1 |
---
|
| 2 |
+
datasets:
|
| 3 |
+
- Major-TOM/COP-GEN-Benchmark
|
| 4 |
+
library_name: cop-gen
|
| 5 |
license: cc-by-4.0
|
| 6 |
+
pipeline_tag: image-to-image
|
| 7 |
tags:
|
| 8 |
- earth-observation
|
| 9 |
- remote-sensing
|
|
|
|
| 11 |
- copernicus
|
| 12 |
- sentinel
|
| 13 |
- multimodal
|
|
| 14 |
---
|
| 15 |
|
| 16 |

|
|
|
|
| 22 |
[](https://miquel-espinosa.github.io/cop-gen/)
|
| 23 |
[](https://huggingface.co/collections/mespinosami/copgen)
|
| 24 |
|
| 25 |
+
This repository contains the suite of modality-specific KL-regularised VAEs used in [COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data](https://arxiv.org/abs/2603.03239). Each VAE encodes a distinct Copernicus modality (or band group) into a shared latent space at 8 latent channels. These are prerequisites for both COP-GEN inference and training: the diffusion backbone operates on the latents produced by these encoders.
|
| 26 |
|
| 27 |
## Model Details
|
| 28 |
|
|
|
|
| 51 |
|
| 52 |
## How to Get Started
|
| 53 |
|
| 54 |
+
To use a VAE directly for encoding or decoding, you can use the loading logic from the official repository:
|
| 55 |
|
| 56 |
```python
|
| 57 |
from libs.vae import load_vae
|
|
|
|
| 65 |
recon = vae.decode(latents)
|
| 66 |
```
|
| 67 |
|
|
|
|
|
|
|
| 68 |
## Training Details
|
| 69 |
|
| 70 |
Each VAE is trained independently on its respective modality. Inputs are normalised to [-1, 1] using precomputed per-modality min-max statistics (included in the config files). Sentinel-2 data uses a fixed scale factor of 1/1000. Training uses the `accelerate` launcher and supports single- and multi-GPU setups.
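The normalisation described above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the usual linear min-max form, and the function names, the `x_min`/`x_max` arguments, and the example values are hypothetical. The real statistics live in the per-modality config files, and how the Sentinel-2 scale factor is combined with any further clipping or shifting is not stated here, so only the multiplication itself is shown.

```python
import numpy as np


def minmax_to_unit_range(x, x_min, x_max):
    """Linearly map values from [x_min, x_max] to [-1, 1] (assumed form)."""
    x = np.asarray(x, dtype=np.float32)
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def minmax_from_unit_range(z, x_min, x_max):
    """Inverse of minmax_to_unit_range: map [-1, 1] back to [x_min, x_max]."""
    z = np.asarray(z, dtype=np.float32)
    return (z + 1.0) / 2.0 * (x_max - x_min) + x_min


def scale_sentinel2(dn, scale=1.0 / 1000.0):
    """Apply the fixed 1/1000 scale factor to raw Sentinel-2 values."""
    return np.asarray(dn, dtype=np.float32) * scale


# Hypothetical statistics, for illustration only.
pixels = np.array([0.0, 5.0, 10.0])
normalised = minmax_to_unit_range(pixels, x_min=0.0, x_max=10.0)
restored = minmax_from_unit_range(normalised, x_min=0.0, x_max=10.0)
scaled = scale_sentinel2(np.array([1000, 2500]))
```

Keeping the inverse mapping next to the forward one makes it easy to convert decoded reconstructions back into physical units with the same statistics.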
|
| 71 |
|
| 72 |
## Relationship to COP-GEN
|
| 73 |
|
| 74 |
These VAEs are used in two ways:
|