Upload 4 files

Files changed (4)

README.md +94 -3
config.json +28 -0
tokenizer.pt +3 -0
tokenizer.safetensors +3 -0

README.md CHANGED

@@ -1,3 +1,94 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+tags:
+- terravision
+- terramind
+- tokenizer
+- vqvae
+- fsq
+- divae
+- geospatial
+- remote-sensing
+- ahn
+library_name: terratorch
+---
+# TerraVision Tokenizer — AHN
+A DiVAE (diffusion-based VQ-VAE) tokenizer for **fused AHN6 DSM + DTM elevation (2 channels, float32)**, trained on
+Dutch national geospatial data. Part of the TerraVision-NL project.
+## Architecture
+| Component | Value |
+|-----------|-------|
+| Encoder | ViT-B (vit_b_enc) |
+| Decoder | Patched UNet (unet_patched) |
+| Quantizer | FSQ (codebook: `8-8-8-6-5`, vocab: 15,360) |
+| Image size | 448×448 px |
+| Patch size | 16×16 px |
+| Token grid | 28×28 = 784 tokens per image |
+| Input channels | 2 (Digital Surface Model, Digital Terrain Model) |
+| Latent dim | 5 |
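The vocabulary size follows from the FSQ levels. Each of the 5 latent dimensions is rounded to one of 8, 8, 8, 6 or 5 values, so there are 8 × 8 × 8 × 6 × 5 = 15,360 distinct codes, and 448 / 16 = 28 patches per side give the 784-token grid. The sketch below maps a per-dimension level vector to a single token ID using a mixed-radix encoding. It assumes the first dimension is the least significant digit, which may not match the digit order inside terratorch's FSQ module, so treat the IDs as illustrative.

```python
from math import prod

LEVELS = [8, 8, 8, 6, 5]  # FSQ levels per latent dimension ("8-8-8-6-5")
IMAGE_SIZE, PATCH_SIZE = 448, 16


def codes_to_index(codes, levels=LEVELS):
    """Map per-dimension level indices (0 <= c < L) to one token ID.

    Assumption: dimension 0 is the least significant mixed-radix digit.
    """
    index, base = 0, 1
    for c, num_levels in zip(codes, levels):
        if not 0 <= c < num_levels:
            raise ValueError(f"code {c} out of range for {num_levels} levels")
        index += c * base
        base *= num_levels
    return index


def index_to_codes(index, levels=LEVELS):
    """Inverse of codes_to_index: token ID -> per-dimension level indices."""
    codes = []
    for num_levels in levels:
        codes.append(index % num_levels)
        index //= num_levels
    return codes


vocab_size = prod(LEVELS)
tokens_per_image = (IMAGE_SIZE // PATCH_SIZE) ** 2
print(vocab_size, tokens_per_image)      # 15360 784
print(codes_to_index([7, 7, 7, 5, 4]))   # 15359, the largest token ID
print(index_to_codes(15359))             # [7, 7, 7, 5, 4]
```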
+## Geospatial Properties
+All TerraVision tokenizers cover the **same spatial window**, sampled as 448 × 448 pixels,
+regardless of each modality's native raster resolution. This keeps their token grids spatially
+aligned across modalities for cross-modal pretraining.
+- **Pixel size**: 0.08 m
+- **Source**: Actueel Hoogtebestand Nederland 6 (AHN6) at 7.5 cm resolution
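The pixel size fixes the ground footprint of one tile and of one token. The arithmetic below uses only the values stated above (448 px window, 16 px patches, 0.08 m pixels); the helper name `footprint_m` is hypothetical.

```python
def footprint_m(window_px=448, patch_px=16, pixel_size_m=0.08):
    """Return (tile side, token cell side) in metres for a square window."""
    tile_side = window_px * pixel_size_m
    token_side = patch_px * pixel_size_m
    return tile_side, token_side


tile_side, token_side = footprint_m()
print(f"tile:  {tile_side:.2f} m x {tile_side:.2f} m")    # tile:  35.84 m x 35.84 m
print(f"token: {token_side:.2f} m x {token_side:.2f} m")  # token: 1.28 m x 1.28 m
```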
+## Normalization
+Normalize input data before encoding:
+- **Scheme**: min-max (`minmax`); clip values to [-20, 80], then rescale linearly to [0, 1]
+
+See `config.json` (`normalization`) for the exact parameters.
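A minimal NumPy sketch of the min-max scheme, assuming it is applied element-wise to both channels with the `clip_min`/`clip_max` values from `config.json`. The function names are hypothetical, and the inverse is only exact for values that were inside the clip range.

```python
import numpy as np

CLIP_MIN, CLIP_MAX = -20.0, 80.0  # from config.json "normalization"


def normalize(elev):
    """Clip elevation to [CLIP_MIN, CLIP_MAX], then rescale linearly to [0, 1]."""
    clipped = np.clip(elev, CLIP_MIN, CLIP_MAX)
    return ((clipped - CLIP_MIN) / (CLIP_MAX - CLIP_MIN)).astype(np.float32)


def denormalize(x):
    """Map [0, 1] values back to elevation (clipped values are not recovered)."""
    return x * (CLIP_MAX - CLIP_MIN) + CLIP_MIN


# A hypothetical 2-channel (DSM, DTM) tile with shape (2, H, W) = (2, 1, 2)
tile = np.array([[[-30.0, 0.0]], [[30.0, 120.0]]])
print(normalize(tile))  # values 0.0, 0.2, 0.5, 1.0 (-30 and 120 are clipped)
```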
+## Usage
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from terratorch.models.backbones.terramind.tokenizer.vqvae import DiVAE
+# Download the weights (replace YOUR_REPO_ID with this repository's ID)
+weights_path = hf_hub_download(repo_id="YOUR_REPO_ID", filename="tokenizer.pt")
+# Instantiate model
+tokenizer = DiVAE(
+    image_size=448,
+    patch_size=16,
+    n_channels=2,
+    enc_type="vit_b_enc",
+    dec_type="unet_patched",
+    quant_type="fsq",
+    codebook_size="8-8-8-6-5",
+    latent_dim=5,
+    post_mlp=True,
+    norm_codes=True,
+)
+# Load weights
+state_dict = torch.load(weights_path, map_location="cpu")
+tokenizer.load_state_dict(state_dict)
+tokenizer.eval()
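+# Placeholder input; normalize real (DSM, DTM) tiles as described under Normalization
+x = torch.randn(1, 2, 448, 448)
+with torch.no_grad():
+    # Encode: image → tokens
+    quant, code_loss, tokens = tokenizer.encode(x)
+    print(tokens.shape)  # (1, 28, 28)
+    # Autoencode: image → tokens → reconstruction (diffusion sampling)
+    recon = tokenizer(x, timesteps=50)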
+```
+## Training
+Trained with the TerraVision-NL codebase using DiVAE (diffusion-based VQ-VAE)
+following the TerraMind paper methodology (Section 8.1).
+- **Checkpoint**: `ahn-best-epoch-0002.ckpt`
+- **Diffusion**: 1000 training timesteps, linear β schedule with zero terminal SNR, `sample` prediction target (the decoder predicts the clean image directly)
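To make the noise schedule concrete, the sketch below builds a 1000-step linear β schedule and applies the zero-terminal-SNR rescaling described by Lin et al. in "Common Diffusion Noise Schedules and Sample Steps are Flawed", which is what the `beta_schedule: "linear"` and `zero_terminal_snr: true` entries in `config.json` suggest. The β endpoints (1e-4 to 0.02, the common DDPM defaults) are an assumption; the values actually used in training are not stated here.

```python
import numpy as np


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (endpoints are assumed DDPM-style defaults)."""
    return np.linspace(beta_start, beta_end, num_steps, dtype=np.float64)


def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last step has SNR exactly 0."""
    alphas_bar_sqrt = np.sqrt(np.cumprod(1.0 - betas))
    first, last = alphas_bar_sqrt[0], alphas_bar_sqrt[-1]
    # Shift so the final value is 0, then scale so the first value is unchanged
    alphas_bar_sqrt = (alphas_bar_sqrt - last) * first / (first - last)
    alphas_bar = alphas_bar_sqrt ** 2
    # Recover per-step alphas from the cumulative products
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas


betas = rescale_zero_terminal_snr(linear_betas())
alphas_bar = np.cumprod(1.0 - betas)
snr = alphas_bar / (1.0 - alphas_bar)
# The final step is pure noise: beta_T = 1 and SNR_T = 0
print(len(betas), betas[-1], snr[-1])
```

With a zero terminal SNR, the last training step sees no signal at all, which matches what the sampler starts from at inference time.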

config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "model_type": "divae",
+  "image_size": 448,
+  "patch_size": 16,
+  "n_channels": 2,
+  "enc_type": "vit_b_enc",
+  "dec_type": "unet_patched",
+  "quant_type": "fsq",
+  "codebook_size": "8-8-8-6-5",
+  "latent_dim": 5,
+  "commitment_weight": 1.0,
+  "post_mlp": true,
+  "norm_codes": true,
+  "num_train_timesteps": 1000,
+  "beta_schedule": "linear",
+  "prediction_type": "sample",
+  "zero_terminal_snr": true,
+  "modality": "ahn",
+  "normalization": {
+    "kind": "minmax",
+    "clip_min": -20.0,
+    "clip_max": 80.0
+  },
+  "geospatial_window_px": 448,
+  "token_grid_size": 28,
+  "vocab_size": 15360,
+  "align_with_pixel_size_m": 0.08
+}
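Several `config.json` fields are derived from others, so a quick consistency check can catch a mismatched config before the model is instantiated. The checks below encode only the relationships stated in this card (grid = image / patch, vocab = product of FSQ levels, one latent dimension per level); the function name is hypothetical.

```python
import json
from math import prod


def check_tokenizer_config(cfg):
    """Return a list of inconsistencies (empty if the config looks consistent)."""
    problems = []
    levels = [int(v) for v in cfg["codebook_size"].split("-")]
    if cfg["token_grid_size"] * cfg["patch_size"] != cfg["image_size"]:
        problems.append("token_grid_size * patch_size != image_size")
    if cfg["vocab_size"] != prod(levels):
        problems.append(f"vocab_size != product of FSQ levels ({prod(levels)})")
    if cfg["latent_dim"] != len(levels):
        problems.append("latent_dim != number of FSQ levels")
    norm = cfg["normalization"]
    if not norm["clip_min"] < norm["clip_max"]:
        problems.append("clip_min must be < clip_max")
    return problems


# Excerpt of config.json; in practice, json.load() the downloaded file instead
cfg = json.loads("""{
  "image_size": 448, "patch_size": 16, "codebook_size": "8-8-8-6-5",
  "latent_dim": 5, "token_grid_size": 28, "vocab_size": 15360,
  "normalization": {"kind": "minmax", "clip_min": -20.0, "clip_max": 80.0}
}""")
print(check_tokenizer_config(cfg))  # []
```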

tokenizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b90b7d32696bc9937ecd399d3496579fce61a439ecaec1c6b9b8b399b3e7ab70
+size 1148162699

tokenizer.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:41a045228487bc86565aac85253d9c12331c0a2459df80a5ed153c47fe222265
+size 1148037244
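The Git LFS pointers above record each file's SHA-256 (`oid`) and byte `size`, so a downloaded `tokenizer.pt` or `tokenizer.safetensors` can be checked against them. A minimal sketch using only the standard library; the local path and the helper name `verify_lfs_object` are placeholders.

```python
import hashlib
import os


def verify_lfs_object(path, expected_oid, expected_size):
    """Check a file against a Git LFS pointer's sha256 oid and byte size."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so multi-GB checkpoints are never fully in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values from the tokenizer.safetensors pointer above; the path is a placeholder
path = "tokenizer.safetensors"
if os.path.exists(path):
    print(verify_lfs_object(
        path,
        "41a045228487bc86565aac85253d9c12331c0a2459df80a5ed153c47fe222265",
        1148037244,
    ))
```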