# TerraVision Tokenizer – RGB

A DiVAE (diffusion VQ-VAE) tokenizer for aerial RGB imagery (3-channel, uint8), trained on Dutch national geospatial data. Part of the TerraVision-NL project.
## Architecture

| Component | Value |
|---|---|
| Encoder | ViT-B (`vit_b_enc`) |
| Decoder | Patched UNet (`unet_patched`) |
| Quantizer | FSQ (codebook: 8-8-8-6-5, vocab: 15,360) |
| Image size | 448×448 px |
| Patch size | 16×16 px |
| Token grid | 28×28 = 784 tokens per image |
| Input channels | 3 (red, green, blue) |
| Latent dim | 5 |
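The FSQ vocabulary size is simply the product of the per-dimension level counts, and the token count follows from the image and patch sizes. A quick sanity check:

```python
from math import prod

# FSQ quantizes each of the 5 latent dimensions to a fixed number of levels.
levels = [8, 8, 8, 6, 5]
vocab_size = prod(levels)           # 8*8*8*6*5 = 15,360 unique codes
assert vocab_size == 15_360

# One token per non-overlapping patch.
image_size, patch_size = 448, 16
grid = image_size // patch_size     # 28 tokens per side
assert grid * grid == 784           # 784 tokens per image
```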
## Geospatial Properties

All TerraVision tokenizers cover the same spatial window with 448 pixels, regardless of the underlying raster resolution. This keeps token grids spatially aligned across modalities for cross-modal pretraining.

- Pixel size: 0.08 m (ground sampling distance)
- Source: Dutch national aerial photography (Beeldmateriaal HRL 2025)
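The pixel size pins down the ground footprint of an image and of a single token:

```python
# Ground footprint implied by the 0.08 m pixel size
pixel_size_m = 0.08                  # ground sampling distance
image_px, patch_px = 448, 16

window_m = image_px * pixel_size_m   # ground window per 448-px image
token_m = patch_px * pixel_size_m    # ground footprint per 16-px token

print(f"window: {window_m:.2f} m, token: {token_m:.2f} m")
# window: 35.84 m, token: 1.28 m
```

So each image covers a 35.84 m square on the ground, and each of the 784 tokens summarizes a 1.28 m square.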
## Normalization

Input data should be normalized before encoding:

- Scheme: standard (clip to [0, 255], then subtract mean = 127.5 and divide by std = 127.5, mapping to [-1, 1])

See `config.json` for the exact normalization parameters.
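As a sketch of the scheme above (the `normalize` helper is illustrative; the values in `config.json` are authoritative):

```python
import numpy as np

def normalize(img_uint8: np.ndarray) -> np.ndarray:
    """Map uint8 RGB pixels to [-1, 1]: clip to [0, 255], then (x - 127.5) / 127.5."""
    x = np.clip(img_uint8.astype(np.float32), 0.0, 255.0)
    return (x - 127.5) / 127.5

# 0 maps to -1.0, 255 maps to 1.0
print(normalize(np.array([0, 128, 255], dtype=np.uint8)))
```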
## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from terratorch.models.backbones.terramind.tokenizer.vqvae import DiVAE

# Download weights
weights_path = hf_hub_download(repo_id="YOUR_REPO_ID", filename="tokenizer.pt")

# Instantiate the model with the architecture described above
tokenizer = DiVAE(
    image_size=448,
    patch_size=16,
    n_channels=3,
    enc_type="vit_b_enc",
    dec_type="unet_patched",
    quant_type="fsq",
    codebook_size="8-8-8-6-5",
    latent_dim=5,
    post_mlp=True,
    norm_codes=True,
)

# Load weights
state_dict = torch.load(weights_path, map_location="cpu")
tokenizer.load_state_dict(state_dict)
tokenizer.eval()

# Encode: image → tokens
x = torch.randn(1, 3, 448, 448)
quant, code_loss, tokens = tokenizer.encode(x)
print(tokens.shape)  # (1, 28, 28)

# Reconstruct: full forward pass (encode, then diffusion-sample the decoder)
recon = tokenizer(x, timesteps=50)
```
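Assuming the decoder's output lives in the same [-1, 1] range as its normalized input, a hypothetical `denormalize` helper maps a reconstruction back to uint8 RGB for viewing:

```python
import numpy as np

def denormalize(recon: np.ndarray) -> np.ndarray:
    """Inverse of the standard normalization (mean = std = 127.5): [-1, 1] -> uint8."""
    x = recon * 127.5 + 127.5
    return np.clip(np.round(x), 0, 255).astype(np.uint8)

print(denormalize(np.array([[-1.0, 0.0, 1.0]])))  # [[  0 128 255]]
```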
## Training

Trained with the TerraVision-NL codebase using DiVAE (diffusion-based VQ-VAE), following the TerraMind paper methodology (Section 8.1).

- Checkpoint: `rgb-best-epoch-0076.ckpt`
- Diffusion: 1000 timesteps, linear schedule, predicts sample