TerraVision Tokenizer (RGB)

A DiVAE (Diffusion VQ-VAE) tokenizer for aerial RGB imagery (3-channel, uint8), trained on Dutch national geospatial data. Part of the TerraVision-NL project.

Architecture

| Component      | Value                                   |
|----------------|-----------------------------------------|
| Encoder        | ViT-B (vit_b_enc)                       |
| Decoder        | Patched UNet (unet_patched)             |
| Quantizer      | FSQ (codebook: 8-8-8-6-5, vocab: 15,360)|
| Image size     | 448 × 448 px                            |
| Patch size     | 16 × 16 px                              |
| Token grid     | 28 × 28 = 784 tokens per image          |
| Input channels | 3 (red, green, blue)                    |
| Latent dim     | 5                                       |
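The derived figures in the table (token grid size and FSQ vocabulary) follow directly from the other entries, which a few lines of arithmetic confirm:

```python
# Sanity-check the derived numbers in the table above (pure arithmetic).
image_size = 448
patch_size = 16
fsq_levels = [8, 8, 8, 6, 5]  # FSQ quantization levels per latent dimension

tokens_per_side = image_size // patch_size  # 448 / 16 = 28
tokens_per_image = tokens_per_side ** 2     # 28 * 28 = 784

vocab_size = 1
for level in fsq_levels:
    vocab_size *= level                     # 8 * 8 * 8 * 6 * 5 = 15,360

print(tokens_per_side, tokens_per_image, vocab_size)  # 28 784 15360
```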

Geospatial Properties

All TerraVision tokenizers cover the same spatial ground window with a 448-pixel input, regardless of the underlying raster resolution. This keeps token grids spatially aligned across modalities for cross-modal pretraining.

  • Pixel size: 0.08 m
  • Source: Dutch national aerial photography (Beeldmateriaal HRL 2025)
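The pixel size above implies a fixed ground footprint per image and per token, which is simple to derive (arithmetic only, no model code):

```python
# Ground footprint implied by the pixel size above (derived arithmetic).
pixel_size_m = 0.08    # metres per pixel
image_size_px = 448
patch_size_px = 16

window_m = pixel_size_m * image_size_px           # ~35.84 m per image side
token_footprint_m = pixel_size_m * patch_size_px  # ~1.28 m per token cell

print(round(window_m, 2), round(token_footprint_m, 2))  # 35.84 1.28
```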

Normalization

Input data should be normalized before encoding:

  • Scheme: standard (clip to [0, 255], mean=127.5, std=127.5 → [-1, 1])

See config.json for exact normalization parameters.
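The scheme above corresponds to the following preprocessing. This is a minimal sketch assuming the mean/std values listed; config.json remains the authoritative source:

```python
import torch

def normalize_rgb(img: torch.Tensor) -> torch.Tensor:
    """Map uint8 RGB values in [0, 255] to float32 in [-1, 1] (mean=127.5, std=127.5)."""
    x = img.float().clamp(0.0, 255.0)
    return (x - 127.5) / 127.5

img = torch.tensor([[[0, 128, 255]]], dtype=torch.uint8)  # a single RGB pixel
print(normalize_rgb(img))  # -1.0, ~0.0039, 1.0
```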

Usage

```python
import torch
from huggingface_hub import hf_hub_download
from terratorch.models.backbones.terramind.tokenizer.vqvae import DiVAE

# Download the pretrained weights from the Hub
weights_path = hf_hub_download(repo_id="YOUR_REPO_ID", filename="tokenizer.pt")

# Instantiate the model with the architecture described above
tokenizer = DiVAE(
    image_size=448,
    patch_size=16,
    n_channels=3,
    enc_type="vit_b_enc",
    dec_type="unet_patched",
    quant_type="fsq",
    codebook_size="8-8-8-6-5",
    latent_dim=5,
    post_mlp=True,
    norm_codes=True,
)

# Load weights
state_dict = torch.load(weights_path, map_location="cpu")
tokenizer.load_state_dict(state_dict)
tokenizer.eval()

# Encode: image -> tokens
# Dummy batch; real inputs should be normalized to [-1, 1] (see Normalization)
x = torch.randn(1, 3, 448, 448)
quant, code_loss, tokens = tokenizer.encode(x)
print(tokens.shape)  # (1, 28, 28)

# Decode: tokens -> reconstruction via diffusion sampling
recon = tokenizer(x, timesteps=50)
```
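The encoder returns tokens as a 28 × 28 grid; for downstream sequence models they are typically flattened row-major into 784 tokens. A hypothetical post-processing sketch (not part of the DiVAE API, with a random grid standing in for real encoder output):

```python
import torch

# Hypothetical post-processing (not part of the DiVAE API): flatten the
# 28x28 token grid row-major into a 784-token sequence for a downstream
# sequence model. A random grid stands in for tokenizer.encode() output.
tokens = torch.randint(0, 15360, (1, 28, 28))
seq = tokens.flatten(start_dim=1)  # shape (1, 784)
print(seq.shape)  # torch.Size([1, 784])
```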

Training

Trained with the TerraVision-NL codebase using DiVAE (diffusion-based VQ-VAE) following the TerraMind paper methodology (Section 8.1).

  • Checkpoint: rgb-best-epoch-0076.ckpt
  • Diffusion: 1,000 timesteps, linear noise schedule, sample (x₀) prediction target
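A linear schedule over 1,000 timesteps can be sketched in a few lines. The endpoint values 1e-4 and 0.02 are the common DDPM defaults and are an assumption here, not read from this checkpoint's config:

```python
# Sketch of a linear beta schedule over 1000 timesteps. The endpoints
# (1e-4, 0.02) are the usual DDPM defaults and are assumed, not confirmed
# by this checkpoint's config.
T = 1000
beta_start, beta_end = 1e-4, 0.02
betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

# Cumulative product of (1 - beta): how much signal survives at each step
alpha_bar = []
acc = 1.0
for b in betas:
    acc *= 1.0 - b
    alpha_bar.append(acc)

print(len(betas), round(betas[0], 6), round(betas[-1], 6))  # 1000 0.0001 0.02
```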