---
license: apache-2.0
tags:
- terravision
- terramind
- tokenizer
- vqvae
- fsq
- divae
- geospatial
- remote-sensing
- ahn
library_name: terratorch
---

# TerraVision Tokenizer - AHN

A DiVAE (Diffusion VQ-VAE) tokenizer for **Fused AHN6 DSM + DTM elevation (2-channel, float32)** trained on Dutch national geospatial data. Part of the TerraVision-NL project.

## Architecture

| Component | Value |
|-----------|-------|
| Encoder | ViT-B (vit_b_enc) |
| Decoder | Patched UNet (unet_patched) |
| Quantizer | FSQ (codebook: `8-8-8-6-5`, vocab: 15,360) |
| Image size | 448×448 px |
| Patch size | 16×16 px |
| Token grid | 28×28 = 784 tokens per image |
| Input channels | 2 (Digital Surface Model, Digital Terrain Model) |
| Latent dim | 5 |

## Geospatial Properties

All TerraVision tokenizers produce the **same spatial window** in 448 pixels, regardless of the underlying raster resolution. This ensures token grids are spatially aligned across modalities for cross-modal pretraining.

- **Pixel size**: 0.08 m
- **Source**: Actueel Hoogtebestand Nederland 6 (AHN6) at 7.5 cm resolution

## Normalization

Input data should be normalized before encoding:

- **Scheme**: minmax (clip [-20, 80] → [0, 1])

See `config.json` for exact normalization parameters.
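
The clip-and-rescale step described under Normalization could look roughly like the sketch below. It assumes elevations are given in metres and uses the [-20, 80] range quoted above. The function names and the sample array are illustrative and not part of terratorch; `config.json` remains the authoritative source for the actual parameters.

```python
import numpy as np

# Clip range quoted in the model card; check config.json for the exact values.
ELEV_MIN = -20.0
ELEV_MAX = 80.0


def normalize_elevation(raster, lo=ELEV_MIN, hi=ELEV_MAX):
    """Clip elevations to [lo, hi] and rescale linearly to [0, 1]."""
    raster = np.asarray(raster, dtype=np.float32)
    return (np.clip(raster, lo, hi) - lo) / (hi - lo)


def denormalize_elevation(normalized, lo=ELEV_MIN, hi=ELEV_MAX):
    """Map [0, 1] values back to the original scale (clipped values are lost)."""
    return np.asarray(normalized, dtype=np.float32) * (hi - lo) + lo


# Hypothetical 2-channel (DSM, DTM) sample, shape (channels, height, width)
sample = np.array([[[-25.0, 0.0]], [[30.0, 120.0]]], dtype=np.float32)
normalized = normalize_elevation(sample)
restored = denormalize_elevation(normalized)
```

Values outside the clip range saturate at 0 or 1, so the inverse mapping only recovers elevations that were inside [-20, 80] to begin with.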
## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from terratorch.models.backbones.terramind.tokenizer.vqvae import DiVAE

# Download weights
weights_path = hf_hub_download(repo_id="YOUR_REPO_ID", filename="tokenizer.pt")

# Instantiate model
tokenizer = DiVAE(
    image_size=448,
    patch_size=16,
    n_channels=2,
    enc_type="vit_b_enc",
    dec_type="unet_patched",
    quant_type="fsq",
    codebook_size="8-8-8-6-5",
    latent_dim=5,
    post_mlp=True,
    norm_codes=True,
)

# Load weights
state_dict = torch.load(weights_path, map_location="cpu")
tokenizer.load_state_dict(state_dict)
tokenizer.eval()

# Placeholder input in [0, 1]; real data should be normalized as described above
x = torch.rand(1, 2, 448, 448)

with torch.no_grad():
    # Encode: image → tokens
    quant, code_loss, tokens = tokenizer.encode(x)
    print(tokens.shape)  # (1, 28, 28)

    # Reconstruct: image → tokens → reconstruction (50 diffusion sampling steps)
    recon = tokenizer(x, timesteps=50)
```

## Training

Trained with the TerraVision-NL codebase using DiVAE (diffusion-based VQ-VAE) following the TerraMind paper methodology (Section 8.1).

- **Checkpoint**: `ahn-best-epoch-0002.ckpt`
- **Diffusion**: 1000 timesteps, linear schedule, predicts sample
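
To make the diffusion settings more concrete, here is a minimal NumPy sketch of a linear noise schedule over 1000 timesteps and the corresponding forward noising step. The beta endpoints (1e-4 and 2e-2) are the common DDPM defaults and are an assumption, since they are not stated in this card. The sketch also assumes "predicts sample" means the decoder is trained to predict the clean input rather than the added noise. None of the names below come from the TerraVision-NL codebase.

```python
import numpy as np

NUM_TIMESTEPS = 1000   # from the model card
BETA_START = 1e-4      # assumed (DDPM default), not stated in the card
BETA_END = 2e-2        # assumed (DDPM default), not stated in the card

# Linear schedule: betas increase linearly from BETA_START to BETA_END
betas = np.linspace(BETA_START, BETA_END, NUM_TIMESTEPS, dtype=np.float64)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(x0, noise, t):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise


rng = np.random.default_rng(0)
x0 = rng.random((2, 8, 8))              # stand-in for a normalized 2-channel patch
noise = rng.standard_normal(x0.shape)
x_t = add_noise(x0, noise, t=500)

# Under sample prediction, the regression target is the clean input itself
target = x0
```

With sample prediction the loss compares the network output against `x0` directly, whereas noise prediction would compare it against `noise`.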