hwh-datascience committed · Commit 29f9b9d · verified · 1 Parent(s): 59e6124

Upload 4 files

Files changed (4)
  1. README.md +94 -3
  2. config.json +28 -0
  3. tokenizer.pt +3 -0
  4. tokenizer.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,94 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - terravision
+ - terramind
+ - tokenizer
+ - vqvae
+ - fsq
+ - divae
+ - geospatial
+ - remote-sensing
+ - ahn
+ library_name: terratorch
+ ---
+
+ # TerraVision Tokenizer — AHN
+
+ A DiVAE (Diffusion VQ-VAE) tokenizer for **Fused AHN6 DSM + DTM elevation (2-channel, float32)** trained on
+ Dutch national geospatial data. Part of the TerraVision-NL project.
+
+ ## Architecture
+
+ | Component | Value |
+ |-----------|-------|
+ | Encoder | ViT-B (vit_b_enc) |
+ | Decoder | Patched UNet (unet_patched) |
+ | Quantizer | FSQ (codebook: `8-8-8-6-5`, vocab: 15,360) |
+ | Image size | 448×448 px |
+ | Patch size | 16×16 px |
+ | Token grid | 28×28 = 784 tokens per image |
+ | Input channels | 2 (Digital Surface Model, Digital Terrain Model) |
+ | Latent dim | 5 |
+
+ ## Geospatial Properties
+
+ All TerraVision tokenizers cover the **same spatial window** with their 448-pixel input,
+ regardless of the underlying raster resolution. This ensures token grids are spatially aligned
+ across modalities for cross-modal pretraining.
+
+ - **Pixel size**: 0.08 m
+ - **Source**: Actueel Hoogtebestand Nederland 6 (AHN6) at 7.5 cm resolution
+
+ ## Normalization
+
+ Input data should be normalized before encoding:
+ - **Scheme**: minmax (clip [-20, 80] → [0, 1])
+
+ See `config.json` for exact normalization parameters.
+
+ ## Usage
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from terratorch.models.backbones.terramind.tokenizer.vqvae import DiVAE
+
+ # Download weights
+ weights_path = hf_hub_download(repo_id="YOUR_REPO_ID", filename="tokenizer.pt")
+
+ # Instantiate model
+ tokenizer = DiVAE(
+     image_size=448,
+     patch_size=16,
+     n_channels=2,
+     enc_type="vit_b_enc",
+     dec_type="unet_patched",
+     quant_type="fsq",
+     codebook_size="8-8-8-6-5",
+     latent_dim=5,
+     post_mlp=True,
+     norm_codes=True,
+ )
+
+ # Load weights
+ state_dict = torch.load(weights_path, map_location="cpu")
+ tokenizer.load_state_dict(state_dict)
+ tokenizer.eval()
+
+ # Encode: image → tokens
+ x = torch.randn(1, 2, 448, 448)
+ quant, code_loss, tokens = tokenizer.encode(x)
+ print(tokens.shape)  # (1, 28, 28)
+
+ # Reconstruct: full forward pass (encode, then diffusion decoding)
+ recon = tokenizer(x, timesteps=50)
+ ```
+
+ ## Training
+
+ Trained with the TerraVision-NL codebase using DiVAE (diffusion-based VQ-VAE)
+ following the TerraMind paper methodology (Section 8.1).
+
+ - **Checkpoint**: `ahn-best-epoch-0002.ckpt`
+ - **Diffusion**: 1000 timesteps, linear schedule, predicts sample
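The minmax normalization described in the README can be sketched in pure Python; `normalize_ahn` is an illustrative helper (not part of the terratorch API), applied elementwise to both DSM and DTM channels, with the clip bounds taken from `config.json`:

```python
def normalize_ahn(value: float, clip_min: float = -20.0, clip_max: float = 80.0) -> float:
    """Clip an elevation value (metres) to [clip_min, clip_max], then rescale to [0, 1].

    Mirrors the "normalization" block in config.json: minmax over [-20, 80].
    """
    value = max(clip_min, min(clip_max, value))
    return (value - clip_min) / (clip_max - clip_min)

# Sea level (0 m) maps to 0.2 under these bounds; out-of-range
# values are clipped before scaling.
assert normalize_ahn(0.0) == 0.2
assert normalize_ahn(-100.0) == 0.0
assert normalize_ahn(80.0) == 1.0
```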
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "model_type": "divae",
+   "image_size": 448,
+   "patch_size": 16,
+   "n_channels": 2,
+   "enc_type": "vit_b_enc",
+   "dec_type": "unet_patched",
+   "quant_type": "fsq",
+   "codebook_size": "8-8-8-6-5",
+   "latent_dim": 5,
+   "commitment_weight": 1.0,
+   "post_mlp": true,
+   "norm_codes": true,
+   "num_train_timesteps": 1000,
+   "beta_schedule": "linear",
+   "prediction_type": "sample",
+   "zero_terminal_snr": true,
+   "modality": "ahn",
+   "normalization": {
+     "kind": "minmax",
+     "clip_min": -20.0,
+     "clip_max": 80.0
+   },
+   "geospatial_window_px": 448,
+   "token_grid_size": 28,
+   "vocab_size": 15360,
+   "align_with_pixel_size_m": 0.08
+ }
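The derived fields in config.json follow from the primary ones; a quick dependency-free sketch (values copied from the file above) checks the relationships:

```python
from math import prod

config = {
    "image_size": 448,
    "patch_size": 16,
    "codebook_size": "8-8-8-6-5",
    "vocab_size": 15360,
    "token_grid_size": 28,
    "align_with_pixel_size_m": 0.08,
}

# FSQ vocabulary is the product of the per-dimension quantization
# levels: 8 * 8 * 8 * 6 * 5 = 15,360.
levels = [int(n) for n in config["codebook_size"].split("-")]
assert prod(levels) == config["vocab_size"]

# Token grid: 448 / 16 = 28 per side, i.e. 28 * 28 = 784 tokens per image.
grid = config["image_size"] // config["patch_size"]
assert grid == config["token_grid_size"]
assert grid * grid == 784

# Ground footprint of one image: 448 px * 0.08 m/px ≈ 35.84 m per side.
window_m = config["image_size"] * config["align_with_pixel_size_m"]
assert abs(window_m - 35.84) < 1e-9
```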
tokenizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b90b7d32696bc9937ecd399d3496579fce61a439ecaec1c6b9b8b399b3e7ab70
+ size 1148162699
tokenizer.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41a045228487bc86565aac85253d9c12331c0a2459df80a5ed153c47fe222265
+ size 1148037244
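Both weight files are stored as Git LFS pointers, which record the blob's SHA-256 (`oid sha256:`) and byte size. After a full download, integrity can be checked with a short sketch; the local file path in the commented usage is illustrative:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks; the hex digest should
    match the `oid sha256:` field of the corresponding LFS pointer."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example check against the tokenizer.pt pointer above:
# expected = "b90b7d32696bc9937ecd399d3496579fce61a439ecaec1c6b9b8b399b3e7ab70"
# assert sha256_of("tokenizer.pt") == expected
```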