ikhado committed · Commit 3619507 · verified · 1 Parent(s): 360f3f1

Update README.md

  1. README.md +159 -1
README.md CHANGED
@@ -12,4 +12,162 @@ tags:
  - earth-observation
  - satellite-imagery
  - remote-sensing
- ---
+ ---
+
+
+ # SATtxt - Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery
+ <p align="center">
+ <img src="https://i.imgur.com/waxVImv.png" alt="SATtxt">
+ </p>
+
+ <p>
+ <b>Minh Kha Do, Wei Xiang, Kang Han, Di Wu, Khoa Phan, Yi-Ping Phoebe Chen, Gaowen Liu, Ramana Rao Kompella</b>
+ </p>
+ <p>
+ La Trobe University, Cisco Research
+ </p>
+
+ <p>
+ <a href="https://arxiv.org/abs/2602.22613"><img src="https://img.shields.io/badge/arXiv-2602.22613-b31b1b.svg" alt="arXiv"></a>
+ <a href="https://huggingface.co/ikhado/sattxt"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow" alt="Hugging Face"></a>
+ <a href="https://github.com/ikhado/sattxt"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
+ </p>
+
+
+ ---
+
+ ## 📰 News
+
+ | Date | Update |
+ |------|--------|
+ | **Mar 9, 2026** | We have released the model code and weights. |
+ | **Feb 23, 2026** | SATtxt has been accepted to **CVPR 2026**. We thank the reviewers and area chairs. |
+
+ ---
+
+ ## Overview
+
+ SATtxt is a vision-language foundation model for satellite imagery. We train **only the projection heads**, keeping both encoders frozen.
+
+ <table>
+ <tr><th>Component</th><th>Backbone</th><th>Training status</th></tr>
+ <tr><td>Vision Encoder</td><td><a href="https://github.com/facebookresearch/dinov3">DINOv3</a> ViT-L/16</td><td>Frozen</td></tr>
+ <tr><td>Text Encoder</td><td><a href="https://github.com/McGill-NLP/llm2vec">LLM2Vec</a> Llama-3-8B</td><td>Frozen</td></tr>
+ <tr><td>Vision Head</td><td>Transformer Projection</td><td>Trained</td></tr>
+ <tr><td>Text Head</td><td>Linear Projection</td><td>Trained</td></tr>
+ </table>
+
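The frozen-encoder, trained-head setup in the table above can be illustrated with a minimal PyTorch sketch. This is not the SATtxt training code: `FrozenEncoderWithHead`, the toy layer sizes, and the learning rate are hypothetical stand-ins chosen only to show the pattern of freezing an encoder and optimizing a projection head.

```python
import torch
from torch import nn


class FrozenEncoderWithHead(nn.Module):
    """Hypothetical wrapper: a frozen encoder followed by a trainable head."""

    def __init__(self, encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.head = head
        # Freeze every encoder parameter so the optimizer never updates it.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.encoder.eval()

    def train(self, mode: bool = True):
        # Keep the frozen encoder in eval mode even while the head trains.
        super().train(mode)
        self.encoder.eval()
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no gradients flow through the encoder
            features = self.encoder(x)
        return self.head(features)


# Toy stand-ins for a large encoder and a small projection head (assumed sizes).
encoder = nn.Sequential(nn.Linear(32, 64), nn.GELU())
head = nn.Linear(64, 16)
model = FrozenEncoderWithHead(encoder, head)

# Only parameters that still require gradients are handed to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)  # illustrative value

trainable = sum(p.numel() for p in trainable_params)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)

# One dummy optimization step: gradients reach the head only.
loss = model(torch.randn(4, 32)).pow(2).mean()
loss.backward()
optimizer.step()
```

In this pattern the optimizer state and gradient memory scale with the heads alone, which is the usual motivation for keeping large encoders frozen.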
+ ---
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/ikhado/sattxt.git
+ cd sattxt
+ pip install -r requirements.txt
+ pip install flash-attn --no-build-isolation # Required for LLM2Vec
+ ```
+
+ ---
+
+ ## Model Weights
+
+ Download the required weights:
+
+ | Component | Source |
+ |-----------|--------|
+ | DINOv3 ViT-L/16 | [facebookresearch/dinov3](https://github.com/facebookresearch/dinov3) → `dinov3_vitl16_pretrain_sat493m.pth` |
+ | LLM2Vec | [McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse](https://huggingface.co/McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse) |
+ | Vision Head | [sattxt_vision_head.pt](https://huggingface.co/ikhado/sattxt/blob/main/sattxt_vision_head.pt) |
+ | Text Head | [sattxt_text_head.pt](https://huggingface.co/ikhado/sattxt/blob/main/sattxt_text_head.pt) |
+
+ Clone DINOv3 into the `thirdparty` folder:
+
+ ```bash
+ mkdir -p thirdparty && cd thirdparty && git clone https://github.com/facebookresearch/dinov3.git
+ ```
+
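As an alternative to downloading the two head checkpoints by hand, they could be fetched with the `huggingface_hub` client. The sketch below assumes `huggingface_hub` is installed; the helper name `download_heads` and the `weights` target directory are illustrative choices, not part of the SATtxt repository. The repository ID and file names are taken from the table above.

```python
from pathlib import Path

REPO_ID = "ikhado/sattxt"
HEAD_FILES = ("sattxt_vision_head.pt", "sattxt_text_head.pt")


def download_heads(target_dir: str = "weights") -> dict:
    """Fetch both projection-head checkpoints and return their local paths.

    `target_dir` is an assumed location; any writable directory works.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return {
        name: hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target)
        for name in HEAD_FILES
    }
```

Calling `download_heads()` returns a mapping from file name to local path, which can then be passed to the `sattxt_vision_head_pretrain_weights` and `sattxt_text_head_pretrain_weights` arguments used in the Quick Start. The DINOv3 and LLM2Vec weights are obtained separately from the sources listed in the table.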
+ ---
+
+ ## Quick Start
+
+ ```python
+ import sys
+ from pathlib import Path
+
+ # Make the cloned DINOv3 repo importable (`__file__` is only defined when run as a script)
+ sys.path.insert(0, str(Path(__file__).resolve().parent / "thirdparty" / "dinov3"))
+
+ from sattxt.model import SATtxt
+ from sattxt.utils import image_loader, get_preprocess, zero_shot_classify
+
+ # Load model
+ model = SATtxt(
+     dinov3_weights_path='PATH/TO/dinov3_vitl16_pretrain_sat493m.pth',
+     sattxt_vision_head_pretrain_weights='PATH/TO/sattxt_vision_head.pt',
+     text_encoder_id='McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp',
+     sattxt_text_head_pretrain_weights='PATH/TO/sattxt_text_head.pt'
+ ).to('cuda').eval()
+
+ # Zero-shot classification
+ categories = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway",
+               "Industrial", "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]
+
+ # Preprocess the image and add a batch dimension
+ image = image_loader('./asset/Residential_167.jpg')
+ image_tensor = get_preprocess(is_ms=False, all_bands=False)(image).unsqueeze(0).to('cuda')
+
+ logits, pred_idx = zero_shot_classify(model, image_tensor, categories)
+ print(f"Predicted: {categories[pred_idx.item()]}")  # Expected: Residential
+ ```
+
+ <details>
+ <summary><b>Expected Output</b></summary>
+
+ ```
+ Image: ./asset/Residential_167.jpg
+ Predicted: Residential
+ Confidence scores:
+ AnnualCrop: -0.0075
+ Forest: -0.0633
+ HerbaceousVegetation: -0.0219
+ Highway: 0.0283
+ Industrial: 0.0887
+ Pasture: 0.0178
+ PermanentCrop: -0.0197
+ Residential: 0.0908
+ River: -0.0487
+ SeaLake: -0.0441
+ ```
+
+ </details>
+
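To make the scores above easier to read, here is a small pure-Python sketch of how zero-shot classification typically works in contrastive vision-language models: the image embedding is compared with one text embedding per category by cosine similarity, and the highest score wins. This is an assumption about the general technique, not a description of what `zero_shot_classify` does internally; the helper names are hypothetical. The score list is copied from the expected output.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]


def predict(scores, labels):
    """Return the label with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


categories = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway",
              "Industrial", "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]

# Scores copied from the expected output above.
reported_scores = [-0.0075, -0.0633, -0.0219, 0.0283, 0.0887,
                   0.0178, -0.0197, 0.0908, -0.0487, -0.0441]
prediction = predict(reported_scores, categories)

# Toy 2-D embeddings (made up) to exercise the similarity step.
toy_scores = cosine_scores([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0]])
```

With the reported scores, `prediction` is `Residential`, narrowly ahead of `Industrial` (0.0908 versus 0.0887), so only the ranking should be relied on, not the absolute values.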
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{do2026sattxt,
+   title={Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery},
+   author={Minh Kha Do and Wei Xiang and Kang Han and Di Wu and Khoa Phan and Yi-Ping Phoebe Chen and Gaowen Liu and Ramana Rao Kompella},
+   year={2026},
+   eprint={2602.22613},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2602.22613},
+ }
+ ```
+
+ ---
+
+ ## Acknowledgements
+ We pretrained the model using the [Lightning-Hydra-Template](https://github.com/ashleve/lightning-hydra-template).
+
+ We use evaluation scripts from [MS-CLIP](https://github.com/IBM/MS-CLIP) and [Pangaea-Bench](https://github.com/VMarsocci/pangaea-bench).
+
+ ---
+ <p>
+ We welcome contributions and issue reports to further improve SATtxt.
+ </p>
+
+