ikhado committed on
Commit 2dccfc8 · verified · 1 Parent(s): 3619507

Update README.md

Files changed (1)
  1. README.md +21 -60

README.md CHANGED
@@ -13,9 +13,8 @@ tags:
  - satellite-imagery
  - remote-sensing
  ---
-
-
  # SATtxt - Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery
+
  <p align="center">
  <img src="https://i.imgur.com/waxVImv.png" alt="SATtxt">
  </p>
@@ -33,22 +32,10 @@ tags:
  <a href="https://github.com/ikhado/sattxt"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
  </p>

-
- ---
-
- ## 📰 News
-
- | Date | Update |
- |------|--------|
- | **Mar 9, 2026** | We have released model code and weights. |
- | **Feb 23, 2026** | SATtxt is accepted at **CVPR 2026**. We appreciate the reviewers and ACs. |
-
  ---

  ## Overview
-
  SATtxt is a vision-language foundation model for satellite imagery. We train **only the projection heads**, keeping both encoders frozen.
-
  <table>
  <tr><th>Component</th><th>Backbone</th><th>Parameters</th></tr>
  <tr><td>Vision Encoder</td><td><a href="https://github.com/facebookresearch/dinov3">DINOv3</a> ViT-L/16</td><td>Frozen</td></tr>
@@ -56,24 +43,18 @@ SATtxt is a vision-language foundation model for satellite imagery. We train **o
  <tr><td>Vision Head</td><td>Transformer Projection</td><td>Trained</td></tr>
  <tr><td>Text Head</td><td>Linear Projection</td><td>Trained</td></tr>
  </table>
-
  ---
-
  ## Installation

  ```bash
- git clone https://github.com/your-repo/sattxt.git
+ git clone https://github.com/ikhado/sattxt.git
  cd sattxt
  pip install -r requirements.txt
  pip install flash-attn --no-build-isolation # Required for LLM2Vec
  ```
-
  ---
-
  ## Model Weights
-
  Download the required weights:
-
  | Component | Source |
  |-----------|--------|
  | DINOv3 ViT-L/16 | [facebookresearch/dinov3](https://github.com/facebookresearch/dinov3) → `dinov3_vitl16_pretrain_sat493m.pth` |
@@ -82,68 +63,49 @@ Download the required weights:
  | Text Head | [sattxt_text_head.pt](https://huggingface.co/ikhado/sattxt/blob/main/sattxt_text_head.pt) |

  Clone DINOv3 into the `thirdparty` folder:
-
  ```bash
  cd thirdparty && git clone https://github.com/facebookresearch/dinov3.git
  ```

  ---
-
  ## Quick Start

  ```python
  import sys
  from pathlib import Path

+ import torch
+
  sys.path.insert(0, str(Path(__file__).resolve().parent / "thirdparty" / "dinov3"))

  from sattxt.model import SATtxt
  from sattxt.utils import image_loader, get_preprocess, zero_shot_classify
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"

- # Load model
  model = SATtxt(
- dinov3_weights_path='PATH/TO/dinov3_vitl16_pretrain_sat493m.pth',
- sattxt_vision_head_pretrain_weights='PATH/TO/sattxt_vision_head.pt',
- text_encoder_id='McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp',
- sattxt_text_head_pretrain_weights='PATH/TO/sattxt_text_head.pt'
- ).to('cuda').eval()
+ dinov3_weights_path="/PATH/TO/dinov3_vitl16_pretrain_sat493m-eadcf0ff.pth",
+ sattxt_vision_head_pretrain_weights="/PATH/TO/sattxt_vision_head.pt",
+ text_encoder_id="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
+ sattxt_text_head_pretrain_weights="/PATH/TO/sattxt_text_head.pt",
+ ).to(device).eval()

- # Zero-shot classification
- categories = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway",
- "Industrial", "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]
+ categories = [
+ "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
+ "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"
+ ]

- image = image_loader('./asset/Residential_167.jpg')
- image_tensor = get_preprocess(is_ms=False, all_bands=False)(image).unsqueeze(0).to('cuda')
+ image = image_loader("./asset/Residential_167.jpg")
+ image_tensor = get_preprocess(is_ms=False, all_bands=False)(image).unsqueeze(0).to(device)

  logits, pred_idx = zero_shot_classify(model, image_tensor, categories)
- print(f"Predicted: {categories[pred_idx.item()]}") # Output: Residential
- ```
-
- <details>
- <summary><b>Expected Output</b></summary>
-
- ```
- Image: ./asset/Residential_167.jpg
- Predicted: Residential
- Confidence scores:
- AnnualCrop: -0.0075
- Forest: -0.0633
- HerbaceousVegetation: -0.0219
- Highway: 0.0283
- Industrial: 0.0887
- Pasture: 0.0178
- PermanentCrop: -0.0197
- Residential: 0.0908
- River: -0.0487
- SeaLake: -0.0441
+ print("Prediction:", categories[pred_idx.item()])
  ```

- </details>
+ Please check [demo.py](./demo.py) for more details.

  ---
-
  ## Citation
-
  ```bibtex
  @misc{do2026sattxt,
  title={Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery},
@@ -155,7 +117,6 @@ Confidence scores:
  url={https://arxiv.org/abs/2602.22613},
  }
  ```
-
  ---

  ## Acknowledgements
@@ -165,9 +126,9 @@ We pretrained the model with:
  We use evaluation scripts from:
  [MS-CLIP](https://github.com/IBM/MS-CLIP) and [Pangaea-Bench](https://github.com/VMarsocci/pangaea-bench)

+ We also use LLMs (such as ChatGPT and Claude) for code refactoring.
+
  ---
  <p>
  We welcome contributions and issues to further improve SATtxt.
- </p>
-
-
+ </p>
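A note on the updated Quick Start in the diff above: it resolves the repo root with `Path(__file__)`, which raises `NameError` in a REPL or notebook where `__file__` is undefined. A minimal sketch of a more defensive version of that path setup (stdlib only; the `thirdparty/dinov3` location comes from the README, the fallback is our assumption about how users may run the snippet):

```python
import sys
from pathlib import Path

# __file__ is undefined in interactive sessions (REPL, notebook);
# fall back to the current working directory so the setup still runs there.
try:
    root = Path(__file__).resolve().parent
except NameError:
    root = Path.cwd()

# Make the vendored DINOv3 checkout importable, as the README's Quick Start does.
sys.path.insert(0, str(root / "thirdparty" / "dinov3"))
print(sys.path[0])
```

When run as a script this behaves exactly like the README's one-liner; interactively, it assumes you launched Python from the repo root.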
 
13
  - satellite-imagery
14
  - remote-sensing
15
  ---
 
 
16
  # SATtxt - Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery
17
+
18
  <p align="center">
19
  <img src="https://i.imgur.com/waxVImv.png" alt="SATtxt">
20
  </p>
 
32
  <a href="https://github.com/ikhado/sattxt"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
33
  </p>
34
 
 
 
 
 
 
 
 
 
 
 
35
  ---
36
 
37
  ## Overview
 
38
  SATtxt is a vision-language foundation model for satellite imagery. We train **only the projection heads**, keeping both encoders frozen.
 
39
  <table>
40
  <tr><th>Component</th><th>Backbone</th><th>Parameters</th></tr>
41
  <tr><td>Vision Encoder</td><td><a href="https://github.com/facebookresearch/dinov3">DINOv3</a> ViT-L/16</td><td>Frozen</td></tr>
 
43
  <tr><td>Vision Head</td><td>Transformer Projection</td><td>Trained</td></tr>
44
  <tr><td>Text Head</td><td>Linear Projection</td><td>Trained</td></tr>
45
  </table>
 
46
  ---
 
47
  ## Installation
48
 
49
  ```bash
50
+ git clone https://github.com/ikhado/sattxt.git
51
  cd sattxt
52
  pip install -r requirements.txt
53
  pip install flash-attn --no-build-isolation # Required for LLM2Vec
54
  ```
 
55
  ---
 
56
  ## Model Weights
 
57
  Download the required weights:
 
58
  | Component | Source |
59
  |-----------|--------|
60
  | DINOv3 ViT-L/16 | [facebookresearch/dinov3](https://github.com/facebookresearch/dinov3) → `dinov3_vitl16_pretrain_sat493m.pth` |
 
63
  | Text Head | [sattxt_text_head.pt](https://huggingface.co/ikhado/sattxt/blob/main/sattxt_text_head.pt) |
64
 
65
  Clone DINOv3 into the `thirdparty` folder:
 
66
  ```bash
67
  cd thirdparty && git clone https://github.com/facebookresearch/dinov3.git
68
  ```
69
 
70
  ---
 
71
  ## Quick Start
72
 
73
  ```python
74
  import sys
75
  from pathlib import Path
76
 
77
+ import torch
78
+
79
  sys.path.insert(0, str(Path(__file__).resolve().parent / "thirdparty" / "dinov3"))
80
 
81
  from sattxt.model import SATtxt
82
  from sattxt.utils import image_loader, get_preprocess, zero_shot_classify
83
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
84
 
 
85
  model = SATtxt(
86
+ dinov3_weights_path="/PATH/TO/dinov3_vitl16_pretrain_sat493m-eadcf0ff.pth",
87
+ sattxt_vision_head_pretrain_weights="/PATH/TO/sattxt_vision_head.pt",
88
+ text_encoder_id="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
89
+ sattxt_text_head_pretrain_weights="/PATH/TO/sattxt_text_head.pt",
90
+ ).to(device).eval()
91
 
92
+ categories = [
93
+ "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
94
+ "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"
95
+ ]
96
 
97
+ image = image_loader("./asset/Residential_167.jpg")
98
+ image_tensor = get_preprocess(is_ms=False, all_bands=False)(image).unsqueeze(0).to(device)
99
 
100
  logits, pred_idx = zero_shot_classify(model, image_tensor, categories)
 
 
 
 
 
101
 
102
+ print("Prediction:", categories[pred_idx.item()])
 
 
 
 
 
 
 
 
 
 
 
 
 
103
  ```
104
 
105
+ Please check [demo.py](./demo.py) for more details.
106
 
107
  ---
 
108
  ## Citation
 
109
  ```bibtex
110
  @misc{do2026sattxt,
111
  title={Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery},
 
117
  url={https://arxiv.org/abs/2602.22613},
118
  }
119
  ```
 
120
  ---
121
 
122
  ## Acknowledgements
 
126
  We use evaluation scripts from:
127
  [MS-CLIP](https://github.com/IBM/MS-CLIP) and [Pangaea-Bench](https://github.com/VMarsocci/pangaea-bench)
128
 
129
+ We also use LLMs (such as ChatGPT and Claude) for code refactoring.
130
+
131
  ---
132
  <p>
133
  We welcome contributions and issues to further improve SATtxt.
134
+ </p>
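For readers skimming the diff: the Quick Start's `zero_shot_classify` call is a CLIP-style step, scoring one image embedding against one text embedding per category and taking the argmax. A stdlib-only sketch of that scoring mechanism; the random vectors are stand-ins for SATtxt's projected embeddings, and `cosine` is an illustrative helper, not part of the repo:

```python
import math
import random

random.seed(0)

categories = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
              "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]

def cosine(u, v):
    # Cosine similarity: dot product divided by the vectors' L2 norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Random stand-ins for the projected image/text embeddings SATtxt would produce.
image_emb = [random.gauss(0, 1) for _ in range(512)]
text_embs = [[random.gauss(0, 1) for _ in range(512)] for _ in categories]

# One similarity score per category; the prediction is the highest-scoring one.
logits = [cosine(image_emb, t) for t in text_embs]
pred_idx = max(range(len(logits)), key=logits.__getitem__)
print(categories[pred_idx])
```

With real SATtxt embeddings the relative scores are meaningful (as in the confidence table the commit removed); with these random stand-ins only the mechanics are illustrated.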