tags:
- satellite-imagery
- remote-sensing
---

# SATtxt - Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery

<p align="center">
  <img src="https://i.imgur.com/waxVImv.png" alt="SATtxt">
</p>

<p align="center">
  <a href="https://github.com/ikhado/sattxt"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
</p>

---

## 📰 News

| Date | Update |
|------|--------|
| **Mar 9, 2026** | We have released the model code and weights. |
| **Feb 23, 2026** | SATtxt has been accepted at **CVPR 2026**. We appreciate the reviewers and ACs. |

---

## Overview
SATtxt is a vision-language foundation model for satellite imagery. We train **only the projection heads**, keeping both encoders frozen.

<table>
<tr><th>Component</th><th>Backbone</th><th>Parameters</th></tr>
<tr><td>Vision Encoder</td><td><a href="https://github.com/facebookresearch/dinov3">DINOv3</a> ViT-L/16</td><td>Frozen</td></tr>
<tr><td>Text Encoder</td><td>LLM2Vec Meta-Llama-3-8B-Instruct</td><td>Frozen</td></tr>
<tr><td>Vision Head</td><td>Transformer Projection</td><td>Trained</td></tr>
<tr><td>Text Head</td><td>Linear Projection</td><td>Trained</td></tr>
</table>

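Since only the projection heads receive gradients, the training setup amounts to freezing the encoder parameters and handing only the head parameters to the optimizer. A minimal PyTorch sketch of that pattern, using small hypothetical stand-in modules rather than the repository's actual encoders or training code:

```python
import torch
from torch import nn

# Hypothetical stand-ins for a frozen encoder and a trained projection head.
encoder = nn.Linear(16, 8)
head = nn.Linear(8, 4)

# Freeze the encoder: its parameters receive no gradient updates.
for p in encoder.parameters():
    p.requires_grad = False

# Only the head's parameters are given to the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)

x = torch.randn(2, 16)
with torch.no_grad():          # encoder runs in inference mode
    feats = encoder(x)
loss = head(feats).pow(2).mean()  # placeholder loss for illustration
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in encoder.parameters() if not p.requires_grad)
```

Only `trainable` parameters change during training; the `frozen` encoder weights stay at their pretrained values.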
---
## Installation

```bash
git clone https://github.com/ikhado/sattxt.git
cd sattxt
pip install -r requirements.txt
pip install flash-attn --no-build-isolation  # Required for LLM2Vec
```

---

## Model Weights

Download the required weights:

| Component | Source |
|-----------|--------|
| DINOv3 ViT-L/16 | [facebookresearch/dinov3](https://github.com/facebookresearch/dinov3) → `dinov3_vitl16_pretrain_sat493m.pth` |
| Vision Head | [sattxt_vision_head.pt](https://huggingface.co/ikhado/sattxt/blob/main/sattxt_vision_head.pt) |
| Text Head | [sattxt_text_head.pt](https://huggingface.co/ikhado/sattxt/blob/main/sattxt_text_head.pt) |

Clone DINOv3 into the `thirdparty` folder:

```bash
cd thirdparty && git clone https://github.com/facebookresearch/dinov3.git
```

---
## Quick Start

```python
import sys
from pathlib import Path

import torch

sys.path.insert(0, str(Path(__file__).resolve().parent / "thirdparty" / "dinov3"))

from sattxt.model import SATtxt
from sattxt.utils import image_loader, get_preprocess, zero_shot_classify

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model: frozen encoders, pretrained projection heads
model = SATtxt(
    dinov3_weights_path="/PATH/TO/dinov3_vitl16_pretrain_sat493m-eadcf0ff.pth",
    sattxt_vision_head_pretrain_weights="/PATH/TO/sattxt_vision_head.pt",
    text_encoder_id="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    sattxt_text_head_pretrain_weights="/PATH/TO/sattxt_text_head.pt",
).to(device).eval()

# EuroSAT land-use categories for zero-shot classification
categories = [
    "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
    "Pasture", "PermanentCrop", "Residential", "River", "SeaLake",
]

image = image_loader("./asset/Residential_167.jpg")
image_tensor = get_preprocess(is_ms=False, all_bands=False)(image).unsqueeze(0).to(device)

logits, pred_idx = zero_shot_classify(model, image_tensor, categories)
print(f"Predicted: {categories[pred_idx.item()]}")  # Output: Residential
```

<details>
<summary><b>Expected Output</b></summary>

```
Image: ./asset/Residential_167.jpg
Predicted: Residential
Confidence scores:
  AnnualCrop: -0.0075
  Forest: -0.0633
  HerbaceousVegetation: -0.0219
  Highway: 0.0283
  Industrial: 0.0887
  Pasture: 0.0178
  PermanentCrop: -0.0197
  Residential: 0.0908
  River: -0.0487
  SeaLake: -0.0441
```

</details>

Please check [demo.py](./demo.py) for more details.

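Zero-shot classification of this kind typically scores one image embedding against one text embedding per category, usually by cosine similarity, and picks the best match. A self-contained sketch of that scoring step with made-up toy embeddings (the repository's `zero_shot_classify` may differ in details):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_scores(image_emb, text_embs):
    """Score an image embedding against one text embedding per category.

    Returns (scores, index of the best-matching category)."""
    scores = [cosine(image_emb, t) for t in text_embs]
    return scores, max(range(len(scores)), key=scores.__getitem__)

# Toy 3-d embeddings: the image aligns best with the second category.
image_emb = [0.1, 0.9, 0.2]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.0, 0.0, 1.0]]
scores, best = zero_shot_scores(image_emb, text_embs)  # best == 1
```

In the real model the embeddings come from the projection heads on top of the frozen encoders, but the scoring logic is this simple.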
---

## Citation

```bibtex
@misc{do2026sattxt,
      title={Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery},
      url={https://arxiv.org/abs/2602.22613},
}
```

---

## Acknowledgements

We use evaluation scripts from:
[MS-CLIP](https://github.com/IBM/MS-CLIP) and [Pangaea-Bench](https://github.com/VMarsocci/pangaea-bench)

We also use LLMs (such as ChatGPT and Claude) for code refactoring.

---
<p>
We welcome contributions and issues to further improve SATtxt.
</p>