---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
tags:
  - satellite
  - geospatial
  - vision-language
  - lfm
  - liquid-ai
  - earth-observation
  - multi-image
base_model: LiquidAI/LFM2.5-VL-450M
---

# NuTonic/lspace

**NU:TONIC satellite VLM**: a supervised fine-tuned (SFT) checkpoint derived from **[LiquidAI/LFM2.5-VL-450M](https://huggingface.co/LiquidAI/LFM2.5-VL-450M)**, trained in a **single LEAP `vlm_sft` run** over one mixed Parquet corpus (main + repeated task hubs + repeated Firewatch).

- **Model page:** https://huggingface.co/NuTonic/lspace  
- **Training recipe:** NU:TONIC (https://github.com/josephrp/nutonic). `train/run_sat_vl_sft_e2e.py` orchestrates `train/materialize_vlm_sft_mix.py` → LEAP `vlm_sft` via `train/train_lfm_vl_sft.py` and `refs/leap-finetune-main`.

## Intended use

Use this model when you want a **small (~0.45B) image–text model** that has seen **many supervised examples** of:

- **Satellite RGB chips** (Sentinel-2–style optical previews / tiled chips used in NU:TONIC datasets),
- Optional **overhead / map-style context stills** (`mapbox_stills/` in the training corpora),
- Optional **analysis-condition visuals** (profile-conditioned render PNGs present in some training rows),
- **Multi-image user turns** (temporal pairs and TerraMind predictions),
- Assistant outputs that mix **narrative geospatial reasoning** with **structured artifacts seen in training**, including **normalized bounding boxes** and **JSON-like detection lists** when prompted.
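
A multi-image user turn can be assembled as in the minimal sketch below. It reuses the content-part layout from the single-image quickstart on this card (`{"type": "image", ...}` parts followed by a `{"type": "text", ...}` part). The helper name `build_multi_image_turn`, the image ordering, and the prompt wording are illustrative assumptions, not a documented NU:TONIC convention.

```python
from typing import Any, Dict, List


def build_multi_image_turn(images: List[Any], prompt: str) -> List[Dict[str, Any]]:
    """Build one user turn: every image part first, then a single text part.

    Hypothetical helper for illustration; `images` would normally hold
    PIL images, e.g. Image.open(path).convert("RGB").
    """
    if not images:
        raise ValueError("at least one image is required")
    content: List[Dict[str, Any]] = [{"type": "image", "image": img} for img in images]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


# Placeholder strings stand in for real PIL images so the sketch runs anywhere.
conversation = build_multi_image_turn(
    ["<earlier chip>", "<later chip>"],
    "Image 1 is the earlier acquisition and image 2 the later one. "
    "Describe visible change and note uncertainty.",
)
part_types = [part["type"] for part in conversation[0]["content"]]
print(part_types)
```

The resulting `conversation` list can be passed to `processor.apply_chat_template` in the same way as a single-image conversation.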

Typical applications:

- **Satellite image captioning** and coarse **land-cover / structure** description (non-exhaustive).
- **Scenario-aligned narratives** consistent with NU:TONIC “PRO mini-app” training slices:
  - wildfire / burn scar style reasoning (**Firewatch-SFT** slice),
  - coastal / bright-target / maritime-style reasoning (**OceanScout-SFT** slice),
  - land-cover transition reasoning (**LandShift-SFT** slice),
  - inundation / water-expansion reasoning (**FloodPulse-SFT** slice),
  - **structured analytical brief** writing (**BriefComposer-SFT** slice).

This checkpoint is **not** a full analytic pipeline: it does **not** fetch imagery from STAC, run Earth Engine, or guarantee calibration to real-world hazard operations without human review.

## Training data (what it actually saw)

Training is **main-heavy** by construction: the mix streams almost all rows from the aggregate Hub dataset, then **upsamples** smaller hubs so rare behaviors still receive gradient mass after global shuffling.

### Main corpus (dominant mass)

- **`NuTonic/sat-vl-sft-training-ready-v1`**  
  Aggregate **training-ready Parquet** packaging NU:TONIC satellite VLM supervision derived from multiple builders, including (non-exhaustively) metadata-first procedural rows and bounding-box-heavy corpora. Rows commonly include **`messages`** with multi-part `user.content` mixing **`image`** + **`text`**, and assistant targets describing imagery, evidence, and/or structured outputs consistent with NU:TONIC JSONL/VLM conventions.

### Upsampled task hubs (default repeat = 8× each)

These teach **multi-image / vertical-specific** behaviors described in internal NU:TONIC dataset planning (PRO mini-apps alignment):

- **`NuTonic/brief-composer-sft-v1`** — mixed multi-image prompts toward **structured analytical brief** writing.
- **`NuTonic/oceanscout-sft-v1`** — maritime / water-context bbox + narrative patterns.
- **`NuTonic/floodpulse-sft-v1`** — temporal pair reasoning around inundation extent patterns.
- **`NuTonic/landshift-sft-v1`** — temporal pair reasoning around land-cover transition patterns.

### Upsampled small hub (default repeat = 80×)

- **`NuTonic/firewatch-sft-v1`** — wildfire / burn scar oriented supervision (small row count; repeated for mass).
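
The effect of the repeat factors on the final mix can be estimated with a short calculation. The sketch below is not the logic of `train/materialize_vlm_sft_mix.py`; it only shows the arithmetic of repeat-based upsampling. The row counts are hypothetical placeholders, and only the 8× and 80× repeat factors come from this card.

```python
from typing import Dict


def effective_mix(rows: Dict[str, int], repeats: Dict[str, int]) -> Dict[str, float]:
    """Return each source's share of the mix after repeating it repeats[name] times."""
    weighted = {name: count * repeats.get(name, 1) for name, count in rows.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("the mix contains no rows")
    return {name: count / total for name, count in weighted.items()}


# HYPOTHETICAL row counts, for illustration only (not the real dataset sizes).
rows = {"main": 100_000, "task_hub": 1_000, "firewatch": 100}
# Repeat factors as stated on this card: 8x for a task hub, 80x for Firewatch.
repeats = {"main": 1, "task_hub": 8, "firewatch": 80}

shares = effective_mix(rows, repeats)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these placeholder counts, the main corpus still accounts for roughly 86% of the rows, while the 100-row source rises from about 0.1% to about 7%. This is the "main-heavy, but rare behaviors still visible" shape described above.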

### Important implication

Because SFT matches **teacher strings**, the model may:

- Echo **dataset-specific prompt framing** (profile cues, task wording),
- Prefer **bbox conventions seen in training** (typically **0–1 normalized** box coordinates embedded in assistant text / JSON-like structures; see NU:TONIC notes aligned with LEAP `vlm_sft` conventions),
- Reflect **English**-dominant supervision, if the upstream datasets are predominantly English.
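
Normalized boxes must be scaled back to pixel coordinates before they can be drawn on a chip. The sketch below assumes the box order is `[x_min, y_min, x_max, y_max]`. This card does not state the order, so check it against your own outputs. The function name `denormalize_bbox` is hypothetical.

```python
from typing import Sequence, Tuple


def denormalize_bbox(box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a 0-1 normalized box to integer pixel coordinates.

    ASSUMPTION: box is [x_min, y_min, x_max, y_max]; the order is not
    confirmed by the model card.
    """
    if len(box) != 4:
        raise ValueError("expected four coordinates")
    # Clamp to [0, 1] so slightly out-of-range predictions stay inside the image.
    x0, y0, x1, y1 = (min(max(float(v), 0.0), 1.0) for v in box)
    if x1 < x0 or y1 < y0:
        raise ValueError("box has negative extent")
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


pixel_box = denormalize_bbox([0.25, 0.5, 0.75, 1.0], width=512, height=512)
print(pixel_box)  # (128, 256, 384, 512)
```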

## Non-goals / limitations

- **No warranty of geophysical correctness**: outputs are learned correlations from curated supervision; validate operationally for your AOI, sensor, season, and labeling definition.
- **Distribution shift**: performance drops are expected off-domain (different sensors, resolutions, projections, stylizations, heavy cloud cover, night imagery, SAR, etc.).
- **Privacy / safety**: training mixes may include overhead context stills in some rows; do not use outputs as sole evidence for high-risk decisions (disasters, enforcement, insurance) without independent verification.
- **Grounding reliability**: bbox/JSON outputs should be treated as **model proposals**, not GIS truth.
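
One way to treat structured outputs as proposals is to parse them defensively and discard anything malformed before it reaches a GIS layer. The sketch below assumes each detection is a JSON object with `label` and `bbox` keys. Those key names and both helper functions are hypothetical, because the exact output schema is not specified on this card.

```python
import json
from typing import Any, Dict, List


def extract_json_list(text: str) -> List[Any]:
    """Return the first JSON array embedded in free text, or [] if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            return value
        start = text.find("[", start + 1)
    return []


def valid_proposals(items: List[Any]) -> List[Dict[str, Any]]:
    """Keep only dict items whose 'bbox' (hypothetical key) is four numbers in [0, 1]."""
    kept = []
    for item in items:
        box = item.get("bbox") if isinstance(item, dict) else None
        if (
            isinstance(box, list)
            and len(box) == 4
            and all(
                isinstance(v, (int, float)) and not isinstance(v, bool) and 0.0 <= v <= 1.0
                for v in box
            )
        ):
            kept.append(item)
    return kept


# Made-up model output, used only to exercise the helpers.
sample = (
    'Two candidates: [{"label": "vessel", "bbox": [0.1, 0.2, 0.3, 0.4]}, '
    '{"label": "wake", "bbox": [0.9, 0.9, 1.4, 1.0]}] (low confidence)'
)
proposals = valid_proposals(extract_json_list(sample))
print([p["label"] for p in proposals])
```

The second candidate is dropped because one coordinate falls outside the 0 to 1 range. Surviving proposals still need independent verification.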

## Inference quickstart (Transformers)

This family loads like other HF multimodal chat models (requires **`trust_remote_code=True`** for Liquid remote modules).

Minimal single-image pattern using `AutoModelForImageTextToText` and `AutoProcessor`:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "NuTonic/lspace"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

pil = Image.open("chip.png").convert("RGB")
user_text = (
    "The input is satellite imagery (RGB). Describe surface cover and structure where visible, "
    "and note uncertainty."
)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": pil},
            {"type": "text", "text": user_text},
        ],
    }
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens by slicing off the prompt.
prompt_len = inputs["input_ids"].shape[1]
text = processor.batch_decode(out[:, prompt_len:], skip_special_tokens=True)[0]
print(text)
```

## Training summary

Fine-tuned from `LiquidAI/LFM2.5-VL-450M` using the NU:TONIC satellite VLM SFT mix
(`train/run_sat_vl_sft_e2e.py`): a single LEAP run on the main + task + Firewatch Parquet mix.

Training stack: LEAP `vlm_sft` from `refs/leap-finetune-main` in the NU:TONIC repository.