---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- satellite
- geospatial
- vision-language
- lfm
- liquid-ai
- earth-observation
- multi-image
base_model: LiquidAI/LFM2.5-VL-450M
---
# NuTonic/lspace
**NU:TONIC satellite VLM**: a supervised fine-tuned (SFT) checkpoint derived from **[LiquidAI/LFM2.5-VL-450M](https://huggingface.co/LiquidAI/LFM2.5-VL-450M)**, trained in a **single LEAP `vlm_sft` run** over one mixed Parquet corpus (main + repeated task hubs + repeated Firewatch).
- **Model page:** https://huggingface.co/NuTonic/lspace
- **Training recipe:** https://github.com/josephrp/nutonic (NU:TONIC). `train/run_sat_vl_sft_e2e.py` orchestrates `train/materialize_vlm_sft_mix.py` → LEAP `vlm_sft` via `train/train_lfm_vl_sft.py` and `refs/leap-finetune-main`.
## Intended use
Use this model when you want a **small (~0.45B) image–text model** that has seen **many supervised examples** of:
- **Satellite RGB chips** (Sentinel-2–style optical previews / tiled chips used in NU:TONIC datasets),
- Optional **overhead / map-style context stills** (`mapbox_stills/` in the training corpora),
- Optional **analysis-condition visuals** (profile-conditioned render PNGs present in some training rows),
- **Multi-image user turns** (temporal pairs and TerraMind predictions),
- Assistant outputs that mix **narrative geospatial reasoning** with **structured artifacts seen in training**, including **normalized bounding boxes** and **JSON-like detection lists** when prompted.
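A multi-image user turn, as listed above, can be sketched as a single user message that carries several image parts followed by one text part. This is a minimal sketch, assuming the same content-part schema as the single-image quickstart further down; the helper name `build_multi_image_turn`, the file names, and the prompt wording are hypothetical and are not taken from the training corpora.

```python
from typing import Any, Dict, List


def build_multi_image_turn(images: List[Any], prompt: str) -> List[Dict[str, Any]]:
    """Build a one-turn conversation whose user message carries several images.

    This helper only arranges the parts; it does not load or validate images.
    """
    if not images:
        raise ValueError("at least one image is required")
    # Image parts come first, in the order given, then a single text part.
    content: List[Dict[str, Any]] = [{"type": "image", "image": img} for img in images]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


# HYPOTHETICAL temporal pair. The strings are placeholders: in real use,
# pass PIL images opened with Image.open(...).convert("RGB").
conversation = build_multi_image_turn(
    ["chip_before.png", "chip_after.png"],
    "Image 1 is earlier and image 2 is later. "
    "Describe visible surface change and note uncertainty.",
)
```

The resulting `conversation` can be passed to `processor.apply_chat_template` in the same way as a single-image conversation. Whether the image order matches what a given training slice used is an assumption to verify against the datasets.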
Typical applications:
- **Satellite image captioning** and coarse **land-cover / structure** description (non-exhaustive).
- **Scenario-aligned narratives** consistent with NU:TONIC “PRO mini-app” training slices:
  - wildfire / burn scar style reasoning (**Firewatch-SFT** slice),
  - coastal / bright-target / maritime-style reasoning (**OceanScout-SFT** slice),
  - land-cover transition reasoning (**LandShift-SFT** slice),
  - inundation / water-expansion reasoning (**FloodPulse-SFT** slice),
  - **structured analytical brief** writing (**BriefComposer-SFT** slice).
This checkpoint is **not** a full analytic pipeline: it does **not** fetch imagery from STAC, run Earth Engine, or guarantee calibration to real-world hazard operations without human review.
## Training data (what it actually saw)
Training is **main-heavy** by construction: the mix streams almost all rows from the aggregate Hub dataset, then **upsamples** smaller hubs so rare behaviors still receive gradient mass after global shuffling.
### Main corpus (dominant mass)
- **`NuTonic/sat-vl-sft-training-ready-v1`**
  Aggregate **training-ready Parquet** packaging NU:TONIC satellite VLM supervision derived from multiple builders, including (non-exhaustively) metadata-first procedural rows and bounding-box-heavy corpora. Rows commonly include **`messages`** with multi-part `user.content` mixing **`image`** + **`text`**, and assistant targets describing imagery, evidence, and/or structured outputs consistent with NU:TONIC JSONL/VLM conventions.
### Upsampled task hubs (default repeat = 8× each)
These teach **multi-image / vertical-specific** behaviors described in internal NU:TONIC dataset planning (PRO mini-apps alignment):
- **`NuTonic/brief-composer-sft-v1`** — mixed multi-image prompts toward **structured analytical brief** writing.
- **`NuTonic/oceanscout-sft-v1`** — maritime / water-context bbox + narrative patterns.
- **`NuTonic/floodpulse-sft-v1`** — temporal pair reasoning around inundation extent patterns.
- **`NuTonic/landshift-sft-v1`** — temporal pair reasoning around land-cover transition patterns.
### Upsampled small hub (default repeat = 80×)
- **`NuTonic/firewatch-sft-v1`** — wildfire / burn scar oriented supervision (small row count; repeated for mass).
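To see how the repeat factors above shift the mix, the arithmetic can be sketched as follows. This is an illustration only: the row counts are hypothetical (this card does not state real counts), the short source names are stand-ins, and `effective_mix` is not part of the NU:TONIC training code. Only the 8× and 80× defaults come from this card.

```python
from typing import Dict


def effective_mix(row_counts: Dict[str, int], repeats: Dict[str, int]) -> Dict[str, float]:
    """Return each source's share of the materialized mix after repetition."""
    # Sources without an explicit repeat factor are used once.
    weighted = {name: rows * repeats.get(name, 1) for name, rows in row_counts.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("mix is empty")
    return {name: count / total for name, count in weighted.items()}


# HYPOTHETICAL row counts, chosen only to make the arithmetic easy to follow.
row_counts = {
    "main": 100_000,
    "brief-composer": 1_000,
    "oceanscout": 1_000,
    "floodpulse": 1_000,
    "landshift": 1_000,
    "firewatch": 100,
}
# Default repeat factors from this card: 8x per task hub, 80x for Firewatch.
repeats = {
    "brief-composer": 8,
    "oceanscout": 8,
    "floodpulse": 8,
    "landshift": 8,
    "firewatch": 80,
}

shares = effective_mix(row_counts, repeats)
baseline = effective_mix(row_counts, {})
```

With these made-up counts, the main corpus still holds roughly 71% of the rows, while Firewatch rises from about 0.1% to about 5.7% of the mix. This is the sense in which the mix stays main-heavy while the smaller hubs still contribute to the gradient.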
### Important implication
Because SFT matches **teacher strings**, the model may:
- Echo **dataset-specific prompt framing** (profile cues, task wording),
- Prefer **bbox conventions seen in training** (typically **0–1 normalized** box coordinates embedded in assistant text / JSON-like structures; see NU:TONIC notes aligned with LEAP `vlm_sft` conventions),
- Reflect **English**-dominant supervision, if the upstream datasets are in fact mostly English.
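Since boxes are typically emitted in 0–1 normalized coordinates, downstream code usually needs to map them back to pixels. The helper below is a minimal sketch under one loud assumption: that the coordinate order is `[x_min, y_min, x_max, y_max]`. This card does not state the order, so check it against the training data before relying on it. The name `denormalize_bbox` is hypothetical.

```python
from typing import Sequence, Tuple


def denormalize_bbox(box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Map a 0-1 normalized box to integer pixel coordinates, clamped to the image."""
    if len(box) != 4:
        raise ValueError("expected four coordinates")
    # ASSUMPTION: order is [x_min, y_min, x_max, y_max]; not confirmed by the model card.
    x0, y0, x1, y1 = (min(max(float(v), 0.0), 1.0) for v in box)
    if x0 > x1 or y0 > y1:
        raise ValueError("box corners are out of order")
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


# A box covering the lower-middle part of a 512x256 chip.
pixel_box = denormalize_bbox([0.25, 0.5, 0.75, 1.0], width=512, height=256)
print(pixel_box)  # (128, 128, 384, 256)
```

Clamping guards against slightly out-of-range values, and rejecting out-of-order corners avoids silently producing an empty or inverted box.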
## Non-goals / limitations
- **No warranty of geophysical correctness**: outputs are learned correlations from curated supervision; validate operationally for your AOI, sensor, season, and labeling definition.
- **Distribution shift**: performance drops are expected off-domain (different sensors, resolutions, projections, stylizations, heavy cloud cover, night imagery, SAR, etc.).
- **Privacy / safety**: training mixes may include overhead context stills in some rows; do not use outputs as sole evidence for high-risk decisions (disasters, enforcement, insurance) without independent verification.
- **Grounding reliability**: bbox/JSON outputs should be treated as **model proposals**, not GIS truth.
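Treating bbox/JSON outputs as proposals means parsing them defensively before anything downstream consumes them. The sketch below extracts a JSON array from assistant text and keeps only entries with a well-formed normalized box. The `"label"` and `"bbox"` keys, the `[x_min, y_min, x_max, y_max]` order, and the sample string are all assumptions made for illustration; the actual output schema should be confirmed against the training data.

```python
import json
import re
from typing import Any, Dict, List


def extract_box_proposals(text: str) -> List[Dict[str, Any]]:
    """Parse the outermost bracketed span in `text` and keep only well-formed boxes."""
    match = re.search(r"\[.*\]", text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # JSON-like but not valid JSON: treat as no usable proposals
    proposals: List[Dict[str, Any]] = []
    for item in items if isinstance(items, list) else []:
        # ASSUMPTION: each detection is a dict with a four-number "bbox" entry.
        box = item.get("bbox") if isinstance(item, dict) else None
        if not (isinstance(box, list) and len(box) == 4):
            continue
        if not all(isinstance(v, (int, float)) and 0.0 <= v <= 1.0 for v in box):
            continue
        # ASSUMPTION: order is [x_min, y_min, x_max, y_max].
        if box[0] < box[2] and box[1] < box[3]:
            proposals.append(item)
    return proposals


# HYPOTHETICAL assistant output; the second box is out of range and is dropped.
sample = (
    'Detections: [{"label": "vessel", "bbox": [0.1, 0.2, 0.3, 0.4]}, '
    '{"label": "bad", "bbox": [0.9, 0.2, 0.3, 1.4]}] Review needed.'
)
proposals = extract_box_proposals(sample)
```

Entries that survive this filter are still only model proposals and need independent verification before being treated as GIS features.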
## Inference quickstart (Transformers)
This family loads like other HF multimodal chat models (requires **`trust_remote_code=True`** for Liquid remote modules).
Minimal single-image pattern using `AutoModelForImageTextToText` and `AutoProcessor`:
```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "NuTonic/lspace"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Load one RGB chip from disk.
pil = Image.open("chip.png").convert("RGB")
user_text = (
    "The input is satellite imagery (RGB). Describe surface cover and structure where visible, "
    "and note uncertainty."
)
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": pil},
            {"type": "text", "text": user_text},
        ],
    }
]

# Apply the chat template and tokenize in one step.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Drop the prompt tokens so only the newly generated text is decoded.
new_tokens = out[:, inputs["input_ids"].shape[-1]:]
text = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(text)
```
# NuTonic/lspace
Fine-tuned from `LiquidAI/LFM2.5-VL-450M` using the NU:TONIC satellite VLM SFT mix
(`train/run_sat_vl_sft_e2e.py`): single LEAP run on main + task + Firewatch Parquet mix.
Training stack: LEAP `vlm_sft` in this repo's `refs/leap-finetune-main`.