galamsey-v9-e3
Fine-tune of LiquidAI/LFM2.5-VL-450M for detecting illegal small-scale gold mining ("galamsey") in Sentinel-2 satellite imagery over Ghana. Used as the perception layer of GalamseyWatch, a two-layer agentic Earth-observation system (perception VLM + LFM2 tool-calling policy) submitted to the Liquid AI × DPhi Space "AI in Space" hackathon.
The browser/WebGPU sibling of this checkpoint is samwell/galamsey-v9-e3-onnx.
Live demo
A click-to-detect dashboard running this model fully in-browser via WebGPU: galamseywatch.vercel.app. Click anywhere over Ghana and the page pulls a Sentinel-2 tile, runs the model on-device, and renders bounding boxes plus a description. ~1 GB one-time download, then cached; nothing leaves the device.
What it does
Given paired RGB and SWIR false-color composites of a 1.28 km Sentinel-2 tile (10 m/px), the model returns:
- A JSON list of bounding boxes for every visible mining pit, normalized to [0, 1].
- A natural-language description of the scene (e.g. "Multiple active excavation pits with sediment plumes and exposed lateritic soil").
Both outputs come from the same fine-tune, with two prompts. The grounding prompt emits boxes; the description prompt emits prose. See the GalamseyWatch repo for the exact prompts and the post-processing (NMS, min-area filter).
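A minimal sketch of that post-processing, assuming greedy IoU-based NMS and an area cutoff on the normalized `[x1, y1, x2, y2]` boxes. The thresholds and function names here are illustrative; the GalamseyWatch repo defines the real values.

```python
# Hypothetical post-processing sketch: greedy NMS plus a minimum-area
# filter on the model's normalized [x1, y1, x2, y2] boxes.

def box_iou(a, b):
    """IoU of two normalized [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def postprocess(boxes, iou_thresh=0.5, min_area=1e-4):
    """Drop sub-threshold boxes, then keep larger boxes greedily."""
    boxes = [b for b in boxes if (b[2] - b[0]) * (b[3] - b[1]) >= min_area]
    boxes.sort(key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    kept = []
    for b in boxes:
        if all(box_iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept
```

Sorting by area rather than confidence is a simplification here: the grounding prompt does not ask the model for per-box scores.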
How it composes with the LFM2 agent
In GalamseyWatch's on-orbit pipeline, this VLM is the perception layer. Its structured output (bounding boxes, derived confidence, scene description) is handed to an LFM2-2.6B tool-calling policy that decides, per tile, what to do under a bandwidth budget:
- `downlink_now`: high-confidence detection worth the bandwidth.
- `flag_for_review`: moderate confidence, log a text-only entry (cheap).
- `discard`: no signal, forest, water, or cloud-obscured.
- `request_neighbor_tile`: feature continues off-frame.
- `request_higher_resolution`: small candidate needs more pixels.
This split lets each model work on inputs sized to its parameter budget: the 450M VLM on pixels, the 2.6B LLM on text plus per-pass scalar context (bandwidth remaining, cloud cover, captured-at timestamp, neighbor tiles). The compression boundary sits exactly where the VLM hands off bounding boxes and prose to the agent; the two models never share an image input.
See orchestrator/agentic_eo/models/agent.py for the agent setup, system prompt, and tool definitions.
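To make the division of labor concrete, here is the five-way decision rewritten as a plain threshold heuristic. This is an illustration only: in GalamseyWatch the choice is made by the LFM2-2.6B tool-calling policy, and these field names and thresholds are assumptions, not values from the repo.

```python
# Illustrative stand-in for the LFM2 policy: maps per-tile scalars to one
# of the five tool names. All thresholds are assumed, not the repo's.

def choose_action(confidence, cloud_cover, touches_edge, max_box_area,
                  bandwidth_remaining):
    if cloud_cover > 0.8:
        return "discard"                    # optical sensor, nothing usable
    if touches_edge and confidence > 0.3:
        return "request_neighbor_tile"      # feature continues off-frame
    if 0.0 < max_box_area < 1e-3 and confidence > 0.3:
        return "request_higher_resolution"  # small candidate, needs pixels
    if confidence > 0.7 and bandwidth_remaining > 0:
        return "downlink_now"               # worth the bandwidth
    if confidence > 0.3:
        return "flag_for_review"            # cheap text-only log entry
    return "discard"
```

The point of using an LLM instead of such a heuristic is that the policy can also weigh the free-text scene description and adapt its thresholds to the remaining bandwidth budget.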
Performance
Evaluated on the SmallMinesDS test split, full pixel-IoU pipeline (bbox-to-mask scoring), RGB + SWIR two-image prompt.
Lift over base model:
| Metric | Base LFM2.5-VL-450M | galamsey-v9-e3 | Δ |
|---|---|---|---|
| Pixel IoU | 0.069 | 0.332 | +0.263 (~4.8×) |
Full evaluation, galamsey-v9-e3:
| Metric | Value |
|---|---|
| Pixel IoU | 0.332 |
| Pixel recall | 0.592 |
| Pixel SDC F1 | 0.499 |
| Patch accuracy | 0.795 |
Recall, F1, and patch accuracy were not separately recorded for the base run; the IoU lift is the headline number.
Honest ceiling
Galamsey pits are irregular polygons, but this model emits axis-aligned bounding boxes. Even with perfect bbox predictions (every box exactly circumscribing its ground-truth mask), the maximum achievable pixel IoU against the SmallMinesDS pixel-level masks is 0.4692, computed directly by converting every GT mask to its tightest bbox and scoring that against the mask.
So 0.332 = 71% of the achievable ceiling for any bbox-emitting method on this benchmark. A pixel-mask architecture (e.g. a U-Net) operates in a different regime and is not directly comparable.
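The per-mask ceiling computation described above can be sketched as follows; `bbox_ceiling_iou` is a name introduced here, not from the repo.

```python
import numpy as np

# Replace a ground-truth mask with its tightest axis-aligned bbox and
# score that bbox-as-mask against the original mask with pixel IoU.
# This is the best any perfect bbox predictor can do on that mask.

def bbox_ceiling_iou(mask):
    """mask: 2-D boolean array for one ground-truth pit."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 1.0  # empty mask: trivially matched by predicting nothing
    box = np.zeros_like(mask, dtype=bool)
    box[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    inter = np.logical_and(mask, box).sum()
    union = np.logical_or(mask, box).sum()
    return inter / union
```

For a rectangular mask the ceiling is 1.0; the more amorphous the pit, the further below 1.0 it falls, which is where the dataset-wide 0.4692 comes from.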
Training
| Detail | Value |
|---|---|
| Base model | LiquidAI/LFM2.5-VL-450M |
| Dataset | SmallMinesDS (Ofori-Ampofo et al., 2025): 4,270 labeled Ghana patches, CC-BY-SA-4.0 |
| Inputs | Paired RGB and SWIR false-color composites (two-image prompt) |
| Augmentation | 4× D4 dihedral group (flips + rotations) |
| Method | Full fine-tuning (no LoRA) |
| Epochs | 3 (17,719 steps) |
| Batch size | 4 |
| Learning rate | 2e-5, with separate rates for LM / projector / vision tower |
| Hardware | 1× NVIDIA H100 via Modal |
| Final training loss | 0.175 (from 2.10 at step 1) |
Intended use
- Detection of illegal gold-mining pits in 10 m/px Sentinel-2 imagery over Ghana.
- Grounded inspection workflows where a human reviewer wants both bounding boxes and a natural-language second opinion.
- Edge / on-orbit deployment: the 450M parameter count and ONNX export make this practical for satellite-class compute.
It is not a general-purpose mining detector. Performance outside Ghana, outside Sentinel-2, or outside the visible/SWIR composites it was trained on is not guaranteed.
Known failure modes
Surfaced honestly so downstream users can plan around them:
- Cloud-occluded tiles. Sentinel-2 is optical, not SAR; the model can't see through cloud and may hallucinate mining where it sees only whiteness. Pre-filter on `cloud_cover` if possible.
- Legal quarries. There is no visual signal in a single patch that distinguishes licensed quarrying from galamsey; cross-reference with concession polygons at the post-inference layer.
- Freshly-cleared farmland. Similar SWIR signature to exposed soil. The geometric shape (rectilinear vs. amorphous) is the disambiguating cue, not the spectrum.
- Tiny pits (2–3 pixels). Bbox effectively a point; pixel IoU is noisy at this regime.
- Out-of-distribution geology. Eastern / Volta regions outside the SmallMinesDS training geography.
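For the cloud failure mode, a pre-filter can be as small as the sketch below. The `cloud_cover` field name echoes the note above; treating it as a 0-1 fraction and the 0.6 cutoff are assumptions.

```python
# Hypothetical pre-filter: skip tiles whose metadata reports heavy cloud
# before spending inference on them. Missing metadata is treated as fully
# clouded (conservative), so such tiles are skipped too.

def filter_tiles(tiles, max_cloud=0.6):
    """tiles: iterable of dicts with a 0-1 `cloud_cover` field."""
    return [t for t in tiles if t.get("cloud_cover", 1.0) <= max_cloud]
```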
Inference
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model = AutoModelForImageTextToText.from_pretrained(
    "samwell/galamsey-v9-e3", device_map="auto", dtype="bfloat16"
)
processor = AutoProcessor.from_pretrained("samwell/galamsey-v9-e3")

rgb = Image.open("tile_rgb.png")    # B4+B3+B2 composite
swir = Image.open("tile_swir.png")  # B12+B11+B8 false-color composite

GROUNDING_PROMPT = (
    "You are viewing two images of the same Sentinel-2 patch: a natural-color RGB "
    "composite and a SWIR false-color composite. Using both views, detect any "
    "illegal small-scale gold mining pits. Include any exposed soil, excavation, "
    "or sediment-laden water even if you are uncertain, err toward detection. "
    'Provide result as a valid JSON: [{"label": str, "bbox": [x1,y1,x2,y2]}, ...]. '
    "Coordinates must be normalized to 0-1. Only return [] if the scene is entirely "
    "pristine forest, clean water, or urban built-up area with no disturbance."
)

conversation = [{
    "role": "user",
    "content": [
        {"type": "image", "image": rgb},
        {"type": "image", "image": swir},
        {"type": "text", "text": GROUNDING_PROMPT},
    ],
}]

inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True, return_tensors="pt",
    return_dict=True, tokenize=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```
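Turning the decoded reply into pixel-space boxes can be sketched as below, assuming the assistant's reply is the valid JSON the prompt asks for; `parse_boxes` is a name introduced here, not the repo's API.

```python
import json

# Sketch: parse the JSON detection list the grounding prompt requests and
# scale normalized [x1, y1, x2, y2] boxes to pixel coordinates. Minimal
# error handling; malformed output yields an empty list.

def parse_boxes(reply, width, height):
    """reply: the assistant's JSON string; returns pixel-space boxes."""
    try:
        detections = json.loads(reply)
    except json.JSONDecodeError:
        return []
    out = []
    for d in detections:
        x1, y1, x2, y2 = d["bbox"]
        out.append({
            "label": d["label"],
            "bbox": [x1 * width, y1 * height, x2 * width, y2 * height],
        })
    return out
```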
For the description prompt and the full inference pipeline (NMS, min-bbox-area filter, area estimation), see app/src/lib/inference.ts (browser path) and orchestrator/agentic_eo/models/vlm.py (Python path).
Citation
If you use this model, please cite:
- The base model: Liquid AI's LFM2 Technical Report (arXiv:2511.23404).
- The dataset: Ofori-Ampofo et al., 2025, SmallMinesDS, IEEE GRSL.
- This fine-tune: samwell/galamsey-v9-e3 and the GalamseyWatch repo.
License
LFM Open License v1.0, inherited from the base model.