ORION: Orbital Triage LoRA Adapter
QLoRA fine-tune of LiquidAI/LFM2.5-VL-1.6B for autonomous satellite image triage. Classifies 512×512 RGB frames captured at LEO as HIGH (strategic anomaly, downlink immediately), MEDIUM (human infrastructure, store for bulk transfer), or LOW (featureless terrain, discard).
Developed for ORION, an autonomous LEO satellite triage system running on a Raspberry Pi 5 via NASA F-Prime. The Q4_K_M GGUF quantization of this adapter is deployed on-board and runs inference at 51-82 s/frame (mean ~69s across 1,443 frames from 3 end-to-end runs) entirely on CPU.
Uses
Intended use: on-board orbital triage on a satellite OBC. The model receives a 512×512 RGB satellite tile (optionally with GPS coordinates in the prompt) and returns a JSON object with a triage verdict and visual reasoning.
Triage prompt (ChatML format, used identically for training, evaluation, and on-board inference):
<|im_start|>user
<image>
You are an autonomous orbital triage assistant. Analyze this
high-resolution RGB satellite image captured at Longitude: {lon},
Latitude: {lat}.
Strictly use one of these categories based on visual morphology:
- HIGH: Extreme-scale strategic anomalies, dense geometric cargo/vessel
infrastructure, massive cooling towers, sprawling runways, or distinct
geological/artificial chokepoints.
- MEDIUM: Standard human civilization. Ordinary urban grids, low-density
suburban sprawl, regular checkerboard agriculture, or localized
infrastructure.
- LOW: Complete absence of human infrastructure. Featureless deep oceans,
unbroken canopy, barren deserts, or purely natural geological formations.
You MUST output your response as a valid JSON object. To ensure accurate
visual reasoning, you must output the "reason" key FIRST, followed by
the "category" key.<|im_end|>
<|im_start|>assistant
The model responds with {"reason": "...", "category": "HIGH|MEDIUM|LOW"}. Reason-first ordering forces the model to commit to visual evidence before selecting a label. During training, half the samples omit the Longitude/Latitude line (coordinate dropout augmentation).
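The response contract described above can be checked on the receiving side. The following is a minimal sketch, assuming the reply is plain text that contains a single JSON object; the function name `parse_triage_response` is hypothetical and not part of the ORION codebase.

```python
import json

VALID_CATEGORIES = ("HIGH", "MEDIUM", "LOW")


def parse_triage_response(text):
    """Parse a model reply into (reason, category).

    Raises ValueError if the reply has no JSON object, lacks a key,
    puts "category" before "reason", or uses an unknown category.
    """
    # Tolerate stray text around the JSON by slicing from the first "{"
    # to the last "}" (an assumption about how replies may be padded).
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    obj = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError

    keys = list(obj)  # json.loads preserves the key order of the reply
    if "reason" not in keys or "category" not in keys:
        raise ValueError("response must contain 'reason' and 'category'")
    if keys.index("reason") > keys.index("category"):
        raise ValueError("'reason' must come before 'category'")

    category = str(obj["category"]).strip().upper()
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return str(obj["reason"]), category
```

Rejecting a reply whose keys are in the wrong order is stricter than a consumer strictly needs, but it makes violations of the reason-first instruction visible during evaluation.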
Out of scope: multispectral analysis, change detection, object detection with bounding boxes, real-time video, or any use case requiring sub-60-second latency without CUDA acceleration.
Dataset
The adapter was trained on the ORION dataset, 360 curated target locations organized by triage priority and visual morphology, fetched as 512×512 RGB tiles from SimSat's Mapbox API.
| Class | Targets | Visual morphology |
|---|---|---|
| LOW | 120 | Oceans, deserts, ice sheets, dense canopy, geological formations |
| MEDIUM | 120 | Urban grids, suburban sprawl, agriculture, regional infrastructure |
| HIGH | 120 | Mega-ports, mega-airports, energy/dams, mega-mines, military/space facilities |
Hard negatives are included in LOW: coastlines and geological formations that mimic artificial structures (calderas, salt flat fractals, river deltas).
Split (deterministic, random.seed(42)):
| Split | Records | Notes |
|---|---|---|
| Train | 480 | 240 targets × 2 (coordinate dropout augmentation) |
| Val | 60 | Always with coordinates; used for eval_loss + best-checkpoint selection |
| Test | 60 | Always with coordinates; held out for ablation and evaluation |
Coordinate dropout augmentation: each training target produces two records, one with GPS coordinates in the prompt and one without. This teaches the model to classify from pixels alone when GPS is unavailable or spoofed.
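The split and augmentation above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual script: it shuffles all 360 targets with a local `random.Random(42)` (the card mentions `random.seed(42)`), carves off the test and validation sets, and doubles each training target. The function name and the `with_coords` field are hypothetical.

```python
import random


def split_and_augment(targets, seed=42, n_val=60, n_test=60):
    """Split targets into train/val/test and apply coordinate dropout to train.

    Each training target yields two records: one that keeps the GPS
    coordinates in the prompt and one that omits them. Validation and
    test records always keep coordinates.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = list(targets)
    rng.shuffle(shuffled)

    test_targets = shuffled[:n_test]
    val_targets = shuffled[n_test:n_test + n_val]
    train_targets = shuffled[n_test + n_val:]

    train = []
    for target in train_targets:
        train.append({**target, "with_coords": True})
        train.append({**target, "with_coords": False})
    val = [{**t, "with_coords": True} for t in val_targets]
    test = [{**t, "with_coords": True} for t in test_targets]
    return train, val, test
```

With 360 input targets this yields 480 training records and 60 each for validation and test, matching the table. Because augmentation happens after the split, no target appears in more than one split.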
Training Procedure
Base model
LiquidAI/LFM2.5-VL-1.6B loaded in 4-bit NF4 quantization via bitsandbytes.
LoRA configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Dropout | 0.05 |
| Bias | none |
| Task type | CAUSAL_LM |
Training arguments
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Epochs | 3 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 16 (effective batch 16) |
| Optimizer | paged_adamw_8bit |
| Precision | FP16 |
| Gradient checkpointing | enabled |
| Best checkpoint selection | eval_loss (lower is better) |
Hardware
| Component | Spec |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti, 12 GB VRAM |
| CUDA | 12.2 |
| Driver | 535.x |
| OS | Linux |
Training time
| Metric | Value |
|---|---|
| Time per epoch | ~830s |
| Total training time | ~2492s |
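The training tables imply a small number of optimizer steps. The arithmetic below is a rough sketch that assumes a single GPU and one optimizer update per 16 accumulated records; all inputs are taken from the tables above.

```python
import math

train_records = 480        # from the split table
per_device_batch = 1
grad_accum_steps = 16
epochs = 3
seconds_per_epoch = 830    # approximate, from the training-time table

# Records consumed per optimizer update (single-GPU assumption).
effective_batch = per_device_batch * grad_accum_steps

# Optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(train_records / effective_batch)
total_steps = steps_per_epoch * epochs

# Rough wall-clock cost of one optimizer update.
seconds_per_step = seconds_per_epoch / steps_per_epoch

print(effective_batch, steps_per_epoch, total_steps, round(seconds_per_step, 1))
```

Under these assumptions the run performs 30 updates per epoch and 90 in total, at roughly 28 s per update.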
Model Artifacts
| Artifact | File | Size | Notes |
|---|---|---|---|
| LoRA adapter (this repo) | orion_lora_weights/ | ~50 MB | r=16, 4 attention projection modules |
| Merged FP16 checkpoint | orion_merged/ | ~3.2 GB | merge_and_unload() output |
| FP16 GGUF | orion-f16.gguf | ~3.2 GB | Intermediate conversion step |
| Q4_K_M GGUF | orion-q4_k_m.gguf | ~730 MB | Deployed to Pi 5 (8 GB RAM) |
| Vision projector | orion-mmproj-f16.gguf | ~814 MB | FP16, deployed alongside Q4 model |
Measured on-device: Total ORION process RSS during inference on the Pi 5 is ~1,753 MB (model + vision encoder + KV cache + F-Prime flight software + buffer pool).
The Q4_K_M GGUF + mmproj pair is the deployed artifact. Pre-built files are available on Hugging Face.
Evaluation
Both studies use the same four conditions run against the same 60-sample held-out test set. The ablation (ablation.py) tests the unmodified base model; the evaluation (evaluate.py) tests the fine-tuned adapter. Running both against identical inputs isolates the exact lift from fine-tuning.
Refer to Training Pipeline for more details on how to read this result.
| Condition | Input | Purpose |
|---|---|---|
| A: Full system | Real image + real GPS coords | Nominal operating condition |
| B: Vision only | Real image + no coords | GPS-denied or noisy environment |
| C: Blind LLM | Gaussian noise image + real coords | Coordinates-only baseline (tests GPS reliance) |
| D: Sensor conflict | Real image + spoofed coords | Adversarial GPS; tests which modality the model trusts |
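Condition D is scored differently from the others: each prediction falls into one of three buckets rather than being simply right or wrong. The sketch below shows one plausible way to do that bucketing. It assumes the spoofed coordinates belong to a location whose class differs from the image's true class; the function names are hypothetical, and the logic is not taken from `ablation.py` or `evaluate.py`.

```python
from collections import Counter


def bucket_conflict(predicted, image_label, spoofed_label):
    """Classify one Condition D prediction.

    "vision":  prediction matches the class of the real image.
    "coords":  prediction matches the class of the spoofed location.
    "neither": prediction matches neither.
    """
    if predicted == image_label:
        return "vision"
    if predicted == spoofed_label:
        return "coords"
    return "neither"


def conflict_rates(rows):
    """Return the fraction of predictions in each bucket.

    rows is a list of (predicted, image_label, spoofed_label) tuples.
    """
    buckets = ("vision", "coords", "neither")
    if not rows:
        return {name: 0.0 for name in buckets}
    counts = Counter(bucket_conflict(*row) for row in rows)
    return {name: counts[name] / len(rows) for name in buckets}
```

The "coords" fraction corresponds to the coordinate-trust failure rate reported for Condition D.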
Ablation study: base model (ablation.py)
| Condition | Overall accuracy | Notes |
|---|---|---|
| A: Vision + GPS coords | 58.3% | |
| B: Vision only (no coords) | 60.0% | Slightly better: coords can mislead base model |
| C: Blind LLM (Gaussian noise + coords) | 35.0% | Predicts LOW for everything; GPS alone is unreliable |
| D: Sensor conflict | N/A | Trusts incorrect coords 20.0% of the time |
Full log:
--- Condition A: Full System (Vision + Coords) ---
HIGH : 8/14 (57.1% Recall) | Precision: 8/17 (47.1%)
MEDIUM: 9/25 (36.0% Recall) | Precision: 9/13 (69.2%)
LOW : 18/21 (85.7% Recall) | Precision: 18/30 (60.0%)
TOTAL : 35/60 (58.3% Overall Accuracy)
--- Condition B: Vision Only (No Coords) ---
HIGH : 9/14 (64.3% Recall) | Precision: 9/16 (56.2%)
MEDIUM: 8/25 (32.0% Recall) | Precision: 8/11 (72.7%)
LOW : 19/21 (90.5% Recall) | Precision: 19/33 (57.6%)
TOTAL : 36/60 (60.0% Overall Accuracy)
--- Condition C: Blind LLM (Gaussian Noise + Coords) ---
HIGH : 0/14 (0.0% Recall) | Precision: 0/0 (0.0%)
MEDIUM: 0/25 (0.0% Recall) | Precision: 0/0 (0.0%)
LOW : 21/21 (100.0% Recall) | Precision: 21/60 (35.0%)
TOTAL : 21/60 (35.0% Overall Accuracy)
--- Condition D: Sensor Conflict (Real Vision + Fake Coords) ---
Model trusted Vision (Correct) : 35/60 (58.3%)
Model trusted Coords (Failure) : 12/60 (20.0%)
Model got Confused (Neither) : 13/60 (21.7%)
Fine-tuned model evaluation (evaluate.py)
| Condition | Overall accuracy | Notes |
|---|---|---|
| A: Vision + GPS coords | 58.3% | |
| B: Vision only (no coords) | 65.0% | Improved over base (+5 pp) |
| C: Blind LLM (Gaussian noise + coords) | 43.3% | Predicts MEDIUM for most noise inputs |
| D: Sensor conflict | - | Trusts incorrect coords 16.7% of the time (down from 20.0%) |
Per-class accuracy (condition A)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| HIGH | 46.7% | 50.0% | 48.3% |
| MEDIUM | 66.7% | 40.0% | 50.0% |
| LOW | 60.0% | 85.7% | 70.6% |
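The per-class figures in this table follow directly from the raw counts in the Condition A log for the fine-tuned model. The short sketch below reproduces them; the helper name `precision_recall_f1` is illustrative only.

```python
def precision_recall_f1(true_positives, actual, predicted):
    """Return (precision, recall, F1) as fractions from raw counts.

    actual:    number of test samples that truly belong to the class.
    predicted: number of samples the model assigned to the class.
    """
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = actual + predicted
    f1 = 2 * true_positives / denom if denom else 0.0
    return precision, recall, f1


# (true positives, actual, predicted) from the Condition A log
counts = {
    "HIGH": (7, 14, 15),
    "MEDIUM": (10, 25, 15),
    "LOW": (18, 21, 30),
}

table = {
    label: tuple(round(100 * value, 1) for value in precision_recall_f1(*c))
    for label, c in counts.items()
}
print(table)
```

F1 is computed as 2·TP / (actual + predicted), which is algebraically the same as the harmonic mean of precision and recall.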
Full log:
--- Condition A: Full System (Vision + Coords) ---
HIGH : 7/14 (50.0% Recall) | Precision: 7/15 (46.7%)
MEDIUM: 10/25 (40.0% Recall) | Precision: 10/15 (66.7%)
LOW : 18/21 (85.7% Recall) | Precision: 18/30 (60.0%)
TOTAL : 35/60 (58.3% Overall Accuracy)
--- Condition B: Vision Only (No Coords) ---
HIGH : 9/14 (64.3% Recall) | Precision: 9/15 (60.0%)
MEDIUM: 12/25 (48.0% Recall) | Precision: 12/17 (70.6%)
LOW : 18/21 (85.7% Recall) | Precision: 18/28 (64.3%)
TOTAL : 39/60 (65.0% Overall Accuracy)
--- Condition C: Blind LLM (Gaussian Noise + Coords) ---
HIGH : 1/14 ( 7.1% Recall) | Precision: 1/ 1 (100.0%)
MEDIUM: 25/25 (100.0% Recall) | Precision: 25/59 (42.4%)
LOW : 0/21 ( 0.0% Recall) | Precision: 0/ 0 (0.0%)
TOTAL : 26/60 (43.3% Overall Accuracy)
--- Condition D: Sensor Conflict (Real Vision + Fake Coords) ---
Model trusted Vision (Correct) : 37/60 (61.7%)
Model trusted Coords (Failure) : 10/60 (16.7%)
Model got Confused (Neither) : 13/60 (21.7%)
Quantized GGUF evaluation (evaluate.py --quantized-model)
The same 4-condition protocol run against the Q4_K_M GGUF deployed on-device via llama.cpp's HTTP server. This measures accuracy degradation from quantization using the exact same test set.
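For readers who want to query a locally running llama.cpp server in a similar way, the sketch below builds an OpenAI-style chat request with the tile embedded as a base64 data URI. It is an assumption that the evaluation script talks to the chat-completions endpoint in this form; the function name, the `temperature` value, and the PNG MIME type are illustrative choices. The 200-token cap mirrors the generation limit quoted in the Deployment section.

```python
import base64
import json


def build_chat_payload(image_bytes, prompt, max_tokens=200, mime="image/png"):
    """Build a chat-completions request body with one image and one text part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime};base64,{encoded}"
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first, mirroring the <image> placement in the prompt.
                    {"type": "image_url", "image_url": {"url": data_uri}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # assumption: deterministic decoding for evaluation
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real tile file.
    body = build_chat_payload(b"\x89PNG\r\n", "Describe the tile.")
    print(json.dumps(body)[:80])
```

The resulting dictionary can be serialized with `json.dumps` and sent as a POST body to the server's `/v1/chat/completions` route, provided the server was started with both the model and the vision projector loaded.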
| Condition | Overall accuracy | Notes |
|---|---|---|
| A: Vision + GPS coords | 55.0% | −3.3 pp from FP16 fine-tuned |
| B: Vision only (no coords) | 63.3% | −1.7 pp from FP16 fine-tuned |
| C: Blind LLM (Gaussian noise + coords) | 28.3% | Predicts HIGH for most noise inputs |
| D: Sensor conflict | - | Trusts incorrect coords 15.0% of the time (down from 16.7%) |
Full log:
--- Condition A: Full System (Vision + Coords) ---
HIGH : 7/14 (50.0% Recall) | Precision: 7/16 (43.8%)
MEDIUM: 8/25 (32.0% Recall) | Precision: 8/13 (61.5%)
LOW : 18/21 (85.7% Recall) | Precision: 18/31 (58.1%)
TOTAL : 33/60 (55.0% Overall Accuracy)
--- Condition B: Vision Only (No Coords) ---
HIGH : 8/14 (57.1% Recall) | Precision: 8/13 (61.5%)
MEDIUM: 10/25 (40.0% Recall) | Precision: 10/12 (83.3%)
LOW : 20/21 (95.2% Recall) | Precision: 20/35 (57.1%)
TOTAL : 38/60 (63.3% Overall Accuracy)
--- Condition C: Blind LLM (Gaussian Noise + Coords) ---
HIGH : 8/14 (57.1% Recall) | Precision: 8/41 (19.5%)
MEDIUM: 2/25 ( 8.0% Recall) | Precision: 2/ 4 (50.0%)
LOW : 7/21 (33.3% Recall) | Precision: 7/15 (46.7%)
TOTAL : 17/60 (28.3% Overall Accuracy)
--- Condition D: Sensor Conflict (Real Vision + Fake Coords) ---
Model trusted Vision (Correct) : 37/60 (61.7%)
Model trusted Coords (Failure) : 9/60 (15.0%)
Model got Confused (Neither) : 14/60 (23.3%)
Fine-tuning and quantization impact
| Condition | Base model | Fine-tuned (FP16) | Q4_K_M GGUF | Δ (fine-tune) | Δ (quantization) |
|---|---|---|---|---|---|
| A: Vision + GPS coords | 58.3% | 58.3% | 55.0% | 0 pp | −3.3 pp |
| B: Vision only (no coords) | 60.0% | 65.0% | 63.3% | +5.0 pp | −1.7 pp |
| C: Blind LLM (noise+coords) | 35.0% | 43.3% | 28.3% | +8.3 pp | −15.0 pp |
Sensor conflict (Condition D): coordinate-trust failure drops from 20.0% (base) to 16.7% (fine-tuned FP16) to 15.0% (Q4_K_M GGUF). Quantization does not degrade GPS robustness.
Quantization impact on operational conditions (A and B): accuracy loss from Q4_K_M quantization is modest (−3.3 pp and −1.7 pp respectively), confirming that the deployed GGUF retains most of the fine-tuned model's capability. The large drop on Condition C (noise inputs) is not operationally relevant since the model never receives noise images in deployment.
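The deltas in the comparison table can be recomputed from the correct-prediction counts in the three full logs. The sketch below does so and also shows how many test samples each delta represents; on a 60-sample test set, one sample is worth about 1.7 pp.

```python
TEST_SIZE = 60

# Correct predictions per condition: (base, fine-tuned FP16, Q4_K_M GGUF),
# taken from the TOTAL lines of the full logs.
correct = {
    "A": (35, 35, 33),
    "B": (36, 39, 38),
    "C": (21, 26, 17),
}


def delta_pp(after, before, total=TEST_SIZE):
    """Accuracy change in percentage points, rounded to one decimal."""
    return round(100 * (after - before) / total, 1)


deltas = {
    cond: {
        "fine_tune_pp": delta_pp(tuned, base),
        "quantization_pp": delta_pp(quant, tuned),
        "quantization_samples": quant - tuned,
    }
    for cond, (base, tuned, quant) in correct.items()
}
print(deltas)
```

Read this way, the quantization loss on Conditions A and B corresponds to two samples and one sample respectively.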
Discussion
Fine-tuning produces measurable improvements on Conditions B, C, and D, but Condition A (the nominal operating condition with both image and GPS) shows no gain on this 360-target dataset. The most likely explanation is the breadth of the HIGH category: mega-ports, mega-airports, energy infrastructure, open-pit mines, and military facilities are all grouped into a single label. The model learns to output the correct JSON format quickly (training loss drops to 0.18 in ~41 minutes), but 240 training targets spread across three classes, with HIGH alone covering five visually heterogeneous sub-types, are not enough for the visual encoder to learn a reliable decision boundary.
This is a prototype demonstrating that on-board VLM inference on a Pi 5 is technically viable. The approach will improve significantly with:
- Narrower taxonomy: splitting HIGH into mission-specific sub-classes (e.g., ports only, or energy infrastructure only) and training a specialist adapter
- Larger corpus: 240 training images is a minimal dataset for a 3-class VLM task; 1,000-5,000 images per class is a more realistic target for robust generalization
- Higher-resolution tiles: 512×512 Mapbox tiles lose fine-grained texture that distinguishes, e.g., a cargo terminal from a large parking lot at altitude
Deployment
The fine-tuned adapter is merged into the base model, converted to an FP16 GGUF, quantized to Q4_K_M via llama-quantize, and run on the Pi 5 via llama.cpp's multimodal (mtmd) API:
Vision encoding (mtmd): ~10-15 s
Token generation (200 max): ~40-55 s
Total per frame: ~51-82 s (CPU only, Cortex-A76, mean ~69 s, 1,443 frames from 3 end-to-end runs)
See the quantization guide and deployment guide for full instructions.
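The per-frame latency above determines how many frames can be triaged in one pass. The arithmetic below is a sketch using the 85 s auto-capture timer and the 35-minute eclipse window quoted under Limitations; the function name is illustrative.

```python
def frames_per_window(window_s, interval_s):
    """Number of whole capture intervals that fit in the window."""
    return int(window_s // interval_s)


eclipse_s = 35 * 60            # 35-minute eclipse window
capture_interval_s = 85        # auto-capture timer
worst_case_inference_s = 82    # upper end of the measured 51-82 s range

frames = frames_per_window(eclipse_s, capture_interval_s)

# Slack between the timer and the slowest observed inference; a positive
# value means the VLM queue does not grow over time.
headroom_s = capture_interval_s - worst_case_inference_s

print(frames, headroom_s)
```

The timer sits only 3 s above the slowest measured frame, which is why it cannot be shortened without risking queue build-up.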
Limitations
- Trained on Mapbox RGB tiles only, so multispectral, SAR, and thermal data are not supported.
- 512×512 pixel resolution matches the Pi 5 inference pipeline; different resolutions require re-cropping.
- Three-class taxonomy (HIGH / MEDIUM / LOW) is fixed at training time. Mission-specific priorities require fine-tuning on a new labeled dataset.
- Inference at 51-82 s/frame (mean ~69s across 1,443 frames from 3 end-to-end runs) sets a hard floor on capture interval: the auto-capture timer is 85s to avoid saturating the VLM queue, limiting throughput to ~24 frames per 35-min eclipse. Burst imaging, real-time video, or sub-minute revisit rates are not feasible without faster hardware (GPU/NPU) or a smaller model.
- Coordinate dropout improves GPS robustness but does not eliminate coord-biased errors on hard edge cases.
- Blank/missing tile hallucination: Mapbox returns blank white tiles at extreme latitudes (|lat| > 75°) where no satellite imagery exists. The model hallucinates strategic significance onto these featureless images (3 out of 8 HIGH classifications across 1,443 frames were blank tiles). These blank tiles are visually distinct from the ocean and ice sheet tiles in the training set. Mitigation: add blank/white tile detection before inference, or include polar blank tiles as explicit LOW training examples.
- Natural feature false positives: Coastlines, cloud cover, and geological formations (e.g., river deltas, glacial terrain) can be misclassified as HIGH due to visual similarity to trained HIGH morphologies (e.g., coastlines as "artificial formations," clouds as "volcanic eruptions"). The hard-negative training set mitigates some of this, but edge cases remain.
- Training data was generated at 500 km simulated altitude; Pi 5 runs used the SimSat TLE orbit at 802 km (0.7 Mapbox zoom levels difference). The model generalized across this mismatch without degradation, but accuracy may differ at significantly different altitudes.
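The blank-tile mitigation listed above could take the form of a cheap pre-inference check. The following is a minimal sketch, not part of the ORION flight software: it treats a tile as blank when nearly all pixels are close to pure white. The function name and both thresholds are hypothetical and would need tuning so that bright but textured tiles, such as ice sheets, are not rejected.

```python
def is_blank_tile(pixels, white_threshold=250, min_fraction=0.99):
    """Return True if the tile looks like a blank white placeholder.

    pixels is an iterable of (r, g, b) tuples with 0-255 channel values.
    A pixel counts as white when every channel is at or above
    white_threshold; the tile is blank when at least min_fraction of
    its pixels are white. An empty tile is treated as blank.
    """
    total = 0
    white = 0
    for r, g, b in pixels:
        total += 1
        if r >= white_threshold and g >= white_threshold and b >= white_threshold:
            white += 1
    if total == 0:
        return True
    return white / total >= min_fraction


if __name__ == "__main__":
    blank = [(255, 255, 255)] * (64 * 64)
    ocean = [(10, 40, 90)] * (64 * 64)
    print(is_blank_tile(blank), is_blank_tile(ocean))
```

A tile flagged by this check could be assigned LOW directly, skipping the roughly one-minute inference pass.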