---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
pipeline_tag: image-text-to-text
tags:
- liquid
- lfm2.5
- lfm2
- edge
- vision
base_model: LiquidAI/LFM2.5-VL-1.6B
---
<center>
<div style="text-align: center;">
<img
src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png"
alt="Liquid AI"
style="width: 100%; max-width: 100%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
/>
</div>
<div style="display: flex; justify-content: center; gap: 0.5em;">
<a href="https://playground.liquid.ai/chat?model=lfm2.5-vl-1.6b"><strong>Try LFM</strong></a> • <a href="https://docs.liquid.ai/lfm/getting-started/welcome"><strong>Docs</strong></a> • <a href="https://leap.liquid.ai/"><strong>LEAP</strong></a> • <a href="https://discord.com/invite/liquid-ai"><strong>Discord</strong></a>
</div>
</center>
<br>
# LFM2.5-VL-1.6B-Extract
**LFM2.5-VL-1.6B-Extract** extracts user-defined fields from images and returns them as **JSON**. It is Liquid AI's first vision model in the [Liquid Nanos](https://huggingface.co/collections/LiquidAI/liquid-nanos) collection—compact, task-specific models built for production workflows—and extends the Extract family alongside [LFM2-1.2B-Extract](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract) for text documents.
## ⚙️ How it works
You specify what to extract as a YAML field list in the system prompt, and the model returns a JSON object with those fields. Structured outputs integrate cleanly with rule-based systems and downstream pipelines. Use it out of the box or fine-tune for domain-specific extraction.
- **System prompt**:
```yaml
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
```
- **User prompt**:
<img src="https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-Extract/resolve/main/sample_image.png" width="300">
- **Output**:
```json
{
"wood_color": "light tan to beige with darker brown streaks",
"wood_texture": "smooth with visible grain patterns",
"wood_pattern": "wavy, linear, irregular"
}
```
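The request/response contract above can be sketched in a few lines of Python. This is a minimal illustration, not part of the model's API: `build_system_prompt` and `parse_extraction` are hypothetical helper names, and the prompt wording is copied from the usage example in this model card.

```python
import json

# Prompt wrapper; the wording mirrors the usage example in this model card.
PROMPT_TEMPLATE = (
    "Extract the following from the image:\n"
    "{fields}\n"
    "Respond with only a JSON object. Do not include any text outside the JSON."
)


def build_system_prompt(fields: dict[str, str]) -> str:
    """Render {field_name: description} as a YAML-style field list inside the prompt."""
    field_lines = "\n".join(f"{name}: {desc}" for name, desc in fields.items())
    return PROMPT_TEMPLATE.format(fields=field_lines)


def parse_extraction(response: str, fields: dict[str, str]) -> dict:
    """Parse the model response and fail loudly if a requested field is missing."""
    data = json.loads(response)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in fields if name not in data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return data


# Hypothetical usage with one field from the example above.
fields = {"wood_color": "The overall coloration of the wood surface"}
system_prompt = build_system_prompt(fields)
result = parse_extraction('{"wood_color": "light tan"}', fields)
```

Keeping the field list in a dictionary means the same object drives both the prompt and the check on the response, so the two cannot drift apart.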
The model also supports enum-style fields: list the allowed values alongside the field description, as shown below, and the model returns one of the listed values.
- **System prompt**:
```yaml
wood_color: The overall coloration of the wood surface, such as blue, red, or light tan
wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy
wood_pattern: The pattern types visible on the wood surface, e.g., straight, wavy, or curly
```
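Because enum choices are written in plain language, it can help to generate the descriptions from a list and to check the returned value against that same list. The sketch below assumes the "select from ..." phrasing used above; `enum_field` and `invalid_enum_fields` are hypothetical helpers, not functions shipped with the model.

```python
def enum_field(description: str, choices: list[str]) -> str:
    """Append the allowed values to a field description in plain language."""
    if len(choices) == 1:
        listed = choices[0]
    else:
        joiner = ", or " if len(choices) > 2 else " or "
        listed = ", ".join(choices[:-1]) + joiner + choices[-1]
    return f"{description}, select from {listed}"


def invalid_enum_fields(result: dict, enums: dict[str, list[str]]) -> list[str]:
    """Return the names of fields whose value is not one of the allowed choices."""
    return [name for name, choices in enums.items() if result.get(name) not in choices]


# Hypothetical usage: one enum field and a check on a model response.
enums = {"wood_texture": ["smooth", "rough", "grainy"]}
description = enum_field("The tactile quality of the wood surface", enums["wood_texture"])
bad_fields = invalid_enum_fields({"wood_texture": "smooth"}, enums)  # empty list when valid
```

A post-check like this is worth keeping even when the model follows the list reliably, since downstream rule-based code usually assumes a closed set of values.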
## 🌟 Use cases
- Detecting safety-critical events in images (e.g. fallen person, fire, leakage) to trigger automated safety systems.
- Collecting statistical information about objects across video frames for analytics pipelines.
- Auto-tagging product images with structured attributes for retail and e-commerce.
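As an illustration of the first use case, the extracted JSON can feed a simple rule-based dispatcher. Everything in this sketch is an assumption made for the example: the `event` field, its allowed values, and the `dispatch` helper are hypothetical and not defined by the model.

```python
from typing import Callable

# Hypothetical schema: a single enum-style field describing the detected event.
SAFETY_FIELDS = {
    "event": (
        "The safety-critical event visible in the image, "
        "select from none, fallen_person, fire, or leakage"
    ),
}


def dispatch(result: dict, handlers: dict[str, Callable[[], None]]) -> bool:
    """Run the handler registered for the extracted event; return True if one ran."""
    event = result.get("event")
    if not isinstance(event, str):
        return False  # missing or malformed field: do nothing
    handler = handlers.get(event)
    if handler is None:
        return False  # e.g. "none" or an unexpected value
    handler()
    return True


# Hypothetical usage: record which alarms would be raised.
raised: list[str] = []
handlers = {
    "fire": lambda: raised.append("fire alarm"),
    "fallen_person": lambda: raised.append("notify staff"),
}
dispatch({"event": "fire"}, handlers)
```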
## 📄 Model details
| Property | Detail |
|---|---:|
| **Parameters (LM only)** | 1.2B |
| **Vision encoder** | SigLIP2 (~400M, [SigLIP-2 paper](https://arxiv.org/abs/2502.14786)) |
| **Backbone layers** | hybrid conv+attention |
| **Image input** | Single image, dynamic resolution |
| **Context** | 128,000 tokens |
| **Vocab size** | 65,536 (text) |
| **Precision** | bfloat16 |
| **License** | LFM Open License v1.0 |
## 📊 Performance
We evaluated LFM2.5-VL-1.6B-Extract on a 2,000-sample benchmark of
`(image, schema, JSON)` triples, with reference labels generated by an
ensemble of frontier multimodal models. Predictions are scored on the
following three dimensions:
- **JSON Validity**: share of samples whose output parses as strict JSON
- **Schema Consistency F1 Score**: set-level F1 between the predicted and the requested field names, macro-averaged across samples
- **VLM Judge Score**: how well the extracted values match the image itself, as judged by a separate vision model ([Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B))
<img src="https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-Extract/resolve/main/lfm2_5_vl_1_6b_metrics.png" width="800">
| Model | Params | JSON Validity | F1 Score | VLM Judge Score |
|---|---:|---:|---:|---:|
| **LFM2.5-VL-1.6B-Extract** | **1.6B** | **99.6** | **99.6** | **90.6** |
| LFM2.5-VL-1.6B | 1.6B | 91.8 | 75.8 | 66.0 |
| FastVLM-1.5B | 1.91B | 87.3 | 80.3 | 50.9 |
| SmolVLM2-2.2B-Instruct | 2.25B | 84.4 | 82.9 | 64.8 |
| Qwen3.5-2B | 2.27B | 97.9 | 97.7 | 89.7 |
| gemma-4-E2B-it | 2.3B | 97.4 | 97.1 | 84.4 |
| InternVL3_5-2B | 2.35B | **99.6** | 99.2 | 87.7 |
| *(ref) Qwen3-VL-4B-Instruct* | 4.44B | 99.8 | 99.7 | 92.0 |
| *(ref) InternVL3_5-4B* | 4.73B | 99.5 | 99.4 | 90.2 |
LFM2.5-VL-1.6B-Extract matches or outperforms similarly sized (~2B) open-source VLMs on this benchmark and is competitive with the 4B reference models, which are more than twice its size.
**Reproducing these numbers**: The full evaluation pipeline, which includes extraction, VLM judging, and metric aggregation, is bundled in this repository under `model_eval/`. Setup, configuration, and run instructions are in the folder's [`README`](./model_eval/README.md).
**Scope**: These numbers characterize the model on the input/output form it is designed for: a single input image, a YAML field list as the schema, and a flat JSON object as the output. Performance is not expected to transfer to vastly different tasks, e.g. multi-image reasoning or free-form VQA.
## 🏃 How to run
You can run LFM2.5-VL-1.6B-Extract with Hugging Face [`transformers`](https://github.com/huggingface/transformers) v5.1 or newer:
```bash
pip install "transformers>=5.1" pillow
```
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image
model_id = "LiquidAI/LFM2.5-VL-1.6B-Extract"
model = AutoModelForImageTextToText.from_pretrained(
model_id,
device_map="auto",
dtype="bfloat16",
trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-Extract/resolve/main/sample_image.png")
fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""
system_prompt = f"""Extract the following from the image:
{fields_yaml}
Respond with only a JSON object. Do not include any text outside the JSON."""
conversation = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": [{"type": "image", "image": image}]},
]
inputs = processor.apply_chat_template(
conversation,
add_generation_prompt=True,
return_tensors="pt",
return_dict=True,
tokenize=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = processor.batch_decode(
outputs[:, inputs["input_ids"].shape[1]:],
skip_special_tokens=True,
)[0]
print(response)
# {
# "wood_color": "light tan to beige with darker brown streaks",
# "wood_texture": "smooth with visible grain patterns",
# "wood_pattern": "wavy, linear, irregular"
# }
```
> [!WARNING]
> The model is intended for single-turn conversations. We recommend greedy decoding (`do_sample=False` in `transformers`, equivalent to `temperature=0` in other runtimes).
## 📬 Contact
- Got questions or want to connect? [Join our Discord community](https://discord.com/invite/liquid-ai)
- If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).
## Citation
```bibtex
@article{liquidai2025lfm2,
title={LFM2 Technical Report},
author={Liquid AI},
journal={arXiv preprint arXiv:2511.23404},
year={2025}
}
```