Instructions to use LiquidAI/LFM2.5-VL-450M-Extract with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use LiquidAI/LFM2.5-VL-450M-Extract with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="LiquidAI/LFM2.5-VL-450M-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-450M-Extract")
model = AutoModelForImageTextToText.from_pretrained("LiquidAI/LFM2.5-VL-450M-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use LiquidAI/LFM2.5-VL-450M-Extract with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-VL-450M-Extract"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-VL-450M-Extract",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker

```bash
docker model run hf.co/LiquidAI/LFM2.5-VL-450M-Extract
```

SGLang

How to use LiquidAI/LFM2.5-VL-450M-Extract with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2.5-VL-450M-Extract" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-VL-450M-Extract",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2.5-VL-450M-Extract" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-VL-450M-Extract",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Docker Model Runner
How to use LiquidAI/LFM2.5-VL-450M-Extract with Docker Model Runner:
```
docker model run hf.co/LiquidAI/LFM2.5-VL-450M-Extract
```

---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
pipeline_tag: image-text-to-text
tags:
- liquid
- lfm2.5
- lfm2
- edge
- vision
base_model: LiquidAI/LFM2.5-VL-450M
---
<center>
<div style="text-align: center;">
  <img 
    src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png" 
    alt="Liquid AI"
    style="width: 100%; max-width: 100%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
  />
</div>
<div style="display: flex; justify-content: center; gap: 0.5em;">
<a href="https://playground.liquid.ai/chat?model=lfm2.5-vl-450m"><strong>Try LFM</strong></a> • <a href="https://docs.liquid.ai/lfm/getting-started/welcome"><strong>Docs</strong></a> • <a href="https://leap.liquid.ai/"><strong>LEAP</strong></a> • <a href="https://discord.com/invite/liquid-ai"><strong>Discord</strong></a>
</div>
</center>

<br>

# LFM2.5-VL-450M-Extract

**LFM2.5-VL-450M-Extract** extracts user-defined fields from images and returns them as **JSON**. It is Liquid AI's first vision model in the [Liquid Nanos](https://huggingface.co/collections/LiquidAI/liquid-nanos) collection—compact, task-specific models built for production workflows—and extends the Extract family alongside [LFM2-350M-Extract](https://huggingface.co/LiquidAI/LFM2-350M-Extract) for text documents.

## ⚙️ How it works

You specify what to extract as a YAML field list in the system prompt, and the model returns a JSON object with those fields. Structured outputs integrate cleanly with rule-based systems and downstream pipelines. Use it out of the box or fine-tune for domain-specific extraction.

- **System prompt**:

```yaml
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
```

- **User prompt**:
<img src="https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png" width="300">

- **Output**:
```json
{
  "wood_color": "light to medium brown",
  "wood_texture": "smooth with visible grain",
  "wood_pattern": "parallel, irregular, wavy"
}
```

The model also supports enum-style fields: list the possible choices alongside the field description, as shown here, and the model returns one of the listed values.

- **System prompt**:

```yaml
wood_color: The overall coloration of the wood surface, such as blue, red, or light tan
wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy
wood_pattern: The pattern types visible on the wood surface, e.g., straight, wavy, or curly
```
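
As a rough illustration of the enum feature, the sketch below renders enum-style descriptions from a small schema and checks a reply on the client side. `FIELDS`, `build_field_list`, and `find_enum_violations` are hypothetical names introduced here, and the "select from" wording is just one of the phrasings shown above. The check is a precaution added for this example, not something the model card prescribes.

```python
import json

# Hypothetical schema: field name -> (description, allowed choices or None).
FIELDS = {
    "wood_color": ("The overall coloration of the wood surface", None),
    "wood_texture": ("The tactile quality of the wood surface", ["smooth", "rough", "grainy"]),
    "wood_pattern": ("The pattern types visible on the wood surface", ["straight", "wavy", "curly"]),
}


def build_field_list(fields):
    """Render one 'name: description' line per field, appending the choices when given."""
    lines = []
    for name, (description, choices) in fields.items():
        if choices:
            description = f"{description}, select from {', '.join(choices)}"
        lines.append(f"{name}: {description}")
    return "\n".join(lines)


def find_enum_violations(response_text, fields):
    """Return {field: value} for enum fields whose value is not one of the listed choices."""
    data = json.loads(response_text)
    violations = {}
    for name, (_, choices) in fields.items():
        if choices and data.get(name) not in choices:
            violations[name] = data.get(name)
    return violations


print(build_field_list(FIELDS))
```

The rendered text can be dropped into the system prompt in place of a hand-written field list; an empty dictionary from `find_enum_violations` means every enum field held one of its listed values.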

## 🌟 Use cases

- Detecting safety-critical events in images (e.g. fallen person, fire, leakage) to trigger automated safety systems.
- Collecting statistical information about objects across video frames for analytics pipelines.
- Auto-tagging product images with structured attributes for retail and e-commerce.

## 📄 Model details

| Property | Detail |
|---|---:|
| **Parameters (LM only)** | 350M |
| **Vision encoder** | SigLIP2 (~100M, [SigLIP-2 paper](https://arxiv.org/abs/2502.14786)) |
| **Backbone layers** | hybrid conv+attention |
| **Image input** | Single image, dynamic resolution |
| **Context** | 128,000 tokens |
| **Vocab size** | 65,536 (text) |
| **Precision** | bfloat16 |
| **License** | LFM Open License v1.0 |

## 📊 Performance

We evaluated LFM2.5-VL-450M-Extract on a 2,000-sample benchmark of
`(image, schema, JSON)` triples, with reference labels generated by an
ensemble of frontier multimodal models. Predictions are scored on the
following three dimensions:

- **JSON Validity** — share of samples producing strict-parseable JSON
- **Schema Consistency F1 Score** — set-level F1 over predicted vs requested field names, macro-averaged across samples
- **VLM Judge Score** — match against the image directly, judged by a separate vision model ([Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B))
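
To make the first two metrics concrete, here is a minimal sketch of how they could be computed. It is not the code in `model_eval/`: the function names are hypothetical, "strict-parseable" is approximated with Python's `json.loads`, and scoring an unparseable reply as F1 = 0 is an assumption of this example. The VLM Judge Score needs a separate vision model and is not sketched.

```python
import json


def is_valid_json_object(text):
    """True if the reply parses as JSON and the top-level value is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def field_name_f1(predicted, requested):
    """Set-level F1 between predicted and requested field names for one sample."""
    predicted, requested = set(predicted), set(requested)
    if not predicted and not requested:
        return 1.0
    true_positives = len(predicted & requested)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(requested)
    return 2 * precision * recall / (precision + recall)


def score(samples):
    """samples: list of (reply_text, requested_field_names). Returns percentages."""
    if not samples:
        raise ValueError("need at least one sample")
    valid_count = 0
    f1_values = []
    for reply_text, requested in samples:
        if is_valid_json_object(reply_text):
            valid_count += 1
            f1_values.append(field_name_f1(json.loads(reply_text).keys(), requested))
        else:
            f1_values.append(0.0)  # assumption: invalid replies contribute zero
    n = len(samples)
    return {
        "json_validity": 100 * valid_count / n,
        "schema_f1": 100 * sum(f1_values) / n,  # macro average across samples
    }
```

Macro-averaging here means each sample contributes one F1 value regardless of how many fields its schema has.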

<img src="https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/lfm2_vl_450m_metrics.png" width="800">

| Model | Params | JSON Validity | F1 Score | VLM Judge Score |
|---|---:|---:|---:|---:|
| **LFM2.5-VL-450M-Extract** | **0.45B** | **98.9** | **98.8** | **84.5** |
| LFM2.5-VL-450M | 0.45B | 97.7 | 93.5 | 73.4 |
| SmolVLM-500M-Instruct | 0.51B | 33.0 | 26.6 | 12.2 |
| FastVLM-0.5B | 0.76B | 22.5 | 19.3 | 16.3 |
| Qwen3.5-0.8B | 0.87B | 96.4 | 96.3 | 82.3 |
| InternVL3_5-1B | 1.06B | 98.0 | 96.5 | 80.7 |
| MiniCPM-V-4.6 | 1.30B | 61.8 | 60.4 | 57.5 |
| *(ref) InternVL3_5-2B* | 2.35B | 99.6 | 99.2 | 87.7 |
| *(ref) Qwen3.5-2B* | 2.27B | 97.9 | 97.7 | 89.7 |
| *(ref) gemma-4-E2B-it* | 2.3B | 97.4 | 97.1 | 84.4 |

LFM2.5-VL-450M-Extract outperforms similarly-sized (sub-1B) open-source VLMs on this benchmark and is competitive with models 4× its size.

**Reproducing these numbers**: The full evaluation pipeline, which includes extraction, VLM judging, and metric aggregation, is bundled in this repository under `model_eval/`. Setup, configuration, and run instructions are in the folder's [`README`](./model_eval/README.md).

**Scope**: These numbers characterize the model on the input/output form it is designed for: a single input image, a YAML field list as the schema, and a flat JSON object as the output. Performance is not expected to transfer to largely different tasks, e.g. multi-image reasoning or free-form VQA.

<!-- > Generic instruction-tuned VLMs (SmolVLM, moondream) cannot perform
> schema-based extraction zero-shot regardless of prompt strategy.
> Under the most permissive prompt setups, they either:
>   - produce free-form captions ignoring the JSON instruction, or
>   - produce valid-shaped JSON but echo the schema descriptions or
>     few-shot example values as field values (zero faithfulness to
>     the image).
>
> LFM2-VL-Extract's task-specific training is what enables strict-JSON
> output with faithful, image-grounded values in a single zero-shot
> call — no few-shot examples, no grammar constraints, no inference
> wrappers. -->



## 🏃 How to run

You can run LFM2.5-VL-450M-Extract with Hugging Face [`transformers`](https://github.com/huggingface/transformers) v5.1 or newer:

```bash
pip install "transformers>=5.1" pillow
```

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2.5-VL-450M-Extract"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png")

fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""

system_prompt = f"""Extract the following from the image:

{fields_yaml}

Respond with only a JSON object. Do not include any text outside the JSON."""

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": [{"type": "image", "image": image}]},
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
# {
#   "wood_color": "light to medium brown",
#   "wood_texture": "smooth with visible grain",
#   "wood_pattern": "parallel, irregular, wavy"
# }
```
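
Before handing the reply to a downstream system, it can help to parse it and confirm the requested fields are present. The helper below is a small sketch under that assumption; `parse_extraction` is a hypothetical name, and the reply is hard-coded so the snippet runs without the model.

```python
import json


def parse_extraction(response_text, expected_fields):
    """Parse the model's reply and report missing or unexpected field names."""
    data = json.loads(response_text)  # raises json.JSONDecodeError if the reply is not valid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [name for name in expected_fields if name not in data]
    unexpected = [name for name in data if name not in expected_fields]
    return data, missing, unexpected


# Same content as the commented output in the example above.
reply = (
    '{"wood_color": "light to medium brown", '
    '"wood_texture": "smooth with visible grain", '
    '"wood_pattern": "parallel, irregular, wavy"}'
)
data, missing, unexpected = parse_extraction(
    reply, ["wood_color", "wood_texture", "wood_pattern"]
)
print(data["wood_color"], missing, unexpected)
```

In a real pipeline, `reply` would be the `response` string from the generation call, and a non-empty `missing` list is a reasonable signal to retry or flag the sample.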

> [!WARNING]
> The model is intended for single-turn conversations. We recommend greedy decoding: `do_sample=False` with `transformers`, as in the example above, or `temperature=0` on an OpenAI-compatible server.
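
For an OpenAI-compatible server, the same single-turn request could be expressed as a chat-completions payload. This is a sketch under assumptions: `build_chat_payload` is a hypothetical helper, the system prompt wording is copied from the `transformers` example, and it assumes the server accepts a system message followed by an image-only user message in the standard `image_url` format.

```python
import json

MODEL_ID = "LiquidAI/LFM2.5-VL-450M-Extract"


def build_chat_payload(fields_yaml, image_url, max_tokens=512):
    """Build a single-turn, greedy-decoding request body for /v1/chat/completions."""
    system_prompt = (
        "Extract the following from the image:\n\n"
        f"{fields_yaml}\n\n"
        "Respond with only a JSON object. Do not include any text outside the JSON."
    )
    return {
        "model": MODEL_ID,
        "temperature": 0,  # greedy decoding, as recommended above
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [{"type": "image_url", "image_url": {"url": image_url}}],
            },
        ],
    }


payload = build_chat_payload(
    "wood_color: The overall coloration of the wood surface",
    "https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be sent as the request body to the server's `/v1/chat/completions` endpoint, for example with `curl --data`.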

## 📬 Contact

- Got questions or want to connect? [Join our Discord community](https://discord.com/invite/liquid-ai)
- If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).

## Citation

```bibtex
@article{liquidai2025lfm2,
 title={LFM2 Technical Report},
 author={Liquid AI},
 journal={arXiv preprint arXiv:2511.23404},
 year={2025}
}
```