Instructions to use LiquidAI/LFM2.5-VL-450M-Extract with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use LiquidAI/LFM2.5-VL-450M-Extract with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="LiquidAI/LFM2.5-VL-450M-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Run the pipeline and print the generated text
result = pipe(text=messages)
print(result)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-450M-Extract")
model = AutoModelForImageTextToText.from_pretrained("LiquidAI/LFM2.5-VL-450M-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

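outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))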

Local Apps

vLLM

How to use LiquidAI/LFM2.5-VL-450M-Extract with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-VL-450M-Extract"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-VL-450M-Extract",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/LiquidAI/LFM2.5-VL-450M-Extract

SGLang

How to use LiquidAI/LFM2.5-VL-450M-Extract with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2.5-VL-450M-Extract" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-VL-450M-Extract",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2.5-VL-450M-Extract" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-VL-450M-Extract",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use LiquidAI/LFM2.5-VL-450M-Extract with Docker Model Runner:
```
docker model run hf.co/LiquidAI/LFM2.5-VL-450M-Extract
```

LFM2.5-VL-450M-Extract / README.md

zhkleciel

Update README.md

7d5a831 verified 2 days ago


	---
	library_name: transformers
	license: other
	license_name: lfm1.0
	license_link: LICENSE
	language:
	- en
	pipeline_tag: image-text-to-text
	tags:
	- liquid
	- lfm2.5
	- lfm2
	- edge
	- vision
	base_model: LiquidAI/LFM2.5-VL-450M
	---
	<center>
	<div style="text-align: center;">
	<img
	src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png"
	alt="Liquid AI"
	style="width: 100%; max-width: 100%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
	/>
	</div>
	<div style="display: flex; justify-content: center; gap: 0.5em;">
	<a href="https://playground.liquid.ai/chat?model=lfm2.5-vl-450m"><strong>Try LFM</strong></a> • <a href="https://docs.liquid.ai/lfm/getting-started/welcome"><strong>Docs</strong></a> • <a href="https://leap.liquid.ai/"><strong>LEAP</strong></a> • <a href="https://discord.com/invite/liquid-ai"><strong>Discord</strong></a>
	</div>
	</center>

	<br>

	# LFM2.5-VL-450M-Extract

	LFM2.5-VL-450M-Extract extracts user-defined fields from images and returns them as JSON. It is Liquid AI's first vision model in the [Liquid Nanos](https://huggingface.co/collections/LiquidAI/liquid-nanos) collection of compact, task-specific models built for production workflows, and it extends the Extract family alongside [LFM2-350M-Extract](https://huggingface.co/LiquidAI/LFM2-350M-Extract), which targets text documents.

	## ⚙️ How it works

	You specify what to extract as a YAML field list in the system prompt, and the model returns a JSON object with those fields. Structured outputs integrate cleanly with rule-based systems and downstream pipelines. Use it out of the box or fine-tune for domain-specific extraction.

	- System prompt:

	```yaml
	wood_color: The overall coloration of the wood surface
	wood_texture: The tactile quality of the wood surface
	wood_pattern: The pattern types visible on the wood surface
	```

	- User prompt:
	<img src="https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png" width="300">

	- Output:
	```json
	{
	"wood_color": "light to medium brown",
	"wood_texture": "smooth with visible grain",
	"wood_pattern": "parallel, irregular, wavy"
	}
	```
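
The field list can also be assembled programmatically. The following is a minimal sketch, not part of the official tooling: `build_system_prompt` is a hypothetical helper name, and the wrapper sentences around the field list are copied from the run example in this card. It assumes field names and descriptions are plain single-line strings.

```python
def build_system_prompt(fields: dict[str, str]) -> str:
    """Render {field_name: description} as a YAML-style field list,
    wrapped in the instruction text used in this card's run example."""
    # One "name: description" line per requested field, in insertion order.
    fields_yaml = "\n".join(f"{name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following from the image:\n\n"
        f"{fields_yaml}\n\n"
        "Respond with only a JSON object. Do not include any text outside the JSON."
    )


fields = {
    "wood_color": "The overall coloration of the wood surface",
    "wood_texture": "The tactile quality of the wood surface",
    "wood_pattern": "The pattern types visible on the wood surface",
}
system_prompt = build_system_prompt(fields)
print(system_prompt)
```

Keeping the schema in a dictionary makes it easy to reuse the same field names later when checking the model's JSON output.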

	The model also supports enum-style fields: list the allowed values in the field description, and the model returns one of the listed values as its answer.

	- System prompt:

	```yaml
	wood_color: The overall coloration of the wood surface, such as blue, red, or light tan
	wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy
	wood_pattern: The pattern types visible on the wood surface, e.g., straight, wavy, or curly
	```
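
Because enum choices are expressed in free text, a downstream pipeline may still want to confirm that the returned value is one of the allowed options. The sketch below is an illustration under assumptions: `check_enum_fields` and the `choices` mapping are hypothetical names, and the allowed values are maintained by the caller alongside the prompt.

```python
def check_enum_fields(result: dict, choices: dict[str, set[str]]) -> dict[str, object]:
    """Return {field: offending_value} for every enum field whose value
    is missing or not among the allowed choices."""
    violations = {}
    for field, allowed in choices.items():
        value = result.get(field)  # None when the field is absent
        if value not in allowed:
            violations[field] = value
    return violations


# Allowed values mirror the "select from ..." wording in the field description.
choices = {"wood_texture": {"smooth", "rough", "grainy"}}

ok = check_enum_fields({"wood_color": "light tan", "wood_texture": "smooth"}, choices)
bad = check_enum_fields({"wood_color": "light tan", "wood_texture": "polished"}, choices)
print(ok)   # {}
print(bad)  # {'wood_texture': 'polished'}
```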

	## 🌟 Use cases

	- Detecting safety-critical events in images (e.g. fallen person, fire, leakage) to trigger automated safety systems.
	- Collecting statistical information about objects across video frames for analytics pipelines.
	- Auto-tagging product images with structured attributes for retail and e-commerce.
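
As an illustration of the first two use cases, the sketch below aggregates per-frame extraction results and applies a simple rule. The field names (`fallen_person`, `vehicle_count`), the sample values, and the helper `summarize_frames` are all hypothetical; in practice each dictionary would be the parsed JSON output for one frame.

```python
from collections import Counter


def summarize_frames(frame_results: list[dict], field: str) -> Counter:
    """Count how often each value of `field` appears across frames,
    skipping frames where the field is absent."""
    return Counter(r[field] for r in frame_results if field in r)


# Hypothetical parsed outputs, one dictionary per video frame.
frames = [
    {"fallen_person": "no", "vehicle_count": "2"},
    {"fallen_person": "no", "vehicle_count": "3"},
    {"fallen_person": "yes", "vehicle_count": "3"},
]

counts = summarize_frames(frames, "fallen_person")
# Rule-based trigger: raise an alarm if any frame reports a fallen person.
alarm = counts["yes"] > 0
print(counts, alarm)
```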

	## 📄 Model details

	| Property | Detail |
	|---|---:|
	| Parameters (LM only) | 350M |
	| Vision encoder | SigLIP2 (~100M, [SigLIP-2 paper](https://arxiv.org/abs/2502.14786)) |
	| Backbone layers | hybrid conv+attention |
	| Image input | Single image, dynamic resolution |
	| Context | 128,000 tokens |
	| Vocab size | 65,536 (text) |
	| Precision | bfloat16 |
	| License | LFM Open License v1.0 |
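
A rough estimate of the weight footprint follows from the parameter counts and precision listed above. This is a back-of-the-envelope sketch under stated assumptions: the vision encoder is taken as exactly 100M parameters (the table gives ~100M), and activations, KV cache, and framework overhead are ignored.

```python
lm_params = 350e6        # language model parameters, from the table
vision_params = 100e6    # assumption: "~100M" treated as exactly 100M
bytes_per_param = 2      # bfloat16 stores each parameter in 2 bytes

weights_gb = (lm_params + vision_params) * bytes_per_param / 1e9
print(f"Approximate weight size: {weights_gb:.1f} GB")
```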

	## 📊 Performance

	We evaluated LFM2.5-VL-450M-Extract on a 2,000-sample benchmark of
	`(image, schema, JSON)` triples, with reference labels generated by an
	ensemble of frontier multimodal models. Predictions are scored on the
	following three dimensions:

	- JSON Validity: share of samples whose output parses as strict JSON
	- Schema Consistency F1 Score: set-level F1 between predicted and requested field names, macro-averaged across samples
	- VLM Judge Score: agreement of the extracted values with the image itself, judged by a separate vision model ([Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B))
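
The first two metrics can be sketched in a few lines. This is an illustrative reading of the definitions above, not the code in `model_eval/`: the function names are hypothetical, and scoring unparseable outputs as F1 = 0 is an assumption. The VLM judge score is omitted because it requires a separate model.

```python
import json


def is_valid_json_object(text: str) -> bool:
    """True if `text` parses as strict JSON and the result is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def field_f1(predicted: set[str], requested: set[str]) -> float:
    """Set-level F1 between predicted and requested field names."""
    tp = len(predicted & requested)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(requested)
    return 2 * precision * recall / (precision + recall)


def score(samples: list[tuple[str, list[str]]]) -> tuple[float, float]:
    """samples: (model_output_text, requested_field_names) pairs.
    Returns (JSON validity rate, macro-averaged field-name F1)."""
    valid, f1s = 0, []
    for text, requested in samples:
        if is_valid_json_object(text):
            valid += 1
            f1s.append(field_f1(set(json.loads(text)), set(requested)))
        else:
            f1s.append(0.0)  # assumption: unparseable output scores 0
    return valid / len(samples), sum(f1s) / len(samples)


samples = [
    ('{"a": 1, "b": 2}', ["a", "b"]),  # all fields match
    ('{"a": 1, "c": 2}', ["a", "b"]),  # one of two fields matches
    ("not json", ["a", "b"]),          # invalid output
]
validity, macro_f1 = score(samples)
print(validity, macro_f1)
```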

	<img src="https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/lfm2_vl_450m_metrics.png" width="800">

	| Model | Params | JSON Validity | F1 Score | VLM Judge Score |
	|---|---:|---:|---:|---:|
	| LFM2.5-VL-450M-Extract | 0.45B | 98.9 | 98.8 | 84.5 |
	| LFM2.5-VL-450M | 0.45B | 97.7 | 93.5 | 73.4 |
	| SmolVLM-500M-Instruct | 0.51B | 33.0 | 26.6 | 12.2 |
	| FastVLM-0.5B | 0.76B | 22.5 | 19.3 | 16.3 |
	| Qwen3.5-0.8B | 0.87B | 96.4 | 96.3 | 82.3 |
	| InternVL3_5-1B | 1.06B | 98.0 | 96.5 | 80.7 |
	| MiniCPM-V-4.6 | 1.30B | 61.8 | 60.4 | 57.5 |
	| (ref) InternVL3_5-2B | 2.35B | 99.6 | 99.2 | 87.7 |
	| (ref) Qwen3.5-2B | 2.27B | 97.9 | 97.7 | 89.7 |
	| (ref) gemma-4-E2B-it | 2.3B | 97.4 | 97.1 | 84.4 |

	LFM2.5-VL-450M-Extract outperforms similarly sized (sub-1B) open-source VLMs on this benchmark and is competitive with models 4× its size.

	Reproducing these numbers: The full evaluation pipeline, which includes extraction, VLM judging, and metric aggregation, is bundled in this repository under `model_eval/`. Setup, configuration, and run instructions are in the folder's [`README`](./model_eval/README.md).

	Scope: These numbers characterize the model on the input/output form it is designed for: a single input image, a YAML field list as the schema, and a flat JSON object as the output. Performance is not expected to transfer to largely different tasks, e.g. multi-image reasoning or free-form VQA.

	<!-- > Generic instruction-tuned VLMs (SmolVLM, moondream) cannot perform
	> schema-based extraction zero-shot regardless of prompt strategy.
	> Under the most permissive prompt setups, they either:
	> - produce free-form captions ignoring the JSON instruction, or
	> - produce valid-shaped JSON but echo the schema descriptions or
	> few-shot example values as field values (zero faithfulness to
	> the image).
	>
	> LFM2-VL-Extract's task-specific training is what enables strict-JSON
	> output with faithful, image-grounded values in a single zero-shot
	> call — no few-shot examples, no grammar constraints, no inference
	> wrappers. -->



	## 🏃 How to run

	You can run LFM2.5-VL-450M-Extract with Hugging Face [`transformers`](https://github.com/huggingface/transformers) v5.1 or newer:

	```bash
	pip install transformers pillow
	```

	```python
	from transformers import AutoProcessor, AutoModelForImageTextToText
	from transformers.image_utils import load_image

	model_id = "LiquidAI/LFM2.5-VL-450M-Extract"
	model = AutoModelForImageTextToText.from_pretrained(
	    model_id,
	    device_map="auto",
	    dtype="bfloat16",
	    trust_remote_code=True,
	)
	processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

	image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png")

	fields_yaml = """wood_color: The overall coloration of the wood surface
	wood_texture: The tactile quality of the wood surface
	wood_pattern: The pattern types visible on the wood surface"""

	system_prompt = f"""Extract the following from the image:

	{fields_yaml}

	Respond with only a JSON object. Do not include any text outside the JSON."""

	conversation = [
	    {"role": "system", "content": system_prompt},
	    {"role": "user", "content": [{"type": "image", "image": image}]},
	]

	inputs = processor.apply_chat_template(
	    conversation,
	    add_generation_prompt=True,
	    return_tensors="pt",
	    return_dict=True,
	    tokenize=True,
	).to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
	response = processor.batch_decode(
	    outputs[:, inputs["input_ids"].shape[1]:],
	    skip_special_tokens=True,
	)[0]
	print(response)
	# {
	# "wood_color": "light to medium brown",
	# "wood_texture": "smooth with visible grain",
	# "wood_pattern": "parallel, irregular, wavy"
	# }
	```
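
Before handing the response to downstream code, it can help to parse it and confirm that every requested field is present. The following is a minimal sketch: `parse_extraction` is a hypothetical helper, and trimming any text outside the outermost braces is a defensive assumption rather than documented model behavior.

```python
import json


def parse_extraction(response: str, expected_fields: list[str]) -> dict:
    """Parse the model's text output into a dict and verify that all
    requested fields are present. Raises ValueError on any problem."""
    # Defensive: keep only the span between the first "{" and the last "}".
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    data = json.loads(response[start:end + 1])  # JSONDecodeError is a ValueError
    missing = [f for f in expected_fields if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Sample response text taken from the expected output shown in this card.
response = """{
"wood_color": "light to medium brown",
"wood_texture": "smooth with visible grain",
"wood_pattern": "parallel, irregular, wavy"
}"""
data = parse_extraction(response, ["wood_color", "wood_texture", "wood_pattern"])
print(data["wood_color"])
```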

	> [!WARNING]
	> The model is intended for single-turn conversations. We recommend greedy decoding: pass `do_sample=False` to `generate` in `transformers`, or set `temperature=0` when calling a served endpoint.
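
When the model is served behind an OpenAI-compatible endpoint, such as the vLLM or SGLang servers described on the model page, the same single-turn extraction request can be expressed as a chat-completions payload. The sketch below only builds the payload and does not send it; `build_chat_payload` is a hypothetical helper, and the shortened system prompt is for illustration.

```python
import json


def build_chat_payload(model: str, system_prompt: str, image_url: str) -> dict:
    """Single-turn extraction request: field list in the system message,
    one image in the user message, greedy decoding via temperature=0."""
    return {
        "model": model,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    }


payload = build_chat_payload(
    model="LiquidAI/LFM2.5-VL-450M-Extract",
    system_prompt=(
        "Extract the following from the image:\n\n"
        "wood_color: The overall coloration of the wood surface\n\n"
        "Respond with only a JSON object. Do not include any text outside the JSON."
    ),
    image_url="https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png",
)
# Serialize for use as the body of a POST to /v1/chat/completions.
print(json.dumps(payload, indent=2))
```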

	## 📬 Contact

	- Got questions or want to connect? [Join our Discord community](https://discord.com/invite/liquid-ai)
	- If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).

	## Citation

	```bibtex
	@article{liquidai2025lfm2,
	  title={LFM2 Technical Report},
	  author={Liquid AI},
	  journal={arXiv preprint arXiv:2511.23404},
	  year={2025}
	}
	```