Instructions to use LiquidAI/LFM2.5-VL-450M-Extract with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LiquidAI/LFM2.5-VL-450M-Extract with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="LiquidAI/LFM2.5-VL-450M-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-450M-Extract")
model = AutoModelForImageTextToText.from_pretrained("LiquidAI/LFM2.5-VL-450M-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Local Apps
- vLLM
How to use LiquidAI/LFM2.5-VL-450M-Extract with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-VL-450M-Extract"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-VL-450M-Extract",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker
docker model run hf.co/LiquidAI/LFM2.5-VL-450M-Extract
- SGLang
How to use LiquidAI/LFM2.5-VL-450M-Extract with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2.5-VL-450M-Extract" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-VL-450M-Extract",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2.5-VL-450M-Extract" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2.5-VL-450M-Extract",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
- Docker Model Runner
How to use LiquidAI/LFM2.5-VL-450M-Extract with Docker Model Runner:
docker model run hf.co/LiquidAI/LFM2.5-VL-450M-Extract
Eval pipeline (OpenRouter judge)
A self-contained evaluation pipeline for LFM2.5-VL structured-extraction models. Extraction runs on your local GPU (vLLM/HF); the VLM judge runs remotely via the OpenRouter API, so there is no need to host a 30+ GB vision judge yourself.
Pipeline
WDS tars ──▶ Extraction (local GPU) ──▶ predictions
                                            │
    structural metrics ◀────────────────────┤
    (json validity, key P/R/F1)             │
                                            │
    VLM judge (OpenRouter) ◀────────────────┘
            │
            ▼
    eval_result.json
Three primary metrics per run: json_validity_rate, key_f1_macro,
vlm_judge_score_avg (per-key precision / recall also reported as
diagnostic byproducts of F1).
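The two structural metrics can be sketched as follows. This is an illustration, not the code in run_eval.py: the function names are hypothetical, and two behaviours are assumptions made here, namely that "macro" means averaging per-sample scores and that a sample with unparseable output scores an F1 of 0.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def key_prf(pred_keys: set, gt_keys: set) -> tuple:
    """Per-sample precision, recall and F1 over the sets of key names."""
    overlap = len(pred_keys & gt_keys)
    precision = overlap / len(pred_keys) if pred_keys else 0.0
    recall = overlap / len(gt_keys) if gt_keys else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def structural_metrics(raw_predictions: list, ground_truths: list) -> dict:
    """Aggregate JSON validity and key F1 over paired predictions and labels."""
    valid, f1_scores = [], []
    for raw, gt in zip(raw_predictions, ground_truths):
        ok = is_valid_json(raw)
        valid.append(1.0 if ok else 0.0)
        # Assumption: unparseable output contributes an empty key set (F1 = 0).
        pred_keys = set(json.loads(raw)) if ok else set()
        f1_scores.append(key_prf(pred_keys, set(gt))[2])
    return {
        "json_validity_rate": sum(valid) / len(valid),
        "key_f1_macro": sum(f1_scores) / len(f1_scores),
    }
```

Only key names are compared here; whether the extracted values are correct is left to the VLM judge.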
Files
.
├── README.md
├── requirements.txt
├── run_eval.sh       ← entry script (env vars + python call)
├── run_eval.py       ← CLI + orchestrator + metrics aggregation
├── extract.py        ← WDS loader + vLLM/HF extraction + JSON parsing
├── judge.py          ← OpenRouter async VLM judging
├── prompts/          ← 2 prompt templates (.txt)
└── eval_data/        ← shipped 2000-sample eval set (single WDS tar)
Three Python files total. No nested packages, no pyproject.toml,
and no pip install -e . step: just pip install -r requirements.txt.
Setup
1. Python environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install pulls in vllm, torch, transformers, peft,
webdataset, pillow, openai, tqdm, and numpy (~5 GB total) and takes
5 to 15 min depending on the network.
Mac / no NVIDIA GPU? vLLM won't install. Either drop the
vllm line from requirements.txt, or install everything else manually and run with --extraction-backend hf (forces the HF transformers path).
2. OpenRouter API key
Get a key from https://openrouter.ai/keys, then add it to your ~/.bashrc:
export OPENROUTER_API_KEY=sk-or-v1-...
Then source ~/.bashrc (or open a new shell).
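Before launching a long run, it can help to confirm that the key is visible to Python. A minimal sketch follows; the helper name get_openrouter_key is hypothetical and not part of the shipped scripts, and the only thing taken from this README is the OPENROUTER_API_KEY variable name.

```python
import os


def get_openrouter_key(env=os.environ) -> str:
    """Return the OpenRouter key from the environment, or fail early."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it and re-source your shell"
        )
    return key
```

Failing at startup is cheaper than discovering a missing key after extraction has already finished.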
Run
Quick start
bash run_eval.sh
Defaults:
- Evaluates LiquidAI/LFM2.5-VL-450M-Extract on ./eval_data/
- Runs the full 2000 samples (~30 min)
- VLM judge: qwen/qwen3.5-35b-a3b
- Writes results to ./eval_result.json and log to ./eval_run.log
Tweaking knobs
Open run_eval.sh: every knob is a top-level variable with an inline
comment. Common changes:
NUM_SAMPLES=50 # set 50 for a quick smoke test (~5 min)
EXTRACTION_BACKEND="hf" # if vLLM init fails on your machine
EXTRACTION_BATCH=32 # bump for faster extraction (default 8)
VLM_JUDGE_MODEL="google/gemini-2.5-flash" # any image-capable OpenRouter model id
JUDGE_CONCURRENCY=8 # lower if you hit OpenRouter rate limits
CLI alternative
If you'd rather skip the .sh wrapper, drive run_eval.py directly:
python run_eval.py \
--checkpoint-path LiquidAI/LFM2.5-VL-450M-Extract \
--data-path ./eval_data \
--output-path ./eval_result.json \
--num-samples 50 \
--extraction-backend auto \
--vlm-judge --vlm-judge-model qwen/qwen3.5-35b-a3b
All flags: python run_eval.py --help
Eval data
What ships in ./eval_data/
2000 (image, schema, JSON) samples in a single WebDataset tar
(eval_set_n2000.tar). Reference labels were generated by an ensemble
of frontier multimodal models and lightly post-processed for consistency.
Bring your own
Drop a .tar (or directory of tars) anywhere and pass
--data-path /path/to/your/data.
Format spec
Each sample is a WebDataset group sharing a common <sample_id> prefix:
<sample_id>.jpg image bytes
<sample_id>.key_explanations JSON {key_name: description} (the schema)
<sample_id>.structured_text JSON {key_name: value} (ground truth)
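A tar in this layout can be produced with the standard library alone. The sketch below writes one sample and reads it back, grouping members by the prefix before the first dot. The helper names are hypothetical, the image bytes are a placeholder rather than a real JPEG, and it assumes sample ids contain no dots or directory components.

```python
import io
import json
import tarfile


def add_sample(tar: tarfile.TarFile, sample_id: str, image_bytes: bytes,
               schema: dict, ground_truth: dict) -> None:
    """Append the three members that make up one sample."""
    members = {
        f"{sample_id}.jpg": image_bytes,
        f"{sample_id}.key_explanations": json.dumps(schema).encode("utf-8"),
        f"{sample_id}.structured_text": json.dumps(ground_truth).encode("utf-8"),
    }
    for name, payload in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


def read_samples(tar: tarfile.TarFile) -> dict:
    """Group members into {sample_id: {extension: bytes}}."""
    samples = {}
    for member in tar.getmembers():
        if not member.isfile():
            continue
        sample_id, _, ext = member.name.partition(".")
        samples.setdefault(sample_id, {})[ext] = tar.extractfile(member).read()
    return samples


# Round trip through an in-memory tar (no files are written to disk).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_sample(
        tar,
        "sample_000001",
        b"placeholder image bytes",  # replace with real JPEG bytes
        {"total": "Invoice total including tax"},
        {"total": "42.00"},
    )
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    samples = read_samples(tar)
print(sorted(samples["sample_000001"]))
```

To write to disk instead, open the archive with tarfile.open("my_eval.tar", "w") and pass that path to --data-path.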
Output
./eval_result.json has three top-level keys:
{
  "metadata": {
    "checkpoint_path": "LiquidAI/LFM2.5-VL-450M-Extract",
    "num_samples_evaluated": 50,
    "extraction_backend": "auto",
    "vlm_judge_model": "qwen/qwen3.5-35b-a3b",
    "elapsed_s": 215.2,
    "timestamp_utc": "2026-05-29T..."
  },
  "metrics": {
    "json_validity_rate": 0.996,     // share of samples with parseable JSON
    "key_precision_macro": 0.996,    // |pred-keys ∩ gt-keys| / |pred-keys|
    "key_recall_macro": 0.997,
    "key_f1_macro": 0.997,           // primary schema-consistency metric
    "vlm_judge_score_avg": 0.922,    // 0-1, VLM scoring of all keys vs image
    "samples_evaluated": 50
  },
  "samples": [
    /* per-sample {schema, gt, prediction, per_key scores, raw judge text} */
  ]
}
The samples[].vlm_judge_raw field preserves the judge's verbatim text
response, which is useful for debugging unexpected scores.
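For a quick look at a finished run, the three primary metrics can be pulled out of the result file with a few lines of Python. This sketch uses hypothetical helper names and assumes the file on disk is plain JSON, with the `//` and `/* */` annotations in the listing above being documentation only.

```python
import json

PRIMARY_METRICS = ("json_validity_rate", "key_f1_macro", "vlm_judge_score_avg")


def load_result(path: str = "./eval_result.json") -> dict:
    """Read the result file written by the eval run."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(result: dict) -> str:
    """Format the checkpoint name and the three primary metrics."""
    meta, metrics = result["metadata"], result["metrics"]
    lines = [f"{meta['checkpoint_path']} ({metrics['samples_evaluated']} samples)"]
    for name in PRIMARY_METRICS:
        lines.append(f"  {name}: {metrics[name]:.3f}")
    return "\n".join(lines)
```

Usage after a run: print(summarize(load_result())).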
Costs
Default judge on a full 2000-sample run, calculated against per-token pricing at the time of writing (check https://openrouter.ai/models for current rates):
| Stage | Model | Input rate | Output rate | Est. cost |
|---|---|---|---|---|
| VLM judge | qwen/qwen3.5-35b-a3b | $0.139 / 1M | $1.00 / 1M | ~$1.53 |
Full 2000-sample run: ~$1.50. Smoke 50-sample: ~$0.04.
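The smoke-test figure appears consistent with scaling the full-run estimate linearly by sample count. The sketch below shows that arithmetic, plus the general per-token formula for re-estimating with another judge. Both function names are hypothetical, and token counts are left as inputs because this README does not state them.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in dollars given token counts and per-million-token rates."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


def scale_cost(full_run_cost: float, full_samples: int, num_samples: int) -> float:
    """Linear estimate for a smaller run, assuming similar tokens per sample."""
    return full_run_cost * num_samples / full_samples


smoke = scale_cost(1.53, full_samples=2000, num_samples=50)
print(f"~${smoke:.2f}")  # prints "~$0.04"
```

To price a different judge, measure the token usage of a small run and feed it to token_cost with that model's current rates.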
Troubleshooting
- vLLM init fails (e.g. Ninja build failed / __cudaLaunch not declared) → set EXTRACTION_BACKEND="hf" in run_eval.sh for a slower-but-stable fallback.
- OpenRouter 429 (rate limit) → lower JUDGE_CONCURRENCY to 4 or 8.
- No usable samples loaded → your tars don't have the expected <key>.jpg / .key_explanations / .structured_text fields, or the .tar path is wrong.
- A new judge model rejects with Reasoning is mandatory or returns all zero scores with finish_reason=length → edit the _VLM_JUDGE_REASONING constant in judge.py (the OpenRouter reasoning param works differently per model).
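To see why lowering the concurrency setting helps with 429s, here is a generic sketch of a concurrency cap combined with exponential backoff. It is not the code in judge.py: RateLimited, call_with_backoff, judge_all and judge_one are all hypothetical names, and the retry policy is an assumption chosen for illustration.

```python
import asyncio
import random


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response (hypothetical)."""


async def call_with_backoff(call, *, retries: int = 4, base_delay: float = 1.0):
    """Retry an async call on RateLimited, doubling the wait each time."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except RateLimited:
            if attempt == retries:
                raise
            # Exponential delay plus jitter so retries do not synchronise.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


async def judge_all(items, judge_one, concurrency: int = 8,
                    base_delay: float = 1.0) -> list:
    """Run judge_one over items with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(item):
        async with semaphore:
            return await call_with_backoff(
                lambda: judge_one(item), base_delay=base_delay
            )

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(item) for item in items))
```

A smaller cap means fewer simultaneous requests against the provider's rate limit, at the cost of a longer judging stage.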