Instructions to use LiquidAI/LFM2.5-VL-1.6B-Extract with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LiquidAI/LFM2.5-VL-1.6B-Extract with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="LiquidAI/LFM2.5-VL-1.6B-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-1.6B-Extract")
model = AutoModelForImageTextToText.from_pretrained("LiquidAI/LFM2.5-VL-1.6B-Extract")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use LiquidAI/LFM2.5-VL-1.6B-Extract with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-VL-1.6B-Extract"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LiquidAI/LFM2.5-VL-1.6B-Extract",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker
docker model run hf.co/LiquidAI/LFM2.5-VL-1.6B-Extract
- SGLang
How to use LiquidAI/LFM2.5-VL-1.6B-Extract with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "LiquidAI/LFM2.5-VL-1.6B-Extract" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LiquidAI/LFM2.5-VL-1.6B-Extract",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2.5-VL-1.6B-Extract" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LiquidAI/LFM2.5-VL-1.6B-Extract",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use LiquidAI/LFM2.5-VL-1.6B-Extract with Docker Model Runner:
docker model run hf.co/LiquidAI/LFM2.5-VL-1.6B-Extract
# Eval pipeline (OpenRouter judge)

A self-contained evaluation pipeline for LFM2.5-VL structured-extraction
models. Extraction runs on your local GPU (vLLM/HF); the VLM judge runs
remotely via the [OpenRouter](https://openrouter.ai/) API, so there is no
need to host a 30+ GB vision judge yourself.

## Pipeline

```
WDS tars ─▶ Extraction (local GPU) ─▶ predictions
                                           │
            structural metrics ◀───────────┤
            (json validity, key P/R/F1)    │
                                           │
            VLM judge (OpenRouter) ◀───────┘
                       │
                       ▼
               eval_result.json
```

Three primary metrics per run: `json_validity_rate`, `key_f1_macro`,
`vlm_judge_score_avg`. Key precision and recall are also reported as
diagnostic byproducts of F1.
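
To make the two structural metrics concrete, here is a minimal sketch of how they could be computed. It is not the code in `run_eval.py`: the function names are hypothetical, and it assumes that only top-level keys are compared, that an unparseable prediction scores 0 on every key metric, and that "macro" means a plain mean over samples.

```python
import json


def parse_json_object(text):
    """Return the parsed dict, or None if `text` is not a JSON object."""
    try:
        value = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    return value if isinstance(value, dict) else None


def key_prf(pred_keys, gt_keys):
    """Precision, recall and F1 of predicted key names against ground truth."""
    pred, gt = set(pred_keys), set(gt_keys)
    overlap = len(pred & gt)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def structural_metrics(raw_predictions, ground_truths):
    """Aggregate JSON validity and macro-averaged key P/R/F1 over all samples."""
    if not raw_predictions or len(raw_predictions) != len(ground_truths):
        raise ValueError("need equally sized, non-empty prediction and ground-truth lists")
    parsed = [parse_json_object(text) for text in raw_predictions]
    # Assumption: invalid JSON is treated as an empty prediction (all-zero scores).
    scores = [key_prf((p or {}).keys(), gt.keys()) for p, gt in zip(parsed, ground_truths)]
    n = len(scores)
    return {
        "json_validity_rate": sum(p is not None for p in parsed) / n,
        "key_precision_macro": sum(s[0] for s in scores) / n,
        "key_recall_macro": sum(s[1] for s in scores) / n,
        "key_f1_macro": sum(s[2] for s in scores) / n,
    }
```

Under these assumptions a prediction that parses but names the wrong keys lowers `key_f1_macro` without touching `json_validity_rate`, which is why the two are reported separately.
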

## Files

```
.
├── README.md
├── requirements.txt
├── run_eval.sh       ← entry script (env vars + python call)
├── run_eval.py       ← CLI + orchestrator + metrics aggregation
├── extract.py        ← WDS loader + vLLM/HF extraction + JSON parsing
├── judge.py          ← OpenRouter async VLM judging
├── prompts/          ← 2 prompt templates (.txt)
└── eval_data/        ← shipped 2000-sample eval set (single WDS tar)
```

Three Python files total. No nested packages, no `pyproject.toml`,
no `pip install -e .`; just `pip install -r requirements.txt`.

---

## Setup

### 1. Python environment

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

`pip install` pulls `vllm`, `torch`, `transformers`, `peft`,
`webdataset`, `pillow`, `openai`, `tqdm`, and `numpy`: roughly 5 GB in
total, which takes 5–15 min depending on the network.

> **Mac / no NVIDIA GPU?** vLLM won't install. Either drop the `vllm`
> line from `requirements.txt` or install the other packages manually,
> then run with `--extraction-backend hf` (forces the HF transformers path).

### 2. OpenRouter API key

Get a key from https://openrouter.ai/keys, then add it to your `~/.bashrc`:

```bash
export OPENROUTER_API_KEY=sk-or-v1-...
```

Then `source ~/.bashrc` (or open a new shell).
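
A quick preflight check can catch a missing key before a long extraction run starts. The sketch below is a hypothetical helper, not part of the shipped scripts; the `sk-or-` prefix test is an assumption based only on the example key format shown above.

```python
import os


def openrouter_key_problem(env=os.environ):
    """Return a description of what is wrong with the key, or None if it looks usable."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        return "OPENROUTER_API_KEY is not set; export it and re-source your shell"
    # Assumption: keys start with "sk-or-", as in the example above.
    if not key.startswith("sk-or-"):
        return "OPENROUTER_API_KEY does not start with 'sk-or-'; check for a paste error"
    return None


if __name__ == "__main__":
    print(openrouter_key_problem() or "OPENROUTER_API_KEY looks set")
```

This only checks that the variable is visible to Python; it does not verify that the key is valid or funded.
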

---

## Run

### Quick start

```bash
bash run_eval.sh
```

Defaults:

- Evaluates `LiquidAI/LFM2.5-VL-1.6B-Extract` on `./eval_data/`
- Runs the full **2000 samples** (~30 min)
- VLM judge: `qwen/qwen3.5-35b-a3b`
- Writes results to `./eval_result.json` and the log to `./eval_run.log`

### Tweaking knobs

Open `run_eval.sh`: every knob is a top-level variable with an inline
comment. Common changes:

```bash
NUM_SAMPLES=50                              # set 50 for a quick smoke test (~5 min)
EXTRACTION_BACKEND="hf"                     # if vLLM init fails on your machine
EXTRACTION_BATCH=32                         # bump for faster extraction (default 8)
VLM_JUDGE_MODEL="google/gemini-2.5-flash"   # any image-capable OpenRouter model id
JUDGE_CONCURRENCY=8                         # lower if you hit OpenRouter rate limits
```

### CLI alternative

If you'd rather skip the `.sh` wrapper, drive `run_eval.py` directly:

```bash
python run_eval.py \
    --checkpoint-path LiquidAI/LFM2.5-VL-1.6B-Extract \
    --data-path ./eval_data \
    --output-path ./eval_result.json \
    --num-samples 50 \
    --extraction-backend auto \
    --vlm-judge --vlm-judge-model qwen/qwen3.5-35b-a3b
```

All flags: `python run_eval.py --help`

---

## Eval data

### What ships in `./eval_data/`

2000 `(image, schema, JSON)` samples in a single WebDataset tar
(`eval_set_n2000.tar`). Reference labels were generated by an ensemble
of frontier multimodal models and lightly post-processed for consistency.

### Bring your own

Drop a `.tar` (or directory of tars) anywhere and pass
`--data-path /path/to/your/data`.

### Format spec

Each sample is a WebDataset group sharing a common `<sample_id>` prefix:

```
<sample_id>.jpg                image bytes
<sample_id>.key_explanations   JSON {key_name: description}   (the schema)
<sample_id>.structured_text    JSON {key_name: value}         (ground truth)
```
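
For readers preparing their own data, the layout above can be produced with the standard library alone. The following is a sketch rather than the loader in `extract.py`: the helper names and the `receipt_0001` sample are made up, the image bytes are a placeholder and not a decodable JPEG, and it assumes sample ids contain no dots.

```python
import io
import json
import tarfile
from collections import defaultdict


def add_member(tar, name, payload):
    """Append one in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def write_sample(tar, sample_id, jpeg_bytes, schema, ground_truth):
    """Write the three members of one sample, all sharing the sample_id prefix."""
    add_member(tar, f"{sample_id}.jpg", jpeg_bytes)
    add_member(tar, f"{sample_id}.key_explanations", json.dumps(schema).encode("utf-8"))
    add_member(tar, f"{sample_id}.structured_text", json.dumps(ground_truth).encode("utf-8"))


def read_samples(fileobj):
    """Group tar members by sample id: {sample_id: {extension: bytes}}."""
    groups = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumption: the sample id is everything before the first dot.
            sample_id, _, extension = member.name.partition(".")
            groups[sample_id][extension] = tar.extractfile(member).read()
    return dict(groups)


def build_demo_tar():
    """Return an in-memory tar holding a single hypothetical sample."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        write_sample(
            tar,
            "receipt_0001",
            b"\xff\xd8\xff\xd9",  # placeholder bytes, not a real image
            {"total": "Grand total printed on the receipt"},
            {"total": "12.50"},
        )
    buffer.seek(0)
    return buffer
```

Reading the demo archive back with `read_samples(build_demo_tar())` yields one group with the `jpg`, `key_explanations`, and `structured_text` entries, which is a convenient sanity check before pointing `--data-path` at a new tar.
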

---

## Output

`./eval_result.json` has three top-level keys:

```jsonc
{
  "metadata": {
    "checkpoint_path": "LiquidAI/LFM2.5-VL-1.6B-Extract",
    "num_samples_evaluated": 50,
    "extraction_backend": "auto",
    "vlm_judge_model": "qwen/qwen3.5-35b-a3b",
    "elapsed_s": 215.2,
    "timestamp_utc": "2026-05-29T..."
  },
  "metrics": {
    "json_validity_rate": 0.996,    // share of samples with parseable JSON
    "key_precision_macro": 0.996,   // |pred-keys ∩ gt-keys| / |pred-keys|
    "key_recall_macro": 0.997,      // |pred-keys ∩ gt-keys| / |gt-keys|
    "key_f1_macro": 0.997,          // primary schema-consistency metric
    "vlm_judge_score_avg": 0.922,   // 0-1, VLM scoring of all keys vs image
    "samples_evaluated": 50
  },
  "samples": [
    /* per-sample {schema, gt, prediction, per_key scores, raw judge text} */
  ]
}
```

The `samples[].vlm_judge_raw` field preserves the judge's verbatim text
response, which is useful for debugging unexpected scores.
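
If the result file feeds a regression check, the three primary metrics can be compared against minimum values. This is a sketch under two assumptions: the file on disk is plain JSON (the comments above being illustrative only), and the threshold numbers are placeholders you would choose yourself. `load_result` and `check_thresholds` are hypothetical helpers.

```python
import json

PRIMARY_METRICS = ("json_validity_rate", "key_f1_macro", "vlm_judge_score_avg")


def load_result(path="eval_result.json"):
    """Load the evaluation result written by the pipeline."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_thresholds(result, thresholds):
    """Return {metric: value} for every metric that falls below its minimum."""
    metrics = result["metrics"]
    return {
        name: metrics[name]
        for name, minimum in thresholds.items()
        if metrics[name] < minimum
    }


def summarize(result):
    """Format the primary metrics as one line, e.g. for a CI log."""
    metrics = result["metrics"]
    return ", ".join(f"{name}={metrics[name]:.3f}" for name in PRIMARY_METRICS)
```

An empty dict from `check_thresholds` means every listed metric met its minimum; anything else names the metrics to investigate, starting with the per-sample entries under `samples`.
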

---

## Costs

Default judge on a full 2000-sample run, calculated against per-token
pricing at the time of writing (check https://openrouter.ai/models for
current rates):

| Stage | Model | Input rate | Output rate | Est. cost |
|---|---|---|---|---|
| VLM judge | `qwen/qwen3.5-35b-a3b` | $0.139 / 1M tokens | $1.00 / 1M tokens | ~$1.53 |

**Full 2000-sample run: ~$1.50.** Smoke 50-sample: ~$0.04.
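
The smoke-test figure follows from scaling the full-run estimate linearly with sample count, and the same arithmetic can be redone for another judge once its rates are known. The sketch below assumes cost is proportional to the number of samples; it does not assume any per-sample token counts, since those are not stated here, so `token_cost` takes them as inputs.

```python
FULL_RUN_SAMPLES = 2000
FULL_RUN_COST_USD = 1.53  # estimate from the table above


def scaled_cost(num_samples):
    """Estimate judge cost for a smaller run, assuming cost scales with sample count."""
    return FULL_RUN_COST_USD * num_samples / FULL_RUN_SAMPLES


def token_cost(input_tokens, output_tokens, input_rate_per_m=0.139, output_rate_per_m=1.00):
    """Cost in USD for given token totals; rates are USD per 1M tokens."""
    return (input_tokens / 1_000_000) * input_rate_per_m + (
        output_tokens / 1_000_000
    ) * output_rate_per_m


if __name__ == "__main__":
    print(f"50-sample smoke test: ~${scaled_cost(50):.2f}")
```

For 50 samples this gives 1.53 × 50 / 2000 ≈ $0.04, matching the smoke-test estimate.
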

---

## Troubleshooting

- **vLLM init fails** (e.g. `Ninja build failed` / `__cudaLaunch not declared`)
  → set `EXTRACTION_BACKEND="hf"` in `run_eval.sh` for a slower-but-stable
  fallback.
- **OpenRouter 429 (rate limit)** → lower `JUDGE_CONCURRENCY` to 4 or 8.
- **`No usable samples loaded`** → your tars don't contain the expected
  `<sample_id>.jpg` / `.key_explanations` / `.structured_text` members, or
  the `.tar` path is wrong.
- **A new judge model rejects the request with `Reasoning is mandatory`**, or
  returns all-zero scores with `finish_reason=length` → edit the
  `_VLM_JUDGE_REASONING` constant in `judge.py` (the OpenRouter `reasoning`
  param works differently per model).