# Eval pipeline (OpenRouter judge)
A self-contained evaluation pipeline for LFM2.5-VL structured-extraction
models. Extraction runs on your local GPU (vLLM/HF); the VLM judge runs
remotely via the [OpenRouter](https://openrouter.ai/) API, so there is no
need to host a 30+ GB vision judge yourself.
## Pipeline
```
WDS tars ──▶ Extraction (local GPU) ──▶ predictions
                                             │
              structural metrics ────────────┤
              (json validity, key P/R/F1)    │
                                             │
              VLM judge (OpenRouter) ────────┘
                        │
                        ▼
                 eval_result.json
```
Three primary metrics per run: `json_validity_rate`, `key_f1_macro`,
`vlm_judge_score_avg` (per-key precision / recall also reported as
diagnostic byproducts of F1).
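As a rough illustration of the two structural metrics, the sketch below computes JSON validity and macro-averaged key precision / recall / F1 over top-level keys. It is not the code in `run_eval.py`: the function names are hypothetical, and it assumes that an unparseable prediction contributes zero to every key metric and that only top-level keys are compared.

```python
import json


def parse_prediction(text):
    """Return the parsed JSON object, or None if the text is not a JSON object."""
    try:
        obj = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    return obj if isinstance(obj, dict) else None


def key_prf(pred_keys, gt_keys):
    """Per-sample key precision, recall and F1 over two key sets."""
    pred, gt = set(pred_keys), set(gt_keys)
    overlap = len(pred & gt)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def aggregate(raw_predictions, ground_truths):
    """Macro average: score each sample, then take the plain mean."""
    n = len(raw_predictions)
    if n == 0:
        return {"json_validity_rate": 0.0, "key_precision_macro": 0.0,
                "key_recall_macro": 0.0, "key_f1_macro": 0.0}
    valid = 0
    sums = [0.0, 0.0, 0.0]
    for raw, gt in zip(raw_predictions, ground_truths):
        pred = parse_prediction(raw)
        if pred is not None:
            valid += 1
        # Assumption: an invalid prediction is scored as an empty key set.
        scores = key_prf(pred.keys() if pred else [], gt.keys())
        sums = [s + x for s, x in zip(sums, scores)]
    return {
        "json_validity_rate": valid / n,
        "key_precision_macro": sums[0] / n,
        "key_recall_macro": sums[1] / n,
        "key_f1_macro": sums[2] / n,
    }
```

Because the average is taken per sample, a handful of badly broken outputs pulls the macro scores down even when most samples are perfect.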
## Files
```
.
├── README.md
├── requirements.txt
├── run_eval.sh          ← entry script (env vars + python call)
├── run_eval.py          ← CLI + orchestrator + metrics aggregation
├── extract.py           ← WDS loader + vLLM/HF extraction + JSON parsing
├── judge.py             ← OpenRouter async VLM judging
├── prompts/             ← 2 prompt templates (.txt)
└── eval_data/           ← shipped 2000-sample eval set (single WDS tar)
```
Three Python files total. No nested packages, no `pyproject.toml`,
no `pip install -e .`; just `pip install -r requirements.txt`.
---
## Setup
### 1. Python environment
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
`pip install` will pull `vllm`, `torch`, `transformers`, `peft`,
`webdataset`, `pillow`, `openai`, `tqdm`, `numpy`: ~5 GB total, which takes
5-15 min depending on the network.
> **Mac / no NVIDIA GPU?** vLLM won't install. Either drop the `vllm`
> line from `requirements.txt`, or install everything else manually and
> run with `--extraction-backend hf` (forces the HF transformers path).
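
One plausible way an `auto` backend setting could resolve is to check whether `vllm` is importable and fall back to the HF path otherwise. The sketch below shows that idea only; `choose_backend` is a hypothetical helper, the value `"vllm"` is assumed rather than documented here, and the actual selection logic in `run_eval.py` may differ.

```python
import importlib.util


def choose_backend(requested="auto", vllm_available=None):
    """Resolve 'auto' to 'vllm' when the package is importable, else 'hf'."""
    # Assumption: the accepted values are "auto", "vllm" and "hf".
    if requested not in {"auto", "vllm", "hf"}:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested != "auto":
        return requested
    if vllm_available is None:
        # find_spec returns None when the package is not installed.
        vllm_available = importlib.util.find_spec("vllm") is not None
    return "vllm" if vllm_available else "hf"


if __name__ == "__main__":
    print(choose_backend("auto"))
```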
### 2. OpenRouter API key
Get a key from https://openrouter.ai/keys, then add it to your `~/.bashrc`:
```bash
export OPENROUTER_API_KEY=sk-or-v1-...
```
Then `source ~/.bashrc` (or open a new shell).
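To confirm the key is visible to Python before starting a long run, a small check like the following can help. It is a minimal sketch with a hypothetical helper name, and it only verifies that the variable is set, not that OpenRouter accepts the key.

```python
import os


def get_openrouter_key(env=os.environ):
    """Return the OpenRouter key from the environment or raise a clear error."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it and reload your shell"
        )
    return key


if __name__ == "__main__":
    try:
        key = get_openrouter_key()
        # Print only a short prefix so the secret never lands in logs.
        print(f"key found: {key[:9]}...")
    except RuntimeError as err:
        print(err)
```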
---
## Run
### Quick start
```bash
bash run_eval.sh
```
Defaults:
- Evaluates `LiquidAI/LFM2.5-VL-1.6B-Extract` on `./eval_data/`
- Runs the full **2000 samples** (~30 min)
- VLM judge: `qwen/qwen3.5-35b-a3b`
- Writes results to `./eval_result.json` and log to `./eval_run.log`
### Tweaking knobs
Open `run_eval.sh`: every knob is a top-level variable with an inline
comment. Common changes:
```bash
NUM_SAMPLES=50 # set 50 for a quick smoke test (~5 min)
EXTRACTION_BACKEND="hf" # if vLLM init fails on your machine
EXTRACTION_BATCH=32 # bump for faster extraction (default 8)
VLM_JUDGE_MODEL="google/gemini-2.5-flash" # any image-capable OpenRouter model id
JUDGE_CONCURRENCY=8 # lower if you hit OpenRouter rate limits
```
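To see why lowering `JUDGE_CONCURRENCY` eases rate limits, here is a sketch of bounded concurrency with an `asyncio.Semaphore`. This is an assumption about the general technique, not the code in `judge.py`; `judge_all` and `fake_judge` are hypothetical names, and the fake judge makes no network calls.

```python
import asyncio


async def judge_all(samples, judge_one, concurrency=8):
    """Run judge_one over all samples with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(sample):
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await judge_one(sample)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(s) for s in samples))


async def fake_judge(sample):
    """Stand-in for a remote judge call (hypothetical; no network)."""
    await asyncio.sleep(0.01)
    return {"id": sample, "score": 1.0}


if __name__ == "__main__":
    results = asyncio.run(judge_all(range(20), fake_judge, concurrency=4))
    print(len(results), "samples judged")
```

With a smaller limit, fewer requests are in flight at once, so the request rate drops at the cost of a longer judging stage.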
### CLI alternative
If you'd rather skip the .sh wrapper, drive `run_eval.py` directly:
```bash
python run_eval.py \
  --checkpoint-path LiquidAI/LFM2.5-VL-1.6B-Extract \
  --data-path ./eval_data \
  --output-path ./eval_result.json \
  --num-samples 50 \
  --extraction-backend auto \
  --vlm-judge --vlm-judge-model qwen/qwen3.5-35b-a3b
```
All flags: `python run_eval.py --help`
---
## Eval data
### What ships in `./eval_data/`
2000 `(image, schema, JSON)` samples in a single WebDataset tar
(`eval_set_n2000.tar`). Reference labels were generated by an ensemble
of frontier multimodal models and lightly post-processed for consistency.
### Bring your own
Drop a `.tar` (or directory of tars) anywhere and pass
`--data-path /path/to/your/data`.
### Format spec
Each sample is a WebDataset group sharing a common `<sample_id>` prefix:
```
<sample_id>.jpg image bytes
<sample_id>.key_explanations JSON {key_name: description} (the schema)
<sample_id>.structured_text JSON {key_name: value} (ground truth)
```
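The layout above can be produced and checked with the standard library alone. The sketch below writes one sample into a tar and reads it back by grouping members on their shared prefix; `write_sample` and `read_samples` are hypothetical helpers, not part of `extract.py`, and the grouping rule (split the file name at its first dot) is a simplified version of what the `webdataset` library does.

```python
import io
import json
import tarfile
from collections import defaultdict

REQUIRED_FIELDS = {"jpg", "key_explanations", "structured_text"}


def write_sample(tar, sample_id, image_bytes, schema, ground_truth):
    """Append the three members that make up one sample."""
    members = {
        f"{sample_id}.jpg": image_bytes,
        f"{sample_id}.key_explanations": json.dumps(schema).encode("utf-8"),
        f"{sample_id}.structured_text": json.dumps(ground_truth).encode("utf-8"),
    }
    for name, payload in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


def read_samples(fileobj):
    """Group tar members by sample id and keep only complete samples."""
    groups = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Split the base name at its first dot: "<sample_id>.<field>".
            directory, _, base = member.name.rpartition("/")
            stem, _, field = base.partition(".")
            sample_id = f"{directory}/{stem}" if directory else stem
            groups[sample_id][field] = tar.extractfile(member).read()
    samples = {}
    for sample_id, fields in groups.items():
        if REQUIRED_FIELDS.issubset(fields):
            samples[sample_id] = {
                "image": fields["jpg"],
                "schema": json.loads(fields["key_explanations"]),
                "ground_truth": json.loads(fields["structured_text"]),
            }
    return samples
```

Running `read_samples` on your own tar before a full evaluation is a quick way to confirm that every sample carries all three fields.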
---
## Output
`./eval_result.json` has three top-level keys:
```jsonc
{
  "metadata": {
    "checkpoint_path": "LiquidAI/LFM2.5-VL-1.6B-Extract",
    "num_samples_evaluated": 50,
    "extraction_backend": "auto",
    "vlm_judge_model": "qwen/qwen3.5-35b-a3b",
    "elapsed_s": 215.2,
    "timestamp_utc": "2026-05-29T..."
  },
  "metrics": {
    "json_validity_rate": 0.996,    // share of samples with parseable JSON
    "key_precision_macro": 0.996,   // pred-keys ∩ gt-keys / pred-keys
    "key_recall_macro": 0.997,
    "key_f1_macro": 0.997,          // primary schema-consistency metric
    "vlm_judge_score_avg": 0.922,   // 0-1, VLM scoring of all keys vs image
    "samples_evaluated": 50
  },
  "samples": [
    /* per-sample {schema, gt, prediction, per_key scores, raw judge text} */
  ]
}
```
The `samples[].vlm_judge_raw` field preserves the judge's verbatim text
response, which is useful for debugging unexpected scores.
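For quick inspection or a CI gate, the three primary metrics can be pulled out of the result file with a few lines. The sketch below uses only the `metrics` keys shown above and assumes the file on disk is plain JSON (the `//` comments in the example are annotations); `summarize` and `below_threshold` are hypothetical helpers.

```python
import json

PRIMARY_METRICS = ("json_validity_rate", "key_f1_macro", "vlm_judge_score_avg")


def load_result(path="eval_result.json"):
    """Read the result file written by the pipeline."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(result):
    """Format the sample count and the three primary metrics, one per line."""
    metrics = result["metrics"]
    lines = [f"samples: {metrics['samples_evaluated']}"]
    for name in PRIMARY_METRICS:
        lines.append(f"{name}: {metrics[name]:.3f}")
    return "\n".join(lines)


def below_threshold(result, minimums):
    """Return the names of metrics that fall below their required minimum."""
    metrics = result["metrics"]
    return [name for name, floor in minimums.items() if metrics[name] < floor]
```

A typical use is `print(summarize(load_result()))` after a run, followed by a non-zero exit if `below_threshold` returns anything.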
---
## Costs
Default judge on a full 2000-sample run, calculated against per-token
pricing at the time of writing (check https://openrouter.ai/models for
current rates):
| Stage | Model | Input rate | Output rate | Est. cost |
|---|---|---|---|---|
| VLM judge | `qwen/qwen3.5-35b-a3b` | $0.139 / 1M | $1.00 / 1M | ~$1.53 |
**Full 2000-sample run: ~$1.50.** Smoke 50-sample: ~$0.04.
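The arithmetic behind these figures is simple enough to redo when rates change. The sketch below uses the per-1M-token rates from the table as defaults and scales the full-run estimate linearly to a smaller run; token counts are not given in this document, so any values passed to `judge_cost` are your own measurements, and both function names are hypothetical.

```python
def judge_cost(input_tokens, output_tokens, input_rate=0.139, output_rate=1.00):
    """Cost in USD for a token count, with rates given in USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def scale_cost(full_run_cost, full_run_samples, num_samples):
    """Linear scaling, assuming every sample costs about the same."""
    return full_run_cost * num_samples / full_run_samples


if __name__ == "__main__":
    # 50-sample smoke test scaled from the ~$1.53 full-run estimate.
    print(f"${scale_cost(1.53, 2000, 50):.2f}")
```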
---
## Troubleshooting
- **vLLM init fails** (e.g. `Ninja build failed` / `__cudaLaunch not declared`)
  → set `EXTRACTION_BACKEND="hf"` in `run_eval.sh` for a slower-but-stable
  fallback.
- **OpenRouter 429 (rate limit)** → lower `JUDGE_CONCURRENCY` to 4 or 8.
- **`No usable samples loaded`** → your tars don't have the expected
  `<key>.jpg` / `.key_explanations` / `.structured_text` fields, or the
  `.tar` path is wrong.
- **A new judge model rejects with `Reasoning is mandatory`** or returns all
  zero scores with `finish_reason=length` → edit the `_VLM_JUDGE_REASONING`
  constant in `judge.py` (the OpenRouter `reasoning` param works differently
  per model).