#!/bin/bash
# Example evaluation script: extraction runs on a local GPU, judges run via the
# OpenRouter API.
# Usage: bash run_eval.sh
set -euo pipefail
# --- vLLM env workarounds ----------------------------------------------------
# Needed when EXTRACTION_BACKEND=vllm or "auto". No-op on systems without
# vLLM / environment-modules.
#   1. CUDA toolkit on LD_LIBRARY_PATH so flashinfer's GDN/Mamba kernels can
#      dlopen libcudart.so.12 (LFM2.5-VL is a hybrid arch and uses these).
#   2. Skip flashinfer's sampling kernel: flashinfer 0.6 + CUDA 12.9 trigger
#      an NVCC stub bug (`__cudaLaunch` not declared). PyTorch-native sampler
#      is ~20% slower but works.
#   3. Skip vLLM 0.21's DeepGEMM autotune warmup (~18 min for MoE/FP8 models).
module load cuda12.9/toolkit/12.9.1 2>/dev/null || true
export VLLM_USE_FLASHINFER_SAMPLER="${VLLM_USE_FLASHINFER_SAMPLER:-0}"
export VLLM_USE_DEEP_GEMM="${VLLM_USE_DEEP_GEMM:-0}"
# --- OpenRouter API key ------------------------------------------------------
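# Required. Get one at https://openrouter.ai/keys, then either run
#   export OPENROUTER_API_KEY=...
# in your shell, or uncomment the line below and set it here. The variable must
# be exported so that run_eval.py, which runs as a child process, can read it.
#export OPENROUTER_API_KEY="sk-or-v1-..."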
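# Optional early check (a sketch, not part of the original script): warn before
# any GPU work starts if the key is missing, and export it if it was only set
# as a plain shell variable. `have_openrouter_key` is a hypothetical helper
# name. Replace the warning with `exit 1` to fail fast instead.
have_openrouter_key() {
  # ${VAR:-} keeps this safe under `set -u` when the variable is unset.
  [[ -n "${OPENROUTER_API_KEY:-}" ]]
}
if have_openrouter_key; then
  export OPENROUTER_API_KEY
else
  echo "WARNING: OPENROUTER_API_KEY is not set; judge calls will likely fail." >&2
fi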
# --- Checkpoint --------------------------------------------------------------
# HF id of the trained model, OR a local merged/LoRA checkpoint dir.
CHECKPOINT="LiquidAI/LFM2.5-VL-1.6B-Extract"
# --- Eval data ---------------------------------------------------------------
# WebDataset (WDS) tar, a directory of tars, or a brace-glob pattern.
DATA_PATH="./eval_data"
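# Illustrative values for each accepted form. The shard file names below are
# hypothetical, not the names of the shipped files. Keep the value quoted so
# the shell passes a brace pattern through to run_eval.py unexpanded.
#   DATA_PATH="./eval_data/shard-000000.tar"              # single tar
#   DATA_PATH="./eval_data"                                # directory of tars
#   DATA_PATH="./eval_data/shard-{000000..000003}.tar"    # brace-glob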
# Output JSON path.
OUTPUT="./eval_result.json"
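# Sketch, assuming run_eval.py does not create missing directories itself
# (unverified): make sure the parent directory of the output file exists so a
# long run cannot fail at write time. The fallback path only matters when this
# snippet is run on its own.
mkdir -p "$(dirname "${OUTPUT:-./eval_result.json}")"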
# --- Sample count ------------------------------------------------------------
# Number of samples to evaluate. Default 2000 runs the full shipped eval_data
# (~30 min). Set to 50 for a quick smoke test (~5 min).
NUM_SAMPLES=2000
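# Sketch of a sanity check on the sample count. `is_positive_int` is a
# hypothetical helper name. Whether run_eval.py accepts other values (for
# example 0) is not stated here, so this only warns rather than aborting.
is_positive_int() {
  [[ "${1:-}" =~ ^[1-9][0-9]*$ ]]
}
if ! is_positive_int "${NUM_SAMPLES:-}"; then
  echo "WARNING: NUM_SAMPLES='${NUM_SAMPLES:-}' is not a positive integer." >&2
fi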
# --- Extraction (local GPU) --------------------------------------------------
# "auto" tries vLLM first, falls back to HF transformers on init failure.
EXTRACTION_BACKEND="auto"
EXTRACTION_BATCH=8
# --- Judge model (OpenRouter) ------------------------------------------------
# Any image-capable OpenRouter model id works. Pricing:
# https://openrouter.ai/models
VLM_JUDGE_MODEL="qwen/qwen3.5-35b-a3b"
# Concurrent OpenRouter calls. Lower if you hit rate limits.
JUDGE_CONCURRENCY=16
# --- Run ---------------------------------------------------------------------
LOG_FILE="${LOG_FILE:-./eval_run.log}"
echo "Logging to: ${LOG_FILE}"
python run_eval.py \
    --checkpoint-path "${CHECKPOINT}" \
    --data-path "${DATA_PATH}" \
    --output-path "${OUTPUT}" \
    --num-samples "${NUM_SAMPLES}" \
    --extraction-backend "${EXTRACTION_BACKEND}" \
    --extraction-batch "${EXTRACTION_BATCH}" \
    --vlm-judge --vlm-judge-model "${VLM_JUDGE_MODEL}" \
    --judge-concurrency "${JUDGE_CONCURRENCY}" \
    2>&1 | tee "${LOG_FILE}"
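
# Sketch of a post-run check (not part of the original script): report whether
# the result file was written. `report_output` is a hypothetical helper name.
# With `set -e` and `pipefail`, this point is reached only if the pipeline
# above exited successfully.
report_output() {
  local path="$1"
  if [[ -s "${path}" ]]; then
    echo "Results written to: ${path}"
  else
    echo "WARNING: ${path} is missing or empty." >&2
    return 1
  fi
}
report_output "${OUTPUT:-./eval_result.json}" || true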