Files changed (1)
  1. README.md +210 -93
README.md CHANGED
@@ -4,105 +4,99 @@ license: other
license_name: lfm1.0
license_link: LICENSE
language:
- - en
- pipeline_tag: image-text-to-text
tags:
- liquid
- lfm2
- lfm2-vl
- - edge
---

- <center>
- <div style="text-align: center;">
-   <img
-     src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png"
-     alt="Liquid AI"
-     style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
-   />
- </div>
- </center>

- # LFM2-VL

- LFM2-VL is [Liquid AI](https://www.liquid.ai/)'s first series of multimodal models, designed to process text and images with variable resolutions.
- Built on the [LFM2](https://huggingface.co/collections/LiquidAI/lfm2-686d721927015b2ad73eaa38) backbone, it is optimized for low-latency and edge AI applications.

- We're releasing the weights of two post-trained checkpoints with [450M](https://huggingface.co/LiquidAI/LFM2-VL-450M) (for highly constrained devices) and [1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) (more capable yet still lightweight) parameters.

- * **2× faster inference speed** on GPUs compared to existing VLMs while maintaining competitive accuracy
- * **Flexible architecture** with user-tunable speed-quality tradeoffs at inference time
- * **Native resolution processing** up to 512×512, with intelligent patch-based handling for larger images that avoids upscaling and distortion

- Find more about our vision-language model in the [LFM2-VL post](https://www.liquid.ai/blog/lfm2-vl-efficient-vision-language-models) and its language backbone in the [LFM2 blog post](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models).

- ## 📄 Model details

- Due to their small size, **we recommend fine-tuning LFM2-VL models on narrow use cases** to maximize performance.
- They were trained for instruction following and lightweight agentic flows. They are not intended for safety-critical decisions.

- | Property | [**LFM2-VL-450M**](https://huggingface.co/LiquidAI/LFM2-VL-450M) | [**LFM2-VL-1.6B**](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) |
- |---|---:|---:|
- | **Parameters (LM only)** | 350M | 1.2B |
- | **Vision encoder** | SigLIP2 NaFlex base (86M) | SigLIP2 NaFlex shape-optimized (400M) |
- | **Backbone layers** | hybrid conv+attention | hybrid conv+attention |
- | **Context (text)** | 32,768 tokens | 32,768 tokens |
- | **Image tokens** | dynamic, user-tunable | dynamic, user-tunable |
- | **Vocab size** | 65,536 | 65,536 |
- | **Precision** | bfloat16 | bfloat16 |
- | **License** | LFM Open License v1.0 | LFM Open License v1.0 |

- **Supported languages:** English

- **Generation parameters**: We recommend the following settings:
- - Text: `temperature=0.1`, `min_p=0.15`, `repetition_penalty=1.05`
- - Vision: `min_image_tokens=64`, `max_image_tokens=256`, `do_image_splitting=True`

- **Chat template**: LFM2-VL uses a ChatML-like chat template as follows:
-
- ```
- <|startoftext|><|im_start|>system
- You are a helpful multimodal assistant by Liquid AI.<|im_end|>
- <|im_start|>user
- <image>Describe this image.<|im_end|>
- <|im_start|>assistant
- This image shows a Caenorhabditis elegans (C. elegans) nematode.<|im_end|>
- ```

- Images are referenced with a sentinel (`<image>`), which is automatically replaced with the image tokens by the processor.

- You can apply it using the dedicated [`.apply_chat_template()`](https://huggingface.co/docs/transformers/en/chat_templating#applychattemplate) function from Hugging Face transformers.

- **Architecture**
- - **Hybrid backbone**: Language model tower (LFM2-1.2B or LFM2-350M) paired with SigLIP2 NaFlex vision encoders (400M shape-optimized or 86M base variant)
- - **Native resolution processing**: Handles images up to 512×512 pixels without upscaling and preserves non-standard aspect ratios without distortion
- - **Tiling strategy**: Splits large images into non-overlapping 512×512 patches and includes thumbnail encoding for global context (in the 1.6B model)
- - **Efficient token mapping**: 2-layer MLP connector with pixel unshuffle reduces image tokens (e.g., a 256×384 image → 96 tokens, 1000×3000 → 1,020 tokens)
- - **Inference-time flexibility**: User-tunable maximum image tokens and patch count for a speed/quality tradeoff without retraining

- **Training approach**
- - Builds on the LFM2 base model with joint mid-training that fuses vision and language capabilities using a gradually adjusted text-to-image ratio
- - Applies joint SFT with an emphasis on image understanding and vision tasks
- - Leverages large-scale open-source datasets combined with in-house synthetic vision data, selected for balanced task coverage
- - Follows a progressive training strategy: base model → joint mid-training → supervised fine-tuning

- ## 🏃 How to run LFM2-VL

- You can run LFM2-VL with Hugging Face [`transformers`](https://github.com/huggingface/transformers) v4.55 or more recent as follows:

```bash
pip install -U transformers pillow
```

- Here is an example of how to generate an answer with transformers in Python:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
- model_id = "LiquidAI/LFM2-VL-1.6B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
@@ -111,20 +105,19 @@ model = AutoModelForImageTextToText.from_pretrained(
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

- # Load image and create conversation
- url = "https://www.ilankelman.org/stopsigns/australia.jpg"
- image = load_image(url)
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
-             {"type": "text", "text": "What is in this image?"},
        ],
    },
]

- # Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
@@ -132,40 +125,164 @@ inputs = processor.apply_chat_template(
    return_dict=True,
    tokenize=True,
).to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=64)
- processor.batch_decode(outputs, skip_special_tokens=True)[0]

- # This image depicts a vibrant street scene in what appears to be a Chinatown or similar cultural area. The focal point is a large red stop sign with white lettering, mounted on a pole.
```

- You can directly run and test the model with this [Colab notebook](https://colab.research.google.com/drive/11EMJhcVB6OTEuv--OePyGK86k-38WU3q?usp=sharing).

- ## 🔧 How to fine-tune

- We recommend fine-tuning LFM2-VL models on your use cases to maximize performance.

- | Notebook | Description | Link |
- |-----------|----------------------------------------------------------------------|------|
- | SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | <a href="https://colab.research.google.com/drive/1csXCLwJx7wI7aruudBp6ZIcnqfv8EMYN?usp=sharing"><img src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png" width="110" alt="Colab link"></a> |

- ## 📈 Performance

- | Model | RealWorldQA | MM-IFEval | InfoVQA (Val) | OCRBench | BLINK | MMStar | MMMU (Val) | MathVista | SEEDBench_IMG | MMVet | MME | MMLU |
- |-------------------|-------------|-----------|---------------|----------|-------|--------|------------|-----------|---------------|-------|----------|-------|
- | InternVL3-2B | 65.10 | 38.49 | 66.10 | 831 | 53.10 | 61.10 | 48.70 | 57.60 | 75.00 | 67.00 | 2186.40 | 64.80 |
- | InternVL3-1B | 57.00 | 31.14 | 54.94 | 798 | 43.00 | 52.30 | 43.20 | 46.90 | 71.20 | 58.70 | 1912.40 | 49.80 |
- | SmolVLM2-2.2B | 57.50 | 19.42 | 37.75 | 725 | 42.30 | 46.00 | 41.60 | 51.50 | 71.30 | 34.90 | 1792.50 | - |
- | LFM2-VL-1.6B | 65.23 | 37.66 | 58.68 | 742 | 44.40 | 49.53 | 38.44 | 51.10 | 71.97 | 48.07 | 1753.04 | 50.99 |

- | Model | RealWorldQA | MM-IFEval | InfoVQA (Val) | OCRBench | BLINK | MMStar | MMMU (Val) | MathVista | SEEDBench_IMG | MMVet | MME | MMLU |
- |-------------------|-------------|-----------|---------------|----------|-------|--------|------------|-----------|---------------|-------|----------|-------|
- | SmolVLM2-500M | 49.90 | 11.27 | 24.64 | 609 | 40.70 | 38.20 | 34.10 | 37.50 | 62.20 | 29.90 | 1448.30 | - |
- | LFM2-VL-450M | 52.29 | 26.18 | 46.51 | 655 | 41.98 | 40.87 | 33.11 | 44.70 | 63.50 | 33.76 | 1239.06 | 40.16 |

- We obtained MM-IFEval and InfoVQA (Val) scores for the InternVL3 and SmolVLM2 models using VLMEvalKit.

- ## 📬 Contact

- If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).

license_name: lfm1.0
license_link: LICENSE
language:
+ - ja
+ base_model: LiquidAI/LFM2-VL-1.6B
tags:
- liquid
- lfm2
- lfm2-vl
+ - vision-language
+ - japanese
+ - multimodal
+ - trl
+ - sft
+ pipeline_tag: image-text-to-text
---

+ # LFM2-VL-1.6B-jp (Japanese)

+ ## Model Description

+ **LFM2-VL-1.6B-jp** is a Japanese fine-tuned variant of [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B), optimized for Japanese vision-language tasks. It retains the efficiency and performance characteristics of the original LFM2-VL-1.6B architecture while specializing in Japanese language understanding and image description. With 1.6B parameters, it offers stronger capabilities than the 450M variant while remaining lightweight and suitable for edge deployment.

+ - **Developed by:** Alfaxad
+ - **Base Model:** [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)
+ - **Model type:** Vision-Language Model (Multimodal)
+ - **Language:** Japanese (日本語)
+ - **License:** LFM Open License v1.0
+ - **Finetuned from:** LiquidAI/LFM2-VL-1.6B (1.6B parameters)

+ ## Key Features

+ - **Japanese Language Support:** Specialized for Japanese image understanding and description tasks
+ - **Enhanced Capabilities:** 1.6B parameters provide improved reasoning and generation quality
+ - **Advanced Vision Encoder:** SigLIP2 NaFlex shape-optimized (400M) for better visual understanding
+ - **Low Latency:** 2× faster inference speed on GPUs compared to similar-sized VLMs
+ - **Multi-turn Conversations:** Trained on conversational data for interactive vision-language tasks
+ - **Native Resolution Processing:** Handles images up to 512×512 pixels without upscaling, with intelligent tiling for larger images

+ ## Model Details

+ | Property | Value |
+ |---|---:|
+ | **Parameters (LM only)** | 1.2B |
+ | **Vision encoder** | SigLIP2 NaFlex shape-optimized (400M) |
+ | **Total parameters** | ~1.6B |
+ | **Backbone layers** | hybrid conv+attention |
+ | **Context (text)** | 32,768 tokens |
+ | **Image tokens** | dynamic, user-tunable |
+ | **Vocab size** | 65,536 |
+ | **Precision** | bfloat16 |

+ ## Training Data

+ The model was fine-tuned on approximately **98,000 multi-turn conversational samples** from:

+ - **Dataset:** [llm-jp/ja-vg-vqa-conversation](https://huggingface.co/datasets/llm-jp/ja-vg-vqa-conversation)
+ - **Content:** Japanese visual question-answering conversations
+ - **Format:** Multi-turn dialogues with image context
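
For a quick look at what these samples contain, the dataset can be loaded directly with the `datasets` library; a minimal sketch, where the `train` split name and the exact column layout are assumptions rather than guarantees:

```python
# Minimal sketch: inspect the fine-tuning data.
# The split name and column names are assumptions; check the dataset card.
from datasets import load_dataset

ds = load_dataset("llm-jp/ja-vg-vqa-conversation", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one multi-turn Japanese VQA conversation and its image reference
```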

+ ## Intended Use

+ ### Primary Use Cases

+ - Japanese image captioning and detailed description
+ - Visual question answering in Japanese with enhanced reasoning
+ - Multi-turn conversations about images in Japanese
+ - Japanese document understanding and OCR tasks
+ - Complex visual reasoning tasks in Japanese
+ - Edge AI applications requiring Japanese language support

+ ### Recommended Applications

+ - Japanese e-commerce product analysis and description
+ - Japanese accessibility tools for visual content
+ - Japanese educational applications requiring visual understanding
+ - Japanese content moderation and detailed analysis
+ - Japanese chatbots with advanced visual understanding
+ - Japanese document processing and information extraction

+ ## How to Use

+ ### Installation

```bash
pip install -U transformers pillow
```

+ ### Basic Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
+ model_id = "Alfaxad/LFM2-VL-1.6B-jp"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

+ # Load image and create conversation in Japanese
+ image = load_image("your_image_url_or_path.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
+             {"type": "text", "text": "この画像について詳しく説明してください。"},
        ],
    },
]

+ # Generate response
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_dict=True,
    tokenize=True,
).to(model.device)

+ outputs = model.generate(**inputs, max_new_tokens=256)
+ response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+ print(response)
```

+ ### Multi-turn Conversation Example

+ ```python
+ # Multi-turn conversation
+ conversation = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "この画像には何が写っていますか?"},
+         ],
+     },
+     {
+         "role": "assistant",
+         "content": [
+             {"type": "text", "text": "この画像には赤い車が道路に駐車されています。"},
+         ],
+     },
+     {
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "車のメーカーはわかりますか?"},
+         ],
+     },
+ ]
+
+ inputs = processor.apply_chat_template(
+     conversation,
+     add_generation_prompt=True,
+     return_tensors="pt",
+     return_dict=True,
+     tokenize=True,
+ ).to(model.device)
+
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+ ```

+ ### Recommended Generation Parameters

+ - **temperature:** 0.1
+ - **min_p:** 0.15
+ - **repetition_penalty:** 1.05
+ - **min_image_tokens:** 64
+ - **max_image_tokens:** 256
+ - **do_image_splitting:** True
+ - **max_new_tokens:** 128-512 (depending on task complexity)
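
The text-side settings map directly onto `model.generate`; the vision-side settings (`min_image_tokens`, `max_image_tokens`, `do_image_splitting`) are processor-level options per the base model card. A minimal sketch, reusing `model`, `processor`, and `inputs` from the usage example above, with sampling enabled as an assumption so that `temperature` and `min_p` take effect:

```python
# Minimal sketch: apply the recommended decoding settings.
# do_sample=True is assumed; temperature and min_p only apply when sampling.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=256,
)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```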

+ ### Chat Template

+ The model uses a ChatML-like format:

+ ```
+ <|startoftext|><|im_start|>system
+ あなたはLiquid AIによる有用なマルチモーダルアシスタントです。<|im_end|>
+ <|im_start|>user
+ <image>この画像を詳しく説明してください。<|im_end|>
+ <|im_start|>assistant
+ この画像には...<|im_end|>
+ ```
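
To see this markup rendered for an actual conversation, including the `<image>` sentinel that the processor later expands into image tokens, the chat template can be applied without tokenization; a small sketch reusing the `processor` and `conversation` objects from the examples above:

```python
# Minimal sketch: render the prompt as text instead of token IDs.
prompt = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # ChatML-style markup with the <image> sentinel
```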

+ ## Architecture Highlights

+ - **Hybrid backbone:** LFM2-1.2B language model paired with the SigLIP2 NaFlex shape-optimized vision encoder (400M)
+ - **Native resolution processing:** Handles images up to 512×512 pixels without upscaling
+ - **Tiling strategy:** Splits large images into non-overlapping 512×512 patches with thumbnail encoding for global context
+ - **Efficient token mapping:** 2-layer MLP connector with pixel unshuffle keeps the image-token count low
+ - **Inference-time flexibility:** User-tunable maximum image tokens and patch count for a speed/quality tradeoff
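
To get a feel for the token mapping, the base LFM2-VL card quotes a 256×384 image mapping to 96 image tokens. Assuming 16×16 vision patches and a 2×2 pixel unshuffle (both assumptions, not figures stated on this card), that number can be reproduced with simple arithmetic:

```python
# Back-of-the-envelope image-token count; patch size and unshuffle factor are assumed.
width, height = 384, 256
patch = 16       # assumed SigLIP2 NaFlex patch size
unshuffle = 2    # assumed pixel-unshuffle factor
patches = (width // patch) * (height // patch)  # 24 * 16 = 384 patches
image_tokens = patches // (unshuffle ** 2)      # 384 / 4  = 96 tokens
print(image_tokens)                             # 96, matching the base card's example
```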

+ ## Training Details

+ ### Training Procedure

+ - **Base Model:** LiquidAI/LFM2-VL-1.6B
+ - **Fine-tuning Method:** Supervised Fine-Tuning (SFT) with LoRA adapters
+ - **Framework:** Hugging Face TRL (Transformer Reinforcement Learning)
+ - **Training Data:** ~98,000 multi-turn conversations
+ - **Training Regime:** bfloat16 mixed precision

+ ### Training Hyperparameters

+ - **Training approach:** LoRA (Low-Rank Adaptation) fine-tuning
+ - **Dataset size:** ~98,000 samples
+ - **Data format:** Multi-turn conversational VQA
+ - **Language focus:** Japanese
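
A minimal sketch of that recipe (SFT with a LoRA adapter via TRL). The dataset column names in the collator, the LoRA targets, and every hyperparameter below are illustrative assumptions, not the configuration actually used for this checkpoint:

```python
# Minimal LoRA SFT sketch with TRL; columns and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForImageTextToText, AutoProcessor
from trl import SFTConfig, SFTTrainer

base_id = "LiquidAI/LFM2-VL-1.6B"
processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

train_ds = load_dataset("llm-jp/ja-vg-vqa-conversation", split="train")

def collate_fn(examples):
    # Assumes each sample provides a chat-style "messages" list and an "image".
    texts = [processor.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [[ex["image"]] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    batch["labels"] = labels
    return batch

peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="lfm2-vl-1.6b-jp",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
        remove_unused_columns=False,
        dataset_kwargs={"skip_prepare_dataset": True},  # the collator above prepares batches
    ),
    train_dataset=train_ds,
    data_collator=collate_fn,
    peft_config=peft_config,
)
trainer.train()
```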

+ ## Performance Considerations

+ As a fine-tuned variant of LFM2-VL-1.6B:
+ - **Enhanced Capabilities:** The 1.6B model offers improved reasoning, more detailed descriptions, and better handling of complex visual scenarios compared to the 450M variant
+ - **Optimized for Japanese:** Best performance on Japanese-language tasks
+ - **Resource Efficient:** Still lightweight enough for edge devices while providing enhanced capabilities
+ - **Speed vs. Quality:** Offers a better balance between inference speed and output quality than the smaller variant
+ - **Recommended Use:** Can be used out of the box for many Japanese VLM tasks, though further fine-tuning on specific use cases will maximize performance

+ ## Comparison with 450M Variant

+ | Aspect | LFM2-VL-450M-jp | LFM2-VL-1.6B-jp |
+ |--------|-----------------|-----------------|
+ | **Parameters** | 450M total | 1.6B total |
+ | **Vision Encoder** | SigLIP2 NaFlex base (86M) | SigLIP2 NaFlex shape-optimized (400M) |
+ | **Use Case** | Highly constrained devices | More capable while still lightweight |
+ | **Output Quality** | Good for simple tasks | Better for complex reasoning |
+ | **Inference Speed** | Faster | Still fast, slightly slower |
+ | **Memory Usage** | Lower | Higher but manageable |

+ ## Limitations

+ - **Language Specialization:** Primarily designed for Japanese; performance on other languages may be limited
+ - **Domain Specificity:** Performance is optimized for the types of conversations present in the training data
+ - **Safety:** Not intended for safety-critical decisions without additional validation
+ - **Complex Reasoning:** While improved over the 450M variant, the model may still struggle with highly complex multi-step reasoning compared to much larger models
+ - **Cultural Context:** Trained on Japanese data; cultural nuances should be considered

+ ## Citation

+ If you use this model, please cite both the original LFM2-VL model and this fine-tuned variant:

+ ```bibtex
+ @misc{lfm2-vl-1.6b-jp,
+   author = {Alfaxad},
+   title = {LFM2-VL-1.6B-jp: Japanese Fine-tuned Vision-Language Model},
+   year = {2025},
+   publisher = {HuggingFace},
+   url = {https://huggingface.co/Alfaxad/LFM2-VL-1.6B-jp}
+ }
+
+ @misc{liquid-lfm2-vl,
+   author = {Liquid AI},
+   title = {LFM2-VL: Efficient Vision-Language Models},
+   year = {2025},
+   url = {https://huggingface.co/LiquidAI/LFM2-VL-1.6B}
+ }
+ ```
271
 
272
+ ## Acknowledgments
273
 
274
+ - **Base Model:** [Liquid AI](https://www.liquid.ai/) for the LFM2-VL architecture
275
+ - **Training Data:** [llm-jp](https://huggingface.co/llm-jp) for the ja-vg-vqa-conversation dataset
276
+ - **Framework:** Hugging Face for transformers and TRL libraries
 
 
 
277
 
278
+ ## Contact
 
 
 
279
 
280
+ For questions or issues regarding this model, please open an issue on the model's Hugging Face page or contact the model developer.
281
 
282
+ ## Additional Resources
283
 
284
+ - **Original Model:** [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)
285
+ - **Smaller Variant:** [Alfaxad/LFM2-VL-450M-jp](https://huggingface.co/Alfaxad/LFM2-VL-450M-jp)
286
+ - **Training Dataset:** [llm-jp/ja-vg-vqa-conversation](https://huggingface.co/datasets/llm-jp/ja-vg-vqa-conversation)
287
+ - **LFM2-VL Blog Post:** [Liquid AI Blog](https://www.liquid.ai/blog/lfm2-vl-efficient-vision-language-models)
288
+ - **Original Paper/Documentation:** [LFM2 Blog Post](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models)