Instructions to use hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning", dtype="auto")

PEFT
How to use hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning with PEFT:
```
Task type is invalid.
```
Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning

SGLang

How to use hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning with Docker Model Runner:
```
docker model run hf.co/hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning
```

LoRA Adapter - Vietnamese Textbook (SGK) Narration Captioning

This repository provides LoRA adapter weights only finetuned from Qwen/Qwen2-VL-2B-Instruct for Vietnamese textbook image captioning in a narration-style format that is screen-reader friendly aiming to support visually impaired learners.

This repo contains only the adapter (not the base model).
Full documentation + end-to-end pipeline (01→05) are on GitHub:
https://github.com/itshoang2024/vi-textbook-caption-qwen2vl

adapter_model.safetensors, adapter_config.json: LoRA adapter weights (PEFT)
run_config.json: prompt + generation config used in our pipeline (reproducibility)

Benchmark (test split)

Dataset: bbdontcry/vietnamese-image-captioning (test)

Model	Quote-CER ↓	Concept-Rec ↑	LLM-Score ↑	BERTScore ↑	BLEU-4 ↑	METEOR ↑
Qwen2-VL-2B (Zero-shot)	0.995	0.222	3.36/10	0.671	6.30	0.142
Qwen2-VL-2B + LoRA (RUN_ID=A)	0.385	0.632	5.09/10	0.837	43.73	0.521

Quickstart (load adapter)

pip install -U transformers accelerate peft safetensors pillow qwen-vl-utils

import torch
from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

BASE_ID = "Qwen/Qwen2-VL-2B-Instruct"
ADAPTER_ID = "hoangphann/LoRA-Qwen2-VL-2B-Instruct-captioning"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

processor = AutoProcessor.from_pretrained(BASE_ID)
print("Loaded LoRA adapter (RUN_ID=A).")

For the exact prompt/generation setup used in evaluation, see run_config.json and the Kaggle/GitHub pipeline.

Example (Textbook page → narration-style caption)

SGK_CanhDieu_TuNhienVaXaHoi_3_page_063.png

Model output

Phía trên cùng là câu hỏi bài tập số 2: "2. Cây nào có rễ cọc, cây nào có rễ chìm trong các hình dưới đây?". Bên dưới là 4 hình minh họa được đánh số thứ tự: - Hình 1 ở góc trên bên trái: Vẽ một cây xoài. Cây có thân và lá, thân cây có rễ. Bên dưới hình là nhãn: "1 Cây xoài". - Hình 2 ở góc trên bên phải: Vẽ một cây lúa. Cây có thân và lá, thân cây có rễ. Bên dưới hình là nhãn: "2 Cây lúa". - Hình 3 ở góc dưới bên trái: Vẽ một cây ngô. Cây có thân và lá, thân cây có rễ. Bên dưới hình là nhãn: "3 Cây ngô". - Hình 4 ở góc dưới bên phải: Vẽ một cây cam. Cây có thân và lá, thân cây có rễ. Bên dưới hình là nhãn: "4 Cây cam". Phía dưới các hình minh họa là biểu tượng dấu hỏi chấm màu đỏ cam kèm câu hỏi: "Kể tên một số cây khác có rễ cọc, rễ chìm mà em biết.". Góc dưới cùng bên trái trang sách có số trang "62". Dưới cùng là dòng chữ mờ "Đọc sách tại hoc10.vn".

Kaggle notebooks (final pipeline 01→05)

#	Notebook	Output
1	Finetune	`vn-textbook-qwen2vl-01-adapters`
2	Inference	`vn-textbook-qwen2vl-02-predictions`
3a	Metrics (Light)	`vn-textbook-qwen2vl-03-metrics`
3b	Metrics (Heavy)	`vn-textbook-qwen2vl-03-metrics`
4	Merge results	`vn-textbook-qwen2vl-04-results`
5	Demo (Gradio)	-