Instructions for using hw-hwei/IRIS-4B with libraries and local apps.

Libraries

How to use hw-hwei/IRIS-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hw-hwei/IRIS-4B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
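# AutoModelForImageTextToText is the image-text-to-text auto class available in
# transformers 4.57.x, the version pinned in the Usage section of this card.
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("hw-hwei/IRIS-4B")
model = AutoModelForImageTextToText.from_pretrained("hw-hwei/IRIS-4B")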
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use hw-hwei/IRIS-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hw-hwei/IRIS-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hw-hwei/IRIS-4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/hw-hwei/IRIS-4B

SGLang

How to use hw-hwei/IRIS-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hw-hwei/IRIS-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hw-hwei/IRIS-4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hw-hwei/IRIS-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hw-hwei/IRIS-4B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

IRIS-4B

IRIS-4B is a Qwen3-VL-4B-Thinking model fine-tuned for visual question answering (VQA) on external-eye ophthalmology images as part of the IRIS project.

The model is intended for non-commercial research on ocular surface disease analysis from external eye photographs. It was trained on IRIS-120K data using the Topic Finding Tree (TFT) and scene-driven VQA generation framework.

Files

This release directory contains only the merged model weights and inference-time tokenizer / processor configuration files. Training logs, local training arguments, deployment results, and server-specific files are intentionally excluded.

Usage

The commands below were validated against an environment that supports Qwen3-VL:

Python 3.11
transformers==4.57.3
vllm==0.11.0
ms-swift==3.12.0
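
Before launching a server, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch, not part of the release scripts: the `PINNED` dictionary and the `check_versions` helper are hypothetical names introduced here, and the check uses only the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card; adjust if you validate a different stack.
PINNED = {
    "transformers": "4.57.3",
    "vllm": "0.11.0",
    "ms-swift": "3.12.0",
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Print one line per package that differs from the pinned version.
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the listed version. A mismatch does not necessarily mean the commands will fail, only that the combination differs from the validated one.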

Serve with vLLM

conda activate vllm

MODEL_ID="<your-org-or-name>/IRIS-4B"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export IMAGE_MAX_TOKEN_NUM=1024

vllm serve "${MODEL_ID}" \
  --served-model-name IRIS-4B \
  --trust-remote-code \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192 \
  --limit-mm-per-prompt '{"image": 1, "video": 0}' \
  --port 8000

Serve with MS-Swift

conda activate vllm

MODEL_ID="<your-org-or-name>/IRIS-4B"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NPROC_PER_NODE=4
export IMAGE_MAX_TOKEN_NUM=1024

swift deploy \
  --model "${MODEL_ID}" \
  --infer_backend vllm \
  --vllm_gpu_memory_utilization 0.90 \
  --vllm_max_model_len 8192 \
  --max_new_tokens 2048 \
  --port 8000 \
  --vllm_tensor_parallel_size 4 \
  --system "You are an expert ophthalmologist analyzing external eye photos. Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, strictly adhering to the structure: Visual Observation, Clinical Correlation, Logical Deduction, and Conclusion. After this internal analysis, provide your final professional answer." \
  --vllm_limit_mm_per_prompt '{"image": 1, "video": 0}' \
  --served_model_name IRIS-4B

System Prompt

IRIS was trained with two system-prompt styles. Choose the prompt according to the target use case. The default prompt used by the release scripts is the tree-based prompt.

Tree-based / TFT prompt:

You are an expert ophthalmologist analyzing external eye photos. Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, strictly adhering to the structure: Visual Observation, Clinical Correlation, Logical Deduction, and Conclusion. After this internal analysis, provide your final professional answer.

Scene-driven prompt:

You are an ophthalmology assistant customizing your response for a {role} based on an external eye photo. Enclose your step-by-step reasoning in <think> tags, strictly adhering to the structure: Analyze Query, Evidence Extraction, Medical Logic, and Formulation (adapting tone and complexity). After this internal analysis, provide your final response optimized for the {role}.

For scene-driven use, replace {role} with the intended user role, such as patient, doctor, or medical student. When serving with plain vLLM, the server does not inject this prompt automatically; include it as the first system message in each OpenAI-compatible request. When serving with MS-Swift, you can pass the default tree-based prompt through --system, as shown above, or replace it with the scene-driven prompt.
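
For plain vLLM, the role substitution and the system-message placement described above can be sketched as a small helper. This is an illustrative example only: `SCENE_PROMPT_TEMPLATE` and `build_scene_messages` are hypothetical names, and the prompt text is copied from the scene-driven prompt in this section.

```python
# Scene-driven system prompt from this model card, with {role} left as a placeholder.
SCENE_PROMPT_TEMPLATE = (
    "You are an ophthalmology assistant customizing your response for a {role} "
    "based on an external eye photo. Enclose your step-by-step reasoning in "
    "<think> tags, strictly adhering to the structure: Analyze Query, Evidence "
    "Extraction, Medical Logic, and Formulation (adapting tone and complexity). "
    "After this internal analysis, provide your final response optimized for "
    "the {role}."
)


def build_scene_messages(role, question, image_url):
    """Build an OpenAI-style message list with the scene-driven system prompt first."""
    system_prompt = SCENE_PROMPT_TEMPLATE.format(role=role)
    return [
        # Plain vLLM does not inject a system prompt, so it is sent explicitly.
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

The returned list can be passed as the `messages` argument of an OpenAI-compatible chat completion request, with `role` set to a value such as `"patient"`, `"doctor"`, or `"medical student"`.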

OpenAI-Compatible Request

After starting either server, call it through the OpenAI-compatible API. Replace the base64 data URL with your own external-eye image and choose the system prompt for your task.

import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

image_path = Path("/path/to/external_eye.jpg")
image_url = "data:image/jpeg;base64," + base64.b64encode(image_path.read_bytes()).decode("utf-8")
system_prompt = (
    "You are an expert ophthalmologist analyzing external eye photos. "
    "Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, "
    "strictly adhering to the structure: Visual Observation, Clinical Correlation, "
    "Logical Deduction, and Conclusion. After this internal analysis, provide your final "
    "professional answer."
)

response = client.chat.completions.create(
    model="IRIS-4B",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What ocular surface abnormality is visible in this image?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)
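
Because both system prompts ask the model to wrap its reasoning in <think> tags, the returned text usually needs to be split into reasoning and final answer before display. The helper below is a minimal sketch under that assumption; `split_reasoning` is a hypothetical name, not part of the release. It also tolerates replies in which only the closing tag appears, which may happen depending on the chat template and server configuration.

```python
def split_reasoning(content):
    """Split a model reply into (reasoning, answer) around the </think> tag."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in content:
        # No reasoning block found: treat the whole reply as the answer.
        return "", content.strip()
    reasoning, answer = content.split(close_tag, 1)
    # The opening tag may be missing if the server emitted it as part of the prompt.
    reasoning = reasoning.replace(open_tag, "", 1)
    return reasoning.strip(), answer.strip()
```

For example, `reasoning, answer = split_reasoning(response.choices[0].message.content)` separates the two parts of the reply from the request above.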

License and Use Restrictions

IRIS-4B model weights are released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Commercial use is not permitted.

IRIS-4B is fine-tuned from Qwen3-VL-4B-Thinking, which is released under the Apache-2.0 license. Please also comply with the original Qwen license and attribution requirements.

This model is provided for research purposes only. It is not approved for clinical use and must not be used for clinical diagnosis, treatment decisions, patient management, or any other medical practice. Model outputs are for reference only and may be incomplete or incorrect. They must not replace professional clinical judgment or consultation with qualified healthcare professionals.

Safetensors

Model size: 4B params
Tensor type: BF16