IRIS-4B
IRIS-4B is a Qwen3-VL-4B-Thinking model fine-tuned for visual question answering on external-eye photographs as part of the IRIS project.
The model is intended for non-commercial research on ocular surface disease analysis from external eye photographs. It was trained on IRIS-120K data using the Topic Finding Tree (TFT) and scene-driven VQA generation framework.
Files
This release directory contains only the merged model weights and inference-time tokenizer / processor configuration files. Training logs, local training arguments, deployment results, and server-specific files are intentionally excluded.
Usage
The commands below were validated against an environment that supports Qwen3-VL:
- Python 3.11
- transformers==4.57.3
- vllm==0.11.0
- ms-swift==3.12.0
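Before launching a server, it can help to confirm that the local environment matches these pins. The following is a minimal sketch using only the standard library; the helper name `check_versions` and the `EXPECTED` table are illustrative and not part of the release.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the environment list above.
EXPECTED = {
    "transformers": "4.57.3",
    "vllm": "0.11.0",
    "ms-swift": "3.12.0",
}


def check_versions(expected, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `lookup` defaults to importlib.metadata.version and is injectable
    so the function can be exercised without the packages installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means all three packages match the validated versions. Other versions may also work, but they were not the ones validated here.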
Serve with vLLM
conda activate vllm
MODEL_ID="<your-org-or-name>/IRIS-4B"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export IMAGE_MAX_TOKEN_NUM=1024
vllm serve "${MODEL_ID}" \
    --served-model-name IRIS-4B \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 8192 \
    --limit-mm-per-prompt '{"image": 1, "video": 0}' \
    --port 8000
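Once the server is up, one way to confirm that it exposes the model under the served name `IRIS-4B` is to query the OpenAI-compatible `/v1/models` endpoint. The sketch below assumes the server listens on `localhost:8000` as configured above; the helper names `fetch_models` and `served_model_ids` are hypothetical.

```python
import json
from urllib.request import urlopen


def served_model_ids(models_payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def fetch_models(base_url="http://localhost:8000/v1", timeout=5):
    """GET {base_url}/models and return the decoded JSON payload.

    Assumes the server started above is reachable at base_url.
    """
    with urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `"IRIS-4B" in served_model_ids(fetch_models())` should evaluate to true. If it does not, the `model` field in later requests will not match the served name.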
Serve with MS-Swift
conda activate vllm
MODEL_ID="<your-org-or-name>/IRIS-4B"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NPROC_PER_NODE=4
export IMAGE_MAX_TOKEN_NUM=1024
swift deploy \
    --model "${MODEL_ID}" \
    --infer_backend vllm \
    --vllm_gpu_memory_utilization 0.90 \
    --vllm_max_model_len 8192 \
    --max_new_tokens 2048 \
    --port 8000 \
    --vllm_tensor_parallel_size 4 \
    --system "You are an expert ophthalmologist analyzing external eye photos. Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, strictly adhering to the structure: Visual Observation, Clinical Correlation, Logical Deduction, and Conclusion. After this internal analysis, provide your final professional answer." \
    --vllm_limit_mm_per_prompt '{"image": 1, "video": 0}' \
    --served_model_name IRIS-4B
System Prompt
IRIS was trained with two system-prompt styles. Choose the prompt according to the target use case. The default prompt used by the release scripts is the tree-based prompt.
Tree-based / TFT prompt:
You are an expert ophthalmologist analyzing external eye photos. Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, strictly adhering to the structure: Visual Observation, Clinical Correlation, Logical Deduction, and Conclusion. After this internal analysis, provide your final professional answer.
Scene-driven prompt:
You are an ophthalmology assistant customizing your response for a {role} based on an external eye photo. Enclose your step-by-step reasoning in <think> tags, strictly adhering to the structure: Analyze Query, Evidence Extraction, Medical Logic, and Formulation (adapting tone and complexity). After this internal analysis, provide your final response optimized for the {role}.
For scene-driven use, replace {role} with the intended user role, such as patient, doctor, or medical student. When serving with plain vLLM, the server does not inject this prompt automatically; include it as the first system message in each OpenAI-compatible request. When serving with MS-Swift, you can pass the default tree-based prompt through --system, as shown above, or replace it with the scene-driven prompt.
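As an illustration of the role substitution described above, the sketch below fills `{role}` into the scene-driven prompt and assembles the message list for an OpenAI-compatible request. The template text is copied from this card; the function name `build_scene_messages` and its parameters are hypothetical.

```python
# Scene-driven system prompt from this card, with {role} left as a placeholder.
SCENE_PROMPT_TEMPLATE = (
    "You are an ophthalmology assistant customizing your response for a {role} "
    "based on an external eye photo. Enclose your step-by-step reasoning in <think> tags, "
    "strictly adhering to the structure: Analyze Query, Evidence Extraction, Medical Logic, "
    "and Formulation (adapting tone and complexity). After this internal analysis, "
    "provide your final response optimized for the {role}."
)


def build_scene_messages(role, question, image_url):
    """Return chat messages with the scene-driven prompt as the first system message."""
    system_prompt = SCENE_PROMPT_TEMPLATE.format(role=role)
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

The returned list can be passed as `messages` to `client.chat.completions.create(...)`, for example with `role="patient"` or `role="medical student"`.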
OpenAI-Compatible Request
After starting either server, call it through the OpenAI-compatible API. Point image_path at your own external-eye image (the script encodes it as a base64 data URL) and choose the system prompt for your task.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

image_path = Path("/path/to/external_eye.jpg")
image_url = "data:image/jpeg;base64," + base64.b64encode(image_path.read_bytes()).decode("utf-8")

system_prompt = (
    "You are an expert ophthalmologist analyzing external eye photos. "
    "Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, "
    "strictly adhering to the structure: Visual Observation, Clinical Correlation, "
    "Logical Deduction, and Conclusion. After this internal analysis, provide your final "
    "professional answer."
)

response = client.chat.completions.create(
    model="IRIS-4B",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What ocular surface abnormality is visible in this image?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)
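Because both system prompts ask the model to wrap its reasoning in `<think>` tags, it can be convenient to separate the reasoning from the final answer after the call. The sketch below assumes the reasoning arrives inline in `message.content`; depending on the chat template and server options, the opening `<think>` tag may be missing from the output, or the reasoning may be returned in a separate field. The helper name `split_reasoning` is hypothetical.

```python
def split_reasoning(text):
    """Split model output into (reasoning, answer) around the closing </think> tag.

    If no closing tag is present, the whole text is treated as the answer.
    A leading <think> tag is removed when present, since some chat templates
    emit it in the prompt rather than in the generated text (an assumption).
    """
    close_tag = "</think>"
    if close_tag not in text:
        return "", text.strip()
    reasoning, answer = text.split(close_tag, 1)
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, answer.strip()
```

For the request above, `reasoning, answer = split_reasoning(response.choices[0].message.content)` would keep the structured analysis and the final answer in separate strings.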
License and Use Restrictions
IRIS-4B model weights are released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Commercial use is not permitted.
IRIS-4B is fine-tuned from Qwen3-VL-4B-Thinking, which is released under the Apache-2.0 license. Please also comply with the original Qwen license and attribution requirements.
This model is provided for research purposes only. It is not approved for clinical use and must not be used for clinical diagnosis, treatment decisions, patient management, or any other medical practice. Model outputs are for reference only and may be incomplete or incorrect. They must not replace professional clinical judgment or consultation with qualified healthcare professionals.