IRIS-4B

IRIS-4B is a Qwen3-VL-4B-Thinking model fine-tuned for external-eye ophthalmology visual question answering (VQA) as part of the IRIS project.

The model is intended for non-commercial research on ocular surface disease analysis from external eye photographs. It was trained on IRIS-120K data using the Topic Finding Tree (TFT) and scene-driven VQA generation framework.

Files

This release directory contains only the merged model weights and inference-time tokenizer / processor configuration files. Training logs, local training arguments, deployment results, and server-specific files are intentionally excluded.

Usage

The commands below were validated against an environment that supports Qwen3-VL:

  • Python 3.11
  • transformers==4.57.3
  • vllm==0.11.0
  • ms-swift==3.12.0

Serve with vLLM

conda activate vllm

MODEL_ID="<your-org-or-name>/IRIS-4B"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export IMAGE_MAX_TOKEN_NUM=1024

vllm serve "${MODEL_ID}" \
  --served-model-name IRIS-4B \
  --trust-remote-code \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192 \
  --limit-mm-per-prompt '{"image": 1, "video": 0}' \
  --port 8000
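
Before sending requests, it can help to confirm that the server is up and exposes the name passed to --served-model-name. The sketch below assumes the server started by the command above is reachable at http://localhost:8000, and it queries the OpenAI-compatible /v1/models endpoint using only the Python standard library. The helper names (served_model_ids, is_model_served) are illustrative and are not part of vLLM or the release scripts.

```python
import json
import urllib.error
import urllib.request


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def is_model_served(base_url: str, name: str, timeout: float = 5.0) -> bool:
    """Return True if `name` is listed by the server; False if absent or unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: treat as "not ready".
        return False
    return name in served_model_ids(payload)
```

For example, is_model_served("http://localhost:8000", "IRIS-4B") should return True once model loading has finished, assuming the host and port match your deployment.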

Serve with MS-Swift

conda activate vllm

MODEL_ID="<your-org-or-name>/IRIS-4B"
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NPROC_PER_NODE=4
export IMAGE_MAX_TOKEN_NUM=1024

swift deploy \
  --model "${MODEL_ID}" \
  --infer_backend vllm \
  --vllm_gpu_memory_utilization 0.90 \
  --vllm_max_model_len 8192 \
  --max_new_tokens 2048 \
  --port 8000 \
  --vllm_tensor_parallel_size 4 \
  --system "You are an expert ophthalmologist analyzing external eye photos. Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, strictly adhering to the structure: Visual Observation, Clinical Correlation, Logical Deduction, and Conclusion. After this internal analysis, provide your final professional answer." \
  --vllm_limit_mm_per_prompt '{"image": 1, "video": 0}' \
  --served_model_name IRIS-4B

System Prompt

IRIS was trained with two system-prompt styles. Choose the prompt according to the target use case. The default prompt used by the release scripts is the tree-based prompt.

Tree-based / TFT prompt:

You are an expert ophthalmologist analyzing external eye photos. Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, strictly adhering to the structure: Visual Observation, Clinical Correlation, Logical Deduction, and Conclusion. After this internal analysis, provide your final professional answer.

Scene-driven prompt:

You are an ophthalmology assistant customizing your response for a {role} based on an external eye photo. Enclose your step-by-step reasoning in <think> tags, strictly adhering to the structure: Analyze Query, Evidence Extraction, Medical Logic, and Formulation (adapting tone and complexity). After this internal analysis, provide your final response optimized for the {role}.

For scene-driven use, replace {role} with the intended user role, such as patient, doctor, or medical student.

When serving with plain vLLM, the server does not inject a system prompt automatically; include the prompt you chose as the first system message in each OpenAI-compatible request. When serving with MS-Swift, pass the default tree-based prompt through --system, as shown above, or replace it with the scene-driven prompt.
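
The role substitution and the message layout described above can be sketched as follows. The template text is copied from the scene-driven prompt. The helper names (build_scene_prompt, build_messages) are hypothetical, and the plain string replacement is an assumption about how the placeholder is meant to be filled, not something taken from the release scripts.

```python
SCENE_PROMPT_TEMPLATE = (
    "You are an ophthalmology assistant customizing your response for a {role} "
    "based on an external eye photo. Enclose your step-by-step reasoning in <think> tags, "
    "strictly adhering to the structure: Analyze Query, Evidence Extraction, Medical Logic, "
    "and Formulation (adapting tone and complexity). After this internal analysis, "
    "provide your final response optimized for the {role}."
)


def build_scene_prompt(role: str) -> str:
    """Fill both {role} placeholders in the scene-driven system prompt."""
    role = role.strip()
    if not role:
        raise ValueError("role must be a non-empty string")
    return SCENE_PROMPT_TEMPLATE.replace("{role}", role)


def build_messages(role: str, question: str, image_url: str) -> list[dict]:
    """Build an OpenAI-style message list with the system prompt first."""
    return [
        {"role": "system", "content": build_scene_prompt(role)},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

The returned list can be passed as the messages argument of a chat completion request, so the same client code serves either prompt style.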

OpenAI-Compatible Request

After starting either server, call it through the OpenAI-compatible API. Replace the image path with your own external-eye photograph, which is sent as a base64 data URL, and choose the system prompt for your task.

import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

image_path = Path("/path/to/external_eye.jpg")
image_url = "data:image/jpeg;base64," + base64.b64encode(image_path.read_bytes()).decode("utf-8")
system_prompt = (
    "You are an expert ophthalmologist analyzing external eye photos. "
    "Perform a rigorous analysis by enclosing your step-by-step reasoning in <think> tags, "
    "strictly adhering to the structure: Visual Observation, Clinical Correlation, "
    "Logical Deduction, and Conclusion. After this internal analysis, provide your final "
    "professional answer."
)

response = client.chat.completions.create(
    model="IRIS-4B",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What ocular surface abnormality is visible in this image?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
    max_tokens=2048,
)

print(response.choices[0].message.content)
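
Because both system prompts ask the model to put its reasoning in <think> tags, callers may want to separate that reasoning from the final answer. The sketch below is one way to do this on the returned text. It assumes the reasoning ends at the first </think> tag, and it tolerates a missing opening tag, which can occur if the chat template emits <think> as part of the prompt; whether that applies depends on your serving configuration. The split_reasoning helper is hypothetical.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) at the first </think> tag."""
    open_tag = "<think>"
    close_tag = "</think>"
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", text.strip()
    reasoning = head.strip()
    if reasoning.startswith(open_tag):
        # Drop the opening tag when the model emitted it explicitly.
        reasoning = reasoning[len(open_tag):].strip()
    return reasoning, tail.strip()
```

With the request above, this would be called as reasoning, answer = split_reasoning(response.choices[0].message.content). If the server is configured to return reasoning in a separate response field, the content may already hold only the final answer and the split is unnecessary.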

License and Use Restrictions

IRIS-4B model weights are released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Commercial use is not permitted.

IRIS-4B is fine-tuned from Qwen3-VL-4B-Thinking, which is released under the Apache-2.0 license. Please also comply with the original Qwen license and attribution requirements.

This model is provided for research purposes only. It is not approved for clinical use and must not be used for clinical diagnosis, treatment decisions, patient management, or any other medical practice. Model outputs are for reference only and may be incomplete or incorrect. They must not replace professional clinical judgment or consultation with qualified healthcare professionals.
