OPENEye-FM

OPENEye-FM is a retinal vision-language foundation model for colour fundus photography. It was trained on OPENEye-Data, a curated collection of 233,651 fundus photographs from 37 open-source datasets with harmonized disease and clinical sign labels.

OPENEye-FM is designed to learn from heterogeneous public fundus datasets by mapping disease labels, severity labels, and clinical sign labels into a shared disease-sign label space. Textual disease and sign descriptions are used as semantic anchors during training.
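As a rough illustration of the label mapping described above, the sketch below maps dataset-specific labels into a small shared label space. The dataset names, raw label strings, shared label names, and the `harmonize` helper are all hypothetical. Refer to the official repository for the actual 36-label space.

```python
# Hypothetical subset of a shared disease/sign label space (the real one has 36 labels).
SHARED_LABELS = ["diabetic_retinopathy", "drusen", "glaucoma", "hemorrhage"]

# Hypothetical mapping from (dataset, raw label) to shared labels.
# Severity grades collapse onto the disease label; grade 0 maps to no label.
RAW_TO_SHARED = {
    ("dataset_a", "DR grade 0"): [],
    ("dataset_a", "DR grade 2"): ["diabetic_retinopathy"],
    ("dataset_b", "glaucoma suspect"): ["glaucoma"],
    ("dataset_c", "haemorrhages"): ["hemorrhage"],
    ("dataset_c", "drusen present"): ["drusen"],
}


def harmonize(dataset: str, raw_labels: list[str]) -> list[str]:
    """Return the sorted shared labels implied by one image's raw labels."""
    shared = set()
    for raw in raw_labels:
        key = (dataset, raw)
        if key not in RAW_TO_SHARED:
            raise ValueError(f"No mapping defined for {key}")
        shared.update(RAW_TO_SHARED[key])
    return sorted(shared)


print(harmonize("dataset_c", ["haemorrhages", "drusen present"]))
```

Keeping the mapping in one table means each source dataset's naming is handled once, and every downstream step sees only shared label names.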


Model Details

Item               Description
Model              OPENEye-FM
Model type         Retinal vision-language foundation model
Image encoder      DINOv2-style Vision Transformer
Language encoder   Sentence-BERT
Input modality     Colour fundus photography
Input size         224 × 224
Label space        36 harmonized disease/sign labels

Intended Use

OPENEye-FM is intended for research use in ophthalmic AI, including:

  • retinal disease classification
  • diabetic retinopathy grading
  • retinal sign detection
  • zero-shot inference with disease/sign text prompts

OPENEye-FM is not intended for direct clinical diagnosis, autonomous screening, referral decisions, or treatment planning.
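Zero-shot inference with text prompts, listed among the research uses above, can be pictured as ranking prompts by their similarity to an image embedding. The sketch below assumes cosine similarity in a shared embedding space, which this card does not confirm. The three-dimensional vectors are toy values standing in for real encoder outputs, and `zero_shot_rank` is a hypothetical helper. Use the repository's zero-shot evaluation script for actual results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_rank(image_emb, prompt_embs):
    """Rank text prompts by similarity to the image embedding, best first."""
    scores = {prompt: cosine(image_emb, emb) for prompt, emb in prompt_embs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy embeddings (assumed values, not produced by OPENEye-FM).
prompt_embs = {
    "diabetic retinopathy": [0.9, 0.1, 0.0],
    "glaucoma": [0.0, 1.0, 0.1],
    "normal fundus": [0.1, 0.0, 1.0],
}
image_emb = [0.8, 0.2, 0.1]

ranking = zero_shot_rank(image_emb, prompt_embs)
for prompt, score in ranking:
    print(f"{prompt}: {score:.3f}")
```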


Training Data

OPENEye-FM was trained on OPENEye-Data, which contains 233,651 colour fundus photographs from 37 open-source datasets.

The original labels from these datasets were harmonized into a unified ophthalmic disease/sign label space. Unknown or unavailable labels are masked during training rather than treated as negative labels.
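To make the masking idea concrete, here is a minimal sketch of a loss that skips unknown labels instead of counting them as negatives. It assumes a per-label binary cross-entropy objective, which this card does not specify. `masked_bce` and the use of `None` to mark unknown labels are illustrative choices, not the repository's implementation.

```python
import math


def masked_bce(logits, targets):
    """Mean binary cross-entropy over known labels.

    targets holds 1 (present), 0 (absent), or None (unknown or unavailable).
    Entries with a None target are excluded from both the sum and the count.
    """
    total, known = 0.0, 0
    for z, t in zip(logits, targets):
        if t is None:
            continue  # masked: contributes nothing to the loss
        # Numerically stable form of BCE with logits.
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
        known += 1
    return total / known if known else 0.0


# The second label is unknown, so its logit has no effect on the loss.
print(masked_bce([0.0, 5.0], [1, None]))
print(masked_bce([0.0, -5.0], [1, None]))
```

Treating the unknown label as a negative would instead penalise the confident positive logit in the first call, even though the source dataset never annotated that label.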

The OPENEye-Data manifest and label prompts are released with the official GitHub repository. Users should refer to the repository for dataset reconstruction and usage instructions.


Quick Start

Please refer to the official GitHub repository for installation, full inference scripts, fine-tuning, linear probing, few-shot evaluation, and zero-shot evaluation.

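Minimal forward pass in Python:

import torch
from models import openeye
from inference import load_openeye_checkpoint

# Build the DINOv2-style ViT-Base image encoder with a 36-label output head.
model = openeye.vit_base_dinov2(
    img_size=224,
    num_classes=36,
    drop_path_rate=0.0,
    global_pool="avg",
    dynamic_img_size=True,
)

# Load the released weights and switch to evaluation mode.
load_openeye_checkpoint(model, "weights/OpenEye.pth")
model.eval()

# Dummy batch of one 3-channel 224 × 224 image; replace with a preprocessed fundus photograph.
x = torch.randn(1, 3, 224, 224)

# Disable gradient tracking for inference.
with torch.no_grad():
    logits = model(x)

print(logits.shape)  # torch.Size([1, 36])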
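The model output above is a vector of 36 raw logits. One plausible way to read it, sketched below, is to apply a sigmoid per label and keep the labels above a threshold. This assumes independent (multi-label) outputs and a 0.5 threshold, neither of which is stated in this card. The label names and the `decode_logits` helper are hypothetical; the real label order comes from the official repository.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_logits(logits, label_names, threshold=0.5):
    """Return (label, probability) pairs whose probability meets the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [(name, p) for name, p in zip(label_names, probs) if p >= threshold]


# Toy example with three hypothetical labels instead of the full 36.
label_names = ["diabetic_retinopathy", "glaucoma", "drusen"]
example_logits = [2.0, -1.0, 0.0]

for name, prob in decode_logits(example_logits, label_names):
    print(f"{name}: {prob:.3f}")
```

With a PyTorch tensor of shape `[1, 36]`, `logits[0].tolist()` yields the plain list of floats this helper expects.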

Single-image inference:

python inference.py \
  --image examples/fundus.jpg \
  --checkpoint weights/OpenEye.pth

Citation

If you use OPENEye-FM, OPENEye-Data, or the official code, please cite:

@article{openeye_fm,
  title   = {A low-cost and efficient vision-language foundation model with unified disease-sign training for cross-modality assessment and primary eye care},
  author  = {Zhou, Yang and Soh, Zhi Da and Yu, Kai and Thakur, Sahil and Bai, Yang and Wei, Hongyu and Lee, Zann and Peng, Qingsheng and Xue, Can Can and Zhou, Jun and Lei, Xiaofeng and Feng, Yanqin and Goh, Rick Siow Mong and Liu, Yong and Cheng, Ching-Yu},
  journal = {Under Review},
  year    = {2026}
}
