OPENEye-FM

OPENEye-FM is a retinal vision-language foundation model for colour fundus photography. It was trained on OPENEye-Data, a curated collection of 233,651 fundus photographs from 37 open-source datasets with harmonized disease and clinical sign labels.

OPENEye-FM is designed to learn from heterogeneous public fundus datasets by mapping disease labels, severity labels, and clinical sign labels into a shared disease-sign label space. Textual disease and sign descriptions are used as semantic anchors during training.
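The mapping into a shared label space can be pictured with a small sketch. Everything named here is hypothetical: the dataset names, the raw labels, the three shared labels, and the `harmonize` helper are illustrative and are not taken from OPENEye-Data or the official code. The one idea carried over from the model description is that a label a dataset does not annotate stays unknown (`None`) rather than becoming a negative.

```python
# Hypothetical shared label space (the real one has 36 labels).
SHARED_LABELS = ["diabetic_retinopathy", "glaucoma", "hemorrhage"]

# Hypothetical per-dataset mapping: raw label -> (shared label, value).
LABEL_MAP = {
    "dataset_a": {
        "DR": ("diabetic_retinopathy", 1),
        "No DR": ("diabetic_retinopathy", 0),
    },
    "dataset_b": {
        "glaucoma_suspect": ("glaucoma", 1),
        "haemorrhages": ("hemorrhage", 1),
    },
}


def harmonize(dataset, raw_labels):
    """Return one value per shared label: 1, 0, or None when unknown."""
    target = {name: None for name in SHARED_LABELS}
    for raw in raw_labels:
        shared, value = LABEL_MAP[dataset][raw]
        target[shared] = value
    return [target[name] for name in SHARED_LABELS]


print(harmonize("dataset_a", ["No DR"]))
print(harmonize("dataset_b", ["glaucoma_suspect", "haemorrhages"]))
```

In this toy setup, `dataset_a` only says something about diabetic retinopathy, so the glaucoma and hemorrhage entries of its target vector remain `None`.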

Model Details

Item	Description
Model	OPENEye-FM
Model type	Retinal vision-language foundation model
Image encoder	DINOv2-style Vision Transformer
Language encoder	Sentence-BERT
Input modality	Colour fundus photography
Input size	224 × 224
Label space	36 harmonized disease/sign labels

Intended Use

OPENEye-FM is intended for research use in ophthalmic AI, including:

retinal disease classification
diabetic retinopathy grading
retinal sign detection
zero-shot inference with disease/sign text prompts

OPENEye-FM is not intended for direct clinical diagnosis, autonomous screening, referral decisions, or treatment planning.
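Zero-shot inference with text prompts, listed among the intended uses above, is commonly done by comparing an image embedding with the embedding of each prompt. The sketch below assumes that CLIP-style scheme with cosine similarity. This model card does not confirm that OPENEye-FM scores prompts this way, and the prompts, the toy 3-dimensional vectors, and the `zero_shot_rank` helper are all hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_rank(image_embedding, prompt_embeddings):
    """Rank prompts by cosine similarity to the image embedding, best first."""
    scores = {p: cosine(image_embedding, e) for p, e in prompt_embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings standing in for text-encoder and image-encoder outputs.
prompts = {
    "a fundus photograph with glaucoma": [0.9, 0.1, 0.0],
    "a fundus photograph with diabetic retinopathy": [0.1, 0.9, 0.2],
}
image = [0.2, 0.8, 0.1]

best_prompt, best_score = zero_shot_rank(image, prompts)[0]
print(best_prompt, round(best_score, 3))
```

With these toy vectors the diabetic retinopathy prompt ranks first, because its embedding points in nearly the same direction as the image embedding.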

Training Data

OPENEye-FM was trained on OPENEye-Data, which contains 233,651 colour fundus photographs from 37 open-source datasets.

The original labels from these datasets were harmonized into a unified ophthalmic disease/sign label space. Unknown or unavailable labels are masked during training rather than treated as negative labels.
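To illustrate what masking unknown labels means in practice, here is a minimal framework-free sketch. It assumes a per-label binary cross-entropy loss on logits, which this model card does not state. The `masked_bce` helper is hypothetical and is not the repository's training loss. Unknown labels are represented as `None` and are skipped, so they add nothing to the loss.

```python
import math


def masked_bce(logits, targets):
    """Mean binary cross-entropy over known labels only.

    targets holds 1 or 0 for known labels and None for unknown ones.
    """
    total, count = 0.0, 0
    for z, y in zip(logits, targets):
        if y is None:  # unknown label: skipped, contributes no loss
            continue
        # Numerically stable BCE on a logit:
        # max(z, 0) - z * y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        count += 1
    return total / count if count else 0.0


logits = [0.0, 5.0]
masked = masked_bce(logits, [1, None])    # second label unknown
as_negative = masked_bce(logits, [1, 0])  # second label forced to negative
print(masked < as_negative)
```

The comparison shows why the distinction matters: forcing the unknown label to negative penalizes the confident positive logit on the second label, while masking leaves that prediction unpenalized.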

The OPENEye-Data manifest and label prompts are released with the official GitHub repository. Users should refer to the repository for dataset reconstruction and usage instructions.

Quick Start

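Please refer to the official GitHub repository for installation, full inference scripts, fine-tuning, linear probing, few-shot evaluation, and zero-shot evaluation. The snippet below assumes the repository's models and inference modules are importable; it loads the checkpoint and runs a forward pass on a dummy input: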

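import torch
from models import openeye  # module from the official repository
from inference import load_openeye_checkpoint  # helper from the official repository

# Build the ViT-Base image encoder with one output per harmonized label.
model = openeye.vit_base_dinov2(
    img_size=224,
    num_classes=36,
    drop_path_rate=0.0,
    global_pool="avg",
    dynamic_img_size=True,
)

# Load the released weights and switch to evaluation mode.
load_openeye_checkpoint(model, "weights/OpenEye.pth")
model.eval()

# Random tensor standing in for one 224 × 224 RGB fundus image.
x = torch.randn(1, 3, 224, 224)

# Forward pass without gradient tracking.
with torch.no_grad():
    logits = model(x)

print(logits.shape)  # torch.Size([1, 36])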

Single-image inference:

python inference.py \
  --image examples/fundus.jpg \
  --checkpoint weights/OpenEye.pth

Citation

If you use OPENEye-FM, OPENEye-Data, or the official code, please cite:

@article{openeye_fm,
  title   = {A low-cost and efficient vision-language foundation model with unified disease-sign training for cross-modality assessment and primary eye care},
  author  = {Zhou, Yang and Soh, Zhi Da and Yu, Kai and Thakur, Sahil and Bai, Yang and Wei, Hongyu and Lee, Zann and Peng, Qingsheng and Xue, Can Can and Zhou, Jun and Lei, Xiaofeng and Feng, Yanqin and Goh, Rick Siow Mong and Liu, Yong and Cheng, Ching-Yu},
  journal = {Under Review},
  year    = {2026}
}

