OPENEye-FM
OPENEye-FM is a retinal vision-language foundation model for colour fundus photography. It was trained on OPENEye-Data, a curated collection of 233,651 fundus photographs from 37 open-source datasets with harmonized disease and clinical sign labels.
OPENEye-FM is designed to learn from heterogeneous public fundus datasets by mapping disease labels, severity labels, and clinical sign labels into a shared disease-sign label space. Textual disease and sign descriptions are used as semantic anchors during training.
- Paper: A low-cost and efficient vision-language foundation model with unified disease-sign training for cross-modality assessment and primary eye care
- Code: https://github.com/yangzhou12/OPENEye-FM/
Model Details
| Item | Description |
|---|---|
| Model | OPENEye-FM |
| Model type | Retinal vision-language foundation model |
| Image encoder | DINOv2-style Vision Transformer |
| Language encoder | Sentence-BERT |
| Input modality | Colour fundus photography |
| Input size | 224 × 224 pixels |
| Label space | 36 harmonized disease/sign labels |
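The worked example below shows what the 224 × 224 input size means for the image encoder by counting the tokens a ViT processes. It assumes a patch size of 14, as in the public DINOv2 ViT-B/14 models, and a single class token. Both are assumptions: OPENEye-FM's actual patch size and any extra tokens (such as register tokens) should be checked in the repository configuration.

```python
def vit_token_count(img_size: int, patch_size: int, num_extra_tokens: int = 1) -> int:
    """Tokens for a square image split into non-overlapping square patches.

    num_extra_tokens covers the class token plus any register tokens, if the model uses them.
    """
    if img_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    patches_per_side = img_size // patch_size
    return patches_per_side * patches_per_side + num_extra_tokens

# Assumed DINOv2-style setting: 224 x 224 input, 14 x 14 patches, one class token.
print(vit_token_count(224, 14))  # 257 (16 x 16 patches + 1 class token)
```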
Intended Use
OPENEye-FM is intended for research use in ophthalmic AI, including:
- retinal disease classification
- diabetic retinopathy grading
- retinal sign detection
- zero-shot inference with disease/sign text prompts
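Zero-shot inference with text prompts generally works by comparing an image embedding with the embeddings of disease or sign descriptions. The plain-Python sketch below shows a generic CLIP-style scoring step: cosine similarity between one image embedding and several prompt embeddings, followed by a temperature-scaled softmax. The prompt wording, the toy 3-dimensional embeddings, and the temperature of 0.07 are all hypothetical. OPENEye-FM's actual prompts, embedding size, and scoring rule are defined in the official repository and may differ; for example, labels may be scored independently rather than with a softmax.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def zero_shot_scores(image_emb, prompt_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities, one entry per prompt."""
    logits = {name: cosine(image_emb, emb) / temperature for name, emb in prompt_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(v - peak) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Hypothetical prompts with toy 3-d embeddings; real ones would come from the text encoder.
prompt_embeddings = {
    "a fundus photograph with diabetic retinopathy": [0.9, 0.1, 0.0],
    "a fundus photograph with glaucoma": [0.1, 0.9, 0.0],
    "a normal fundus photograph": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]  # toy stand-in for the image encoder output
scores = zero_shot_scores(image_embedding, prompt_embeddings)
print(max(scores, key=scores.get))  # a fundus photograph with diabetic retinopathy
```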
OPENEye-FM is not intended for direct clinical diagnosis, autonomous screening, referral decisions, or treatment planning and decision-making.
Training Data
OPENEye-FM was trained on OPENEye-Data, which contains 233,651 colour fundus photographs from 37 open-source datasets.
The original labels from these datasets were harmonized into a unified ophthalmic disease/sign label space. Unknown or unavailable labels are masked during training rather than treated as negative labels.
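The masking rule can be illustrated with a small sketch. Each image gets a target vector over the shared label space and a mask that is 1 only where its source dataset actually provides that label. The binary cross-entropy is then averaged over unmasked entries only, so a label the dataset never annotated contributes nothing to the loss instead of being counted as negative. The four label names, the DR-grade mapping (grade ≥ 1 treated as positive), and the choice of binary cross-entropy are illustrative assumptions, not the exact OPENEye-FM label definitions or training objective. The real 36-label space and loss are in the official repository.

```python
import math

# Hypothetical 4-label slice of a shared disease/sign label space (the real one has 36 labels).
LABELS = ["diabetic_retinopathy", "glaucoma", "haemorrhage", "drusen"]

def harmonize(source_labels):
    """Map one image's source annotations onto LABELS as (targets, mask).

    source_labels maps label name -> 0 or 1 for the labels this source dataset annotates.
    Labels missing from it are unknown: their target is a 0.0 placeholder and their mask is 0.
    """
    targets = [float(source_labels.get(name, 0)) for name in LABELS]
    mask = [1.0 if name in source_labels else 0.0 for name in LABELS]
    return targets, mask

def masked_bce_with_logits(logits, targets, mask):
    """Binary cross-entropy averaged over unmasked labels only (numerically stable form)."""
    total, count = 0.0, 0
    for z, y, m in zip(logits, targets, mask):
        if m == 0.0:
            continue  # unknown label: skipped, not counted as a negative
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        count += 1
    return total / count if count else 0.0

# A DR-grading dataset only annotates DR; here grade >= 1 is (hypothetically) mapped to positive.
dr_grade = 2
targets, mask = harmonize({"diabetic_retinopathy": int(dr_grade >= 1)})
print(targets, mask)  # [1.0, 0.0, 0.0, 0.0] [1.0, 0.0, 0.0, 0.0]
loss = masked_bce_with_logits([2.0, -1.0, 5.0, 0.0], targets, mask)  # only the DR logit contributes
```

If the unlabeled glaucoma, haemorrhage, and drusen entries were instead treated as negatives, the logit of 5.0 would be heavily penalized even though the source dataset says nothing about that finding.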
The OPENEye-Data manifest and label prompts are released with the official GitHub repository. Users should refer to the repository for dataset reconstruction and usage instructions.
Quick Start
Please refer to the official GitHub repository for installation, full inference scripts, fine-tuning, linear probing, few-shot evaluation, and zero-shot evaluation.
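```python
import torch

# `models` and `inference` come from the official repository
# (assumed importable, e.g. when running from the repository root).
from models import openeye
from inference import load_openeye_checkpoint

# Build the DINOv2-style ViT-Base image encoder with a 36-label output head.
model = openeye.vit_base_dinov2(
    img_size=224,
    num_classes=36,      # one output per harmonized disease/sign label
    drop_path_rate=0.0,  # stochastic depth disabled
    global_pool="avg",
    dynamic_img_size=True,
)
load_openeye_checkpoint(model, "weights/OpenEye.pth")
model.eval()

# Dummy batch of one 224 x 224 RGB image; replace with a preprocessed fundus photograph.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 36])
```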
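The model returns one raw logit per harmonized label. Because unknown labels are masked during training rather than treated as negatives, the outputs appear to be independent multi-label scores, so one plausible way to turn them into predictions is a per-label sigmoid followed by a threshold. The plain-Python sketch below does this for a single image. The 0.5 threshold and the label names are hypothetical. Check the official inference script to see whether OPENEye-FM expects a sigmoid, per-label thresholds, or separate handling of DR grades.

```python
import math

def stable_sigmoid(z):
    """Logistic sigmoid that avoids overflow for large negative inputs."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logits_to_predictions(logits, label_names, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid probability is >= threshold, highest first."""
    if len(logits) != len(label_names):
        raise ValueError("expected one logit per label")
    probs = [stable_sigmoid(z) for z in logits]
    hits = [(name, p) for name, p in zip(label_names, probs) if p >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-label example; with the Quick Start snippet you would pass
# logits[0].tolist() together with the 36 label names from the repository.
names = ["label_a", "label_b", "label_c"]
preds = logits_to_predictions([2.0, -1.0, 0.3], names)
print([name for name, _ in preds])  # ['label_a', 'label_c']
```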
Single-image inference:
```bash
python inference.py \
  --image examples/fundus.jpg \
  --checkpoint weights/OpenEye.pth
```
Citation
If you use OPENEye-FM, OPENEye-Data, or the official code, please cite:
```bibtex
@article{openeye_fm,
  title   = {A low-cost and efficient vision-language foundation model with unified disease-sign training for cross-modality assessment and primary eye care},
  author  = {Zhou, Yang and Soh, Zhi Da and Yu, Kai and Thakur, Sahil and Bai, Yang and Wei, Hongyu and Lee, Zann and Peng, Qingsheng and Xue, Can Can and Zhou, Jun and Lei, Xiaofeng and Feng, Yanqin and Goh, Rick Siow Mong and Liu, Yong and Cheng, Ching-Yu},
  journal = {Under Review},
  year    = {2026}
}
```
Base model (as listed in the Hugging Face model tree for youngzhou12/OPENEye-FM): nreimers/MiniLM-L6-H384-uncased