Envision Eye Imaging Classifier

A SetFit few-shot text classifier that identifies eye imaging datasets from the metadata of scientific records.

Developed by: FAIR Data Innovations Hub in collaboration with the EyeACT Study

Model Description

The model uses Alibaba-NLP/gte-large-en-v1.5 as its embedding backbone and assigns each record to one of four classes (numeric label in parentheses):

  • EYE_IMAGING (3): Actual ophthalmic imaging datasets (fundus, OCT, OCTA, cornea)
  • EYE_SOFTWARE (2): Code, tools, models for eye imaging
  • EDGE_CASE (1): Eye research papers, reviews, non-imaging data
  • NEGATIVE (0): Not eye-related
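The numeric labels above can be mapped to readable class names with a small lookup. This is a minimal sketch: the names `ID2LABEL` and `decode_predictions` are illustrative and not part of the released model. Only the label numbers and class names are taken from the list above.

```python
# Label ids and class names as listed in the model description.
ID2LABEL = {
    0: "NEGATIVE",
    1: "EDGE_CASE",
    2: "EYE_SOFTWARE",
    3: "EYE_IMAGING",
}


def decode_predictions(label_ids):
    """Map an iterable of numeric class ids to their class names."""
    # int() lets the helper accept NumPy integers as well as plain ints.
    return [ID2LABEL[int(i)] for i in label_ids]


print(decode_predictions([3, 0, 2]))
```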

Results on Zenodo

The classifier was applied to 515 Zenodo records, restricted to resource_type=dataset. Predicted class counts:

Class         Count
EYE_IMAGING     120
EYE_SOFTWARE     66
EDGE_CASE         3
NEGATIVE        325

Confidence Distribution (EYE_IMAGING)

Confidence              Count  Share
High (≥0.95)              117  97.5%
Medium (0.80 to <0.95)      2   1.7%
Lower (<0.80)               1   0.8%
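The bucketing behind this table can be sketched as follows. The thresholds (0.95 and 0.80) come from the table. The function names and the example scores are hypothetical, and how the original confidence values were obtained is not stated here.

```python
from collections import Counter


def confidence_bucket(p):
    """Assign a confidence score to the High / Medium / Lower bands of the table."""
    if p >= 0.95:
        return "High"
    if p >= 0.80:
        return "Medium"
    return "Lower"


def bucket_shares(confidences):
    """Return {bucket: (count, percent)} for a list of confidence scores."""
    counts = Counter(confidence_bucket(p) for p in confidences)
    total = sum(counts.values())
    if total == 0:
        return {name: (0, 0.0) for name in ("High", "Medium", "Lower")}
    return {
        name: (counts[name], round(100 * counts[name] / total, 1))
        for name in ("High", "Medium", "Lower")
    }


# Made-up scores, for illustration only.
print(bucket_shares([0.99, 0.97, 0.85, 0.60]))
```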

Data Pipeline

  • Zenodo records scraped with a datasets-only filter (resource_type=dataset)
  • ZIP contents inspected via HTTP Range requests (31,958 files catalogued)
  • Genomics files excluded (.fasta, .h5ad, .vcf, etc.)
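One way to inspect ZIP contents without downloading whole archives is to hand `zipfile` a file-like object that fetches only the byte ranges it asks for. The sketch below illustrates that idea with the standard library and is not the project's actual pipeline code. `RangeReader`, `http_range_fetcher`, `list_zip_members` and `drop_genomics` are hypothetical names, the server is assumed to honour `Range` headers and report `Content-Length`, and the extension set holds only the three suffixes named above because the full exclusion list is not given.

```python
import io
import urllib.request
import zipfile
from pathlib import PurePosixPath

# Only the extensions named in the text; the complete list is not known.
GENOMICS_EXTENSIONS = {".fasta", ".h5ad", ".vcf"}


class RangeReader:
    """Minimal seekable, read-only file object backed by a byte-range fetcher."""

    def __init__(self, size, fetch):
        self._size = size
        self._fetch = fetch  # fetch(start, end_inclusive) -> bytes
        self._pos = 0

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def read(self, n=-1):
        if n is None or n < 0:
            n = self._size - self._pos
        n = min(n, self._size - self._pos)
        if n <= 0:
            return b""
        data = self._fetch(self._pos, self._pos + n - 1)
        self._pos += len(data)
        return data


def http_range_fetcher(url):
    """Return (size, fetch) for a URL, assuming the server supports Range requests."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    return size, fetch


def list_zip_members(size, fetch):
    """List member names; zipfile only reads the end-of-archive directory records."""
    with zipfile.ZipFile(RangeReader(size, fetch)) as zf:
        return zf.namelist()


def drop_genomics(names):
    """Filter out file names whose final suffix is a known genomics extension."""
    return [n for n in names if PurePosixPath(n).suffix.lower() not in GENOMICS_EXTENSIONS]
```

For a remote archive, `list_zip_members(*http_range_fetcher(url))` would return the member names. Because the fetcher is pluggable, the same code can be exercised against an in-memory archive.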

Training

  • Examples: 452 (99 EYE_IMAGING, 30 EYE_SOFTWARE, 90 EDGE_CASE, 233 NEGATIVE)
  • Epochs: 2
  • Batch Size: 16
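As a quick check on the training set composition, the reported per-class counts can be tallied and turned into shares. This sketch assumes that the four counts correspond to EYE_IMAGING, EYE_SOFTWARE, EDGE_CASE and NEGATIVE in that order. `TRAIN_COUNTS` is an illustrative name.

```python
# Per-class training example counts as reported above.
TRAIN_COUNTS = {
    "EYE_IMAGING": 99,
    "EYE_SOFTWARE": 30,
    "EDGE_CASE": 90,
    "NEGATIVE": 233,
}

total = sum(TRAIN_COUNTS.values())
shares = {name: round(100 * n / total, 1) for name, n in TRAIN_COUNTS.items()}

print(f"total examples: {total}")
for name, pct in shares.items():
    print(f"{name}: {TRAIN_COUNTS[name]} ({pct}%)")
```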

Usage

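from sentence_transformers import SentenceTransformer
import joblib

# Load the embedding backbone; gte-large-en-v1.5 ships custom model code, hence trust_remote_code=True.
model = SentenceTransformer("jimnoneill/envision-eye-imaging-classifier", trust_remote_code=True)
# Load the classification head from a local copy of model_head.pkl.
head = joblib.load("model_head.pkl")

# Embed the metadata text, then predict one class label per input (see Model Description).
embeddings = model.encode(["Retinal OCT dataset for diabetic retinopathy"])
predictions = head.predict(embeddings)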
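To get a confidence score alongside each prediction, a helper along these lines may work. It assumes the loaded head follows the scikit-learn classifier interface (`predict_proba` and `classes_`). That is typical for a SetFit logistic-regression head but is not confirmed for this model. `top_class_with_confidence` is a hypothetical name.

```python
def top_class_with_confidence(head, embeddings):
    """Return one (class_label, probability) pair per input row.

    Assumes `head` exposes scikit-learn-style `predict_proba` and `classes_`.
    """
    results = []
    for row in head.predict_proba(embeddings):
        probs = [float(p) for p in row]
        # Index of the highest-probability class in this row.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((head.classes_[best], probs[best]))
    return results
```

With the objects from the snippet above, `top_class_with_confidence(head, embeddings)` would give the predicted label and its probability for each input text.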

Citation

  • EyeACT Envision project
  • FAIR Data Innovations Hub (fairdataihub.org)
  • Alibaba-NLP/gte-large-en-v1.5

Contact

EyeACT team: eyeactstudy.org

Model size: 0.4B params (Safetensors, tensor type F32)