I-JEPA Pretrained Encoders for OCT Glaucoma Classification

Self-supervised I-JEPA (Image-based Joint-Embedding Predictive Architecture) encoders pretrained on Harvard FairVision OCT data for binary glaucoma classification.

Available Weights

| File | Description | Size | Downstream AUC |
|---|---|---|---|
| `jepa_patch-imagenet-init-ep32-best.pth.tar` | Best encoder. ViT-B/16, ImageNet-init → I-JEPA on 600K OCT slices, epoch 32. | 1.5 GB | 0.829 (fine-tuned) |
| `jepa_patch-run3-ep11.pth.tar` | ViT-B/16, random-init → I-JEPA on 600K OCT slices, epoch 11. | 1.5 GB | 0.819 (validation, fine-tuned) |
| `vit_b16_imagenet_timm.pth` | ViT-B/16 ImageNet supervised weights (timm). Used as initialization for I-JEPA pretraining. | 327 MB | N/A (base init) |

Usage

import torch
from huggingface_hub import hf_hub_download

# Download best encoder
path = hf_hub_download("yfeng0206/ijepa-oct-glaucoma", "jepa_patch-imagenet-init-ep32-best.pth.tar")
ckpt = torch.load(path, map_location="cpu")
# I-JEPA checkpoints store the EMA target-encoder state dict under "target_encoder"
encoder_weights = ckpt["target_encoder"]
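Checkpoints saved under distributed training often prefix every parameter name with `module.`; whether this particular checkpoint does is an assumption, but a minimal helper for stripping such a prefix before calling `load_state_dict` looks like this:

```python
def strip_module_prefix(state_dict):
    """Remove a leading 'module.' (added by DistributedDataParallel)
    from each parameter name; keys without the prefix pass through."""
    return {k.removeprefix("module."): v for k, v in state_dict.items()}

# Hypothetical usage, assuming `encoder` is a matching ViT-B/16 module:
# encoder.load_state_dict(strip_module_prefix(encoder_weights))
```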

Architecture

  • Encoder: ViT-B/16 (86M params, 12 layers, 768-d, 12 heads, patch size 16)
  • Pretraining: I-JEPA masked representation prediction on 600K OCT B-scan slices (6K volumes x 100 slices)
  • Downstream: Frozen or fine-tuned encoder → AttentiveProbe (3 transformer blocks for slice-level attention) → MLP head → binary glaucoma classification
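The slice-level aggregation idea behind the AttentiveProbe can be illustrated with a simplified attention-pooling step. This is a plain NumPy sketch, not the actual 3-transformer-block implementation; the shapes and the learned query vector are assumptions:

```python
import numpy as np

def attention_pool(slice_embeds, query):
    """Pool per-slice embeddings (n_slices, d) into one volume-level
    embedding (d,) via softmax attention against a learned query."""
    scores = slice_embeds @ query / np.sqrt(slice_embeds.shape[1])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ slice_embeds             # convex combination of slices

rng = np.random.default_rng(0)
embeds = rng.normal(size=(100, 768))  # 100 B-scan slices, 768-d ViT features
q = rng.normal(size=768)              # stand-in for a learned query
volume_embed = attention_pool(embeds, q)  # shape (768,)
```

In the real probe, the attention weights let the classifier focus on the B-scans most informative for glaucoma rather than averaging all 100 slices uniformly.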

Dataset

Harvard FairVision Glaucoma: 10,000 subjects (6K train / 1K val / 3K test), 200x200x200 OCT volumes.
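The split and slice counts stated above are internally consistent with the 600K-slice pretraining set; a quick sanity check in plain Python:

```python
# Subject-level splits from the Harvard FairVision Glaucoma dataset
train, val, test = 6_000, 1_000, 3_000
assert train + val + test == 10_000    # total subjects

# I-JEPA pretraining uses 100 B-scan slices per training volume
slices_per_volume = 100
pretrain_slices = train * slices_per_volume
assert pretrain_slices == 600_000      # matches the stated pretraining set size
```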
