Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Paper: [arXiv:2301.08243](https://arxiv.org/abs/2301.08243)
Self-supervised I-JEPA (Image-based Joint-Embedding Predictive Architecture) encoders pretrained on Harvard FairVision OCT data for binary glaucoma classification.
| File | Description | Size | Downstream AUC |
|---|---|---|---|
| jepa_patch-imagenet-init-ep32-best.pth.tar | Best encoder. ViT-B/16, ImageNet-init → I-JEPA on 600K OCT slices, epoch 32. | 1.5 GB | 0.829 (fine-tuned) |
| jepa_patch-run3-ep11.pth.tar | ViT-B/16, random-init → I-JEPA on 600K OCT slices, epoch 11. | 1.5 GB | 0.819 (fine-tuned, val) |
| vit_b16_imagenet_timm.pth | ViT-B/16 ImageNet supervised weights (timm). Used as initialization for I-JEPA pretraining. | 327 MB | N/A (base init) |
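To get started, download a checkpoint from the Hub and extract the target-encoder weights: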
```python
import torch
from huggingface_hub import hf_hub_download

# Download the best encoder checkpoint
path = hf_hub_download("yfeng0206/ijepa-oct-glaucoma", "jepa_patch-imagenet-init-ep32-best.pth.tar")

# The checkpoint stores the EMA target encoder under the "target_encoder" key
ckpt = torch.load(path, map_location="cpu")
encoder_weights = ckpt["target_encoder"]
```
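The weights can then be loaded into a standard ViT-B/16 backbone for fine-tuning. The sketch below continues from the snippet above; the timm model id and the "module." key prefix (added when the encoder is wrapped in DistributedDataParallel during pretraining) are assumptions, so verify them against your checkpoint:

```python
import timm

# Assumed architecture id; num_classes=2 attaches a fresh binary head.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=2)

# Strip a possible DDP "module." prefix from every parameter name (assumption).
state_dict = {k.removeprefix("module."): v for k, v in encoder_weights.items()}

# strict=False: the new classification head (and any predictor weights in the
# checkpoint) will not match the backbone's keys.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```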
Data: Harvard FairVision Glaucoma, 10,000 subjects (6,000 train / 1,000 val / 3,000 test), each with a 200×200×200 OCT volume.
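For pretraining, the 3D volumes are decomposed into 2D slices. A hypothetical sketch is below; the on-disk format and slicing axis are not specified in this card, so both are assumptions:

```python
import numpy as np

def volume_to_slices(volume: np.ndarray) -> list[np.ndarray]:
    """Split one OCT volume into 2D slices along the first axis (assumed B-scan axis)."""
    assert volume.shape == (200, 200, 200)
    return [volume[i] for i in range(volume.shape[0])]

# Note: 6,000 training volumes x 200 slices = 1.2M slices, so the 600K figure
# in the table above implies subsampling (an inference, not stated in the card).
```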