Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Paper: [arXiv:2301.08243](https://arxiv.org/abs/2301.08243)
Self-supervised I-JEPA (Image-based Joint-Embedding Predictive Architecture) encoders pretrained on Harvard FairVision OCT data for binary glaucoma classification.
| File | Description | Size | Downstream AUC |
|---|---|---|---|
| jepa_patch-imagenet-init-ep32-best.pth.tar | Best encoder. ViT-B/16, ImageNet-init → I-JEPA on 600K OCT slices, epoch 32. | 1.5 GB | 0.829 (fine-tuned) |
| jepa_patch-run3-ep11.pth.tar | ViT-B/16, random-init → I-JEPA on 600K OCT slices, epoch 11. | 1.5 GB | 0.819 (fine-tuned, val) |
| vit_b16_imagenet_timm.pth | ViT-B/16 ImageNet supervised weights (timm). Used as initialization for I-JEPA pretraining. | 327 MB | N/A (base init) |
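To get started, download a checkpoint from the Hub and extract the target-encoder weights: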
```python
import torch
from huggingface_hub import hf_hub_download

# Download the best encoder checkpoint
path = hf_hub_download("yfeng0206/ijepa-oct-glaucoma", "jepa_patch-imagenet-init-ep32-best.pth.tar")

# The checkpoint stores the EMA target encoder under the "target_encoder" key
ckpt = torch.load(path, map_location="cpu")
encoder_weights = ckpt["target_encoder"]
```
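The weights can then be loaded into a standard ViT-B/16 backbone for fine-tuning. The sketch below continues from the snippet above; the timm model id and the "module." key prefix (added when the encoder is wrapped in DistributedDataParallel during pretraining) are assumptions, so verify them against your checkpoint:

```python
import timm

# Assumed architecture id; num_classes=2 attaches a fresh binary head.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=2)

# Strip a possible DDP "module." prefix from every parameter name (assumption).
state_dict = {k.removeprefix("module."): v for k, v in encoder_weights.items()}

# strict=False: the new classification head (and any predictor weights in the
# checkpoint) will not match the backbone's keys.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```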
Data: Harvard FairVision Glaucoma, 10,000 subjects (6,000 train / 1,000 val / 3,000 test), each with a 200×200×200 OCT volume.
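For pretraining, the 3D volumes are decomposed into 2D slices. A hypothetical sketch is below; the on-disk format and slicing axis are not specified in this card, so both are assumptions:

```python
import numpy as np

def volume_to_slices(volume: np.ndarray) -> list[np.ndarray]:
    """Split one OCT volume into 2D slices along the first axis (assumed B-scan axis)."""
    assert volume.shape == (200, 200, 200)
    return [volume[i] for i in range(volume.shape[0])]

# Note: 6,000 training volumes x 200 slices = 1.2M slices, so the 600K figure
# in the table above implies subsampling (an inference, not stated in the card).
```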