Resources
- 📄 Paper: https://arxiv.org/abs/2601.15891
- 🤗 Hugging Face: https://huggingface.co/AIDELab-IITBombay/RadJEPA
- 💻 Code (GitHub): https://github.com/aidelab-iitbombay/RadJEPA
RadJEPA
RadJEPA is a self-supervised vision encoder for chest X-ray images based on a Joint Embedding Predictive Architecture (JEPA).
The model learns visual representations by predicting latent features of masked image regions, without text supervision or pixel-level reconstruction.
RadJEPA is intended as a general-purpose radiology image backbone for downstream tasks.
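The latent-prediction objective can be illustrated with a toy sketch: a predictor regresses the target-encoder features of masked patches from a summary of the visible (context) patches. The linear modules, dimensions, and mask layout below are illustrative stand-ins, not the RadJEPA implementation:

```python
import torch
import torch.nn as nn

dim, n_patches, batch = 64, 16, 2

context_encoder = nn.Linear(dim, dim)   # stand-in for the ViT context encoder
target_encoder = nn.Linear(dim, dim)    # stand-in for the EMA target encoder
predictor = nn.Linear(dim, dim)         # predicts target latents
pos = torch.randn(n_patches, dim)       # stand-in positional embeddings

patches = torch.randn(batch, n_patches, dim)     # patch embeddings of one image
mask = torch.zeros(n_patches, dtype=torch.bool)
mask[8:] = True                                  # last 8 patches are masked targets

# Encode only the visible patches; targets receive no gradient.
ctx_summary = context_encoder(patches[:, ~mask]).mean(dim=1, keepdim=True)
with torch.no_grad():
    tgt = target_encoder(patches[:, mask])

# Predict each masked patch's latent from the context summary + its position.
pred = predictor(ctx_summary + pos[mask])
loss = nn.functional.smooth_l1_loss(pred, tgt)   # regression in latent space
```

No pixels are reconstructed and no text is involved: the loss compares feature vectors directly.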
Overview
- Model type: Vision Transformer–based JEPA encoder
- Training: Self-supervised latent prediction
- Input: Chest X-ray images
- Finetuned from model: I-JEPA (`ijepa`)
Intended use
The model is a vision backbone that can be plugged into other models for downstream tasks. Typical downstream applications include:
- Multi-label classification
- Semantic segmentation using patch embeddings
- Image retrieval and clustering
- Report generation, with a language model to decode text
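For example, multi-label classification is typically done by training a lightweight probe on frozen encoder features. The sketch below is a hypothetical linear probe; the 768-dim pooled-feature interface (ViT-B) and the 14-label setup are assumptions, and the random tensor stands in for real RadJEPA embeddings:

```python
import torch
import torch.nn as nn

embed_dim, n_labels, batch = 768, 14, 4

probe = nn.Linear(embed_dim, n_labels)     # one logit per finding
criterion = nn.BCEWithLogitsLoss()         # independent sigmoid per label (multi-label)

features = torch.randn(batch, embed_dim)   # stand-in for frozen pooled embeddings
labels = torch.randint(0, 2, (batch, n_labels)).float()

logits = probe(features)
loss = criterion(logits, labels)
loss.backward()                            # only the probe receives gradients
probs = torch.sigmoid(logits)              # per-label probabilities
```

Keeping the backbone frozen makes this a direct measure of representation quality, which is the usual protocol for comparing self-supervised encoders.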
Load RadJEPA
```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "AIDELab-IITBombay/RadJEPA",
    trust_remote_code=True,
)
print(model)
```
Dependency note (timm)
If you encounter issues with newer versions of `timm`, install the known working version explicitly:

```shell
pip install timm==1.0.24
```
Training details
Training data
We used images from five public, deidentified chest X-ray datasets to train this checkpoint of RadJEPA.
| Dataset | Num. images |
|---|---|
| MIMIC-CXR | 300 491 |
| CheXpert | 224 316 |
| NIH-CXR | 112 120 |
| PadChest | 160 817 |
| BRAX | 41 620 |
| TOTAL | 839 364 |
Biases, risks, and limitations
RadJEPA was trained with data from three countries; therefore, it might be biased towards the populations represented in the training data. Underlying biases of the training datasets may not be well characterized.
Training procedure
We refer to the manuscript for a detailed description of the training procedure.
Evaluation
Our evaluation is best described in the manuscript.
Baselines
We report results for a subset of consistently competitive baselines for clarity. Notably, RadJEPA uses a ViT-B/14 backbone (86M parameters), making it substantially smaller than I-JEPA (ViT-H/14, 0.6B parameters), yet it achieves superior performance across classification, segmentation, and report generation tasks. Furthermore, RAD-DINO and RadJEPA are the only methods pretrained on comparable chest X-ray datasets at similar scale, enabling a direct and fair comparison of self-supervised objectives under matched data and model capacity.
| Model | Backbone | # Params |
|---|---|---|
| RAD-DINO | ViT-B/14 | 86M |
| I-JEPA | ViT-H/14 | 0.6B |
| RadJEPA | ViT-B/14 | 86M |
Classification
| Model | VinDr-CXR (Agg. AP) | RSNA (AP / AUC) |
|---|---|---|
| RAD-DINO | 52.8 | 71.0 / 88.4 |
| I-JEPA | 50.0 | 70.2 / 87.4 |
| RadJEPA | 55.2 | 72.7 / 89.2 |
Segmentation
| Model | Decoder | Lungs | Lung Zones | Ribs |
|---|---|---|---|---|
| RAD-DINO | UPerNet | 98.0 | 91.2 | 85.3 |
| I-JEPA | UPerNet | 97.9 | 92.0 | 85.2 |
| RadJEPA | UPerNet | 98.3 | 93.7 | 89.6 |
Report Generation
| Model | MIMIC (ROUGE-L / BLEU-4) | IU (ROUGE-L / BLEU-4) |
|---|---|---|
| RAD-DINO | 24.6 / 9.3 | 25.8 / 9.0 |
| I-JEPA | 25.6 / 9.5 | 26.7 / 9.4 |
| RadJEPA | 26.1 / 10.1 | 28.4 / 9.9 |
Software
We leveraged the I-JEPA codebase (`ijepa`) for training.
We used SimpleITK and Pydicom to process DICOM files.
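A common step in such a DICOM pipeline is intensity windowing before pixels reach the encoder. The sketch below shows the standard center/width windowing formula in plain numpy; the window values are illustrative, and in practice `pydicom`'s `pixel_array` (or a SimpleITK image) would supply the raw input:

```python
import numpy as np

def window(pixels: np.ndarray, center: float, width: float) -> np.ndarray:
    """Clip raw intensities to [center - width/2, center + width/2]
    and rescale the result to [0, 255] as uint8."""
    lo, hi = center - width / 2, center + width / 2
    clipped = np.clip(pixels.astype(np.float32), lo, hi)
    return ((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)

raw = np.array([[-200, 0, 400], [800, 1200, 2000]])  # synthetic pixel values
img = window(raw, center=600.0, width=1600.0)        # illustrative window
```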
Citation
```bibtex
@misc{khan2026radjeparadiologyencoderchest,
  title={RadJEPA: Radiology Encoder for Chest X-Rays via Joint Embedding Predictive Architecture},
  author={Anas Anwarul Haq Khan and Mariam Husain and Kshitij Jadhav},
  year={2026},
  eprint={2601.15891},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.15891},
}
```
Acknowledgements
RadJEPA builds upon the I-JEPA architecture.
We thank the authors for making their work publicly available.
Model Card Contact
Anas Anwarul Haq Khan
Department of Computer Science and Engineering, IIT Bombay
📧 anaskhan@cse.iitb.ac.in
Mariam Husain
Department of Biomedical Engineering, Johns Hopkins University, USA
📧 mhusai10@jh.edu
Kshitij Jadhav
Koita Centre for Digital Health, IIT Bombay
📧 kshitij.jadhav@iitb.ac.in