RadJEPA

RadJEPA is a self-supervised vision encoder for chest X-ray images based on a Joint Embedding Predictive Architecture (JEPA).
The model learns visual representations by predicting latent features of masked image regions, without text supervision or pixel-level reconstruction.
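To make the latent-prediction objective concrete, here is a deliberately tiny numpy sketch of the JEPA idea: a context encoder sees only the unmasked patches, a (frozen) target encoder embeds all patches, and a predictor regresses the masked patches' latent targets. All encoders here are random linear maps standing in for real networks; none of this is RadJEPA's actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 16 patch tokens with 8-dim embeddings.
num_patches, dim = 16, 8
patches = rng.normal(size=(num_patches, dim))

# Frozen "target encoder" (in practice an EMA copy of the context encoder)
# produces latent targets for every patch.
W_target = rng.normal(size=(dim, dim))
targets = patches @ W_target

# Mask a contiguous block of patches; the context encoder sees the rest.
mask = np.zeros(num_patches, dtype=bool)
mask[4:8] = True

W_context = rng.normal(size=(dim, dim))
context = patches[~mask] @ W_context

# Predictor maps the pooled context to each masked patch's latent target.
W_pred = rng.normal(size=(dim, dim))
pred = np.tile(context.mean(axis=0) @ W_pred, (int(mask.sum()), 1))

# JEPA loss: regression in latent space -- no pixel reconstruction.
loss = float(np.mean((pred - targets[mask]) ** 2))
print(loss)
```

The key property illustrated is that the loss is computed between embeddings, never between pixels, which is what distinguishes JEPA-style training from masked autoencoding.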

RadJEPA is intended as a general-purpose radiology image backbone for downstream tasks.

Overview

  • Model type: Vision Transformer–based JEPA encoder
  • Training: Self-supervised latent prediction
  • Input: Chest X-ray images
  • Fine-tuned from model: I-JEPA

Intended use

The model is a vision backbone that can be plugged into other models for downstream tasks. Typical downstream applications include:

  • Multi-label classification
  • Semantic segmentation using patch embeddings
  • Image retrieval and clustering
  • Report generation, paired with a language model that decodes text
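A common pattern for the first of these uses is a linear probe on frozen features. The numpy sketch below uses random arrays as stand-ins for RadJEPA's patch embeddings (the 16×16 token grid and 768-dim width are what a ViT-B/14 at 224 px would yield, but are assumptions here, as are the untrained probe weights):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for frozen encoder outputs: 4 images, 256 patch tokens each
# (a 16x16 grid for ViT-B/14 at 224 px), 768-dim embeddings.
batch, num_patches, dim, num_labels = 4, 256, 768, 14
patch_embeddings = rng.normal(size=(batch, num_patches, dim))

# Mean-pool patch tokens into one image-level feature per image.
image_features = patch_embeddings.mean(axis=1)        # (4, 768)

# Hypothetical linear probe for multi-label classification:
# one logit per finding, squashed independently with a sigmoid.
W = rng.normal(size=(dim, num_labels)) * 0.01
b = np.zeros(num_labels)
logits = image_features @ W + b
probs = 1.0 / (1.0 + np.exp(-logits))                 # (4, 14)

print(probs.shape)
```

Because each label gets an independent sigmoid rather than a shared softmax, multiple findings can be active for the same image, which matches the multi-label setting of chest X-ray classification.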

Load RadJEPA

from transformers import AutoModel

# trust_remote_code is required because the checkpoint ships custom model code
model = AutoModel.from_pretrained(
    "AIDElab-IITBombay/RadJEPA",
    trust_remote_code=True,
)
print(model)
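Before calling the encoder, images must be converted to model-ready tensors. The checkpoint's custom code defines the actual pipeline; the sketch below is only a rough, hypothetical illustration of typical ViT-style preparation (the 224-px resolution and the ImageNet normalization statistics are assumptions, not RadJEPA's documented values):

```python
import numpy as np

def preprocess(xray: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn an 8-bit grayscale chest X-ray into a (1, 3, size, size) batch.

    Hypothetical sketch: a real pipeline would use the checkpoint's own
    processor (or PIL/torchvision interpolation) rather than the
    nearest-neighbour index sampling used here.
    """
    h, w = xray.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = xray[np.ix_(rows, cols)].astype(np.float32) / 255.0

    # Replicate the single channel and normalize per channel
    # (ImageNet statistics -- an assumption, not RadJEPA's config).
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    chw = np.stack([resized] * 3, axis=0)
    chw = (chw - mean[:, None, None]) / std[:, None, None]
    return chw[None]  # add batch dimension

batch = preprocess(np.zeros((2048, 2500), dtype=np.uint8))
print(batch.shape)
```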

Dependency note (timm)

If you encounter issues with newer versions of timm, install the known working version explicitly:

pip install timm==1.0.24

Training details

Training data

We used images from five public, deidentified chest X-ray datasets to train this checkpoint of RadJEPA.

Dataset      Num. images
MIMIC-CXR        300,491
CheXpert         224,316
NIH-CXR          112,120
PadChest         160,817
BRAX              41,620
TOTAL            839,364

Biases, risks, and limitations

RadJEPA was trained with data from three countries; therefore, it might be biased towards the population in the training data. Underlying biases of the training datasets may not be well characterized.

Training procedure

We refer to the manuscript for a detailed description of the training procedure.

Evaluation

Our evaluation is best described in the manuscript.

Baselines

We report results for a subset of consistently competitive baselines for clarity. Notably, RadJEPA uses a ViT-B/14 backbone (86M parameters), making it substantially smaller than I-JEPA (ViT-H/14, 0.6B parameters), yet it achieves superior performance across classification, segmentation, and report generation tasks. Furthermore, Rad-DINO and RadJEPA are the only methods pretrained on comparable chest X-ray datasets at similar scale, enabling a direct and fair comparison of self-supervised objectives under matched data and model capacity.

Model      Backbone   # Params
RAD-DINO   ViT-B/14   86M
I-JEPA     ViT-H/14   0.6B
RadJEPA    ViT-B/14   86M

Classification

Model      VinDr-CXR (Agg. AP)   RSNA (AP / AUC)
RAD-DINO   52.8                  71.0 / 88.4
I-JEPA     50.0                  70.2 / 87.4
RadJEPA    55.2                  72.7 / 89.2
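The AUC column above is standard ROC AUC. As a reference for what that number means (not the evaluation code from the manuscript), here is a minimal rank-based computation using the Mann-Whitney U identity; it ignores tied scores, which a production metric such as scikit-learn's `roc_auc_score` handles properly:

```python
import numpy as np

def roc_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity: the probability
    that a random positive outscores a random negative. No tie handling."""
    order = scores.argsort()
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = int(pos.sum()), int((~pos).sum())
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

scores = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0, 0, 1, 1])
print(roc_auc(scores, labels))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the 87–89 values above (reported as percentages) indicate strong discrimination on RSNA.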

Segmentation

Model      Decoder   Lungs   Lung Zones   Ribs
RAD-DINO   UPerNet   98.0    91.2         85.3
I-JEPA     UPerNet   97.9    92.0         85.2
RadJEPA    UPerNet   98.3    93.7         89.6

Report Generation

Model      MIMIC (ROUGE-L / BLEU-4)   IU (ROUGE-L / BLEU-4)
RAD-DINO   24.6 / 9.3                 25.8 / 9.0
I-JEPA     25.6 / 9.5                 26.7 / 9.4
RadJEPA    26.1 / 10.1                28.4 / 9.9

Software

We built on the ijepa codebase for training, and used SimpleITK and Pydicom to process DICOM files.

Citation

@misc{khan2026radjeparadiologyencoderchest,
      title={RadJEPA: Radiology Encoder for Chest X-Rays via Joint Embedding Predictive Architecture}, 
      author={Anas Anwarul Haq Khan and Mariam Husain and Kshitij Jadhav},
      year={2026},
      eprint={2601.15891},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.15891}, 
}

Acknowledgements

RadJEPA builds upon the I-JEPA architecture.
We thank the authors for making their work publicly available.

Model Card Contact

Anas Anwarul Haq Khan
Department of Computer Science and Engineering, IIT Bombay
📧 anaskhan@cse.iitb.ac.in

Mariam Husain
Department of Biomedical Engineering, Johns Hopkins University, USA
📧 mhusai10@jh.edu

Kshitij Jadhav
Koita Centre for Digital Health, IIT Bombay
📧 kshitij.jadhav@iitb.ac.in
