OmniRad: A General-Purpose Radiological Foundation Model
OmniRad is a self-supervised radiological foundation model designed to learn stable, transferable, and task-agnostic visual representations for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across classification, segmentation, and exploratory vision–language tasks without additional task-specific pretraining.
This repository provides the OmniRad-base variant, a ViT-B Vision Transformer encoder that offers a strong trade-off between computational efficiency and representational power.
Key Features
- Radiology-focused foundation model pretrained on >1M radiological images
- Self-supervised learning based on a customized DINOv2 framework
- Task-agnostic encoder reusable across classification, segmentation, and multimodal pipelines
- Strong transferability across modalities (CT, MRI, X-ray, ultrasound)
- Radiomics-oriented design, emphasizing representation stability and reuse
Example Usage: Feature Extraction
```python
from PIL import Image
from torchvision import transforms
import timm
import torch

# Load OmniRad-base from the Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-base",
    pretrained=True,
    num_classes=0,  # return embeddings instead of classification logits
)
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocessing: resize to the pretraining resolution and normalize
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Load image
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

# Extract a global image embedding
with torch.no_grad():
    embedding = model(x)  # shape: [1, 768] for the ViT-B backbone
```
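Once extracted, embeddings can be compared directly, e.g. for image retrieval or radiomics-style similarity analysis. A minimal sketch using randomly generated stand-ins for two OmniRad-base embeddings (in practice these come from `model(x)` as above):

```python
import torch
import torch.nn.functional as F

# Stand-ins for two OmniRad-base embeddings (ViT-B: 768-dim)
emb_a = torch.randn(1, 768)
emb_b = torch.randn(1, 768)

# Cosine similarity in [-1, 1]; higher means more similar representations
similarity = F.cosine_similarity(emb_a, emb_b).item()
print(f"cosine similarity: {similarity:.3f}")
```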
Available Downstream Code
The official OmniRad repository provides end-to-end implementations for all evaluated downstream tasks:
👉 https://github.com/unica-visual-intelligence-lab/OmniRad
Including:
- Image-level classification (MedMNIST v2 benchmarks)
- Dense medical image segmentation (MedSegBench, frozen encoder + lightweight decoders)
- Radiological image captioning (BART-based vision–language framework)
- Full training, evaluation, and ablation scripts
- Reproducible experimental configurations matching the paper
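The segmentation setup pairs the frozen encoder with a lightweight decoder. The sketch below is not the repository's exact decoder, only a minimal illustration of that pattern: a per-patch linear classifier plus bilinear upsampling, run on a random tensor standing in for OmniRad patch tokens (ViT-B/14 at 224 px gives a 16 × 16 grid of 768-dim tokens).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSegDecoder(nn.Module):
    """Lightweight decoder: per-patch linear classifier + bilinear upsampling."""
    def __init__(self, embed_dim=768, num_classes=2, grid=16, out_size=224):
        super().__init__()
        self.grid, self.out_size = grid, out_size
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):            # [B, 256, 768]
        logits = self.head(patch_tokens)        # [B, 256, num_classes]
        B, N, C = logits.shape
        logits = logits.transpose(1, 2).reshape(B, C, self.grid, self.grid)
        # Upsample the coarse 16x16 prediction back to image resolution
        return F.interpolate(logits, size=self.out_size, mode="bilinear",
                             align_corners=False)  # [B, C, 224, 224]

# Random stand-in for patch tokens from the frozen encoder
tokens = torch.randn(1, 256, 768)
masks = LinearSegDecoder()(tokens)
print(masks.shape)
```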
Model Details
- Architecture: Vision Transformer (ViT-B)
- Patch size: 14
- Embedding dimension: 768
- Pretraining framework: Modified DINOv2 (global crops only)
- Pretraining dataset: RadImageNet (~1.2M radiological images)
- Input resolution: 224 × 224
- Backbone type: Encoder-only (no task-specific heads)
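With a 14-pixel patch at 224 × 224 input, the encoder operates on a 16 × 16 token grid. A quick check of the geometry, using only the values from the table above (the count excludes any class or register tokens):

```python
# Values from the Model Details table
patch_size = 14
input_size = 224
embed_dim = 768  # ViT-B

grid = input_size // patch_size   # 16 patches per side
num_patch_tokens = grid * grid    # 256 spatial tokens, each of dim 768
print(grid, num_patch_tokens)     # 16 256
```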
Pretraining Notes
- Local crops are removed to improve training stability and downstream transferability
- No feature collapse observed during training
- Same hyperparameter configuration used across small and base variants
- Designed to support frozen-backbone adaptation and lightweight fine-tuning
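The frozen-backbone workflow typically caches embeddings once and trains only a small head on top. A minimal linear-probe sketch, with synthetic 768-dim features standing in for precomputed OmniRad-base embeddings (shapes and class count are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-ins for cached OmniRad-base embeddings and labels
features = torch.randn(64, 768)
labels = torch.randint(0, 4, (64,))

# Linear probe: the backbone stays frozen; only this head is trained
probe = nn.Linear(768, 4)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(probe(features), labels)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
```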
Intended Use
OmniRad is intended as a general-purpose radiological image encoder for:
- Image-level classification (e.g., disease or organ recognition)
- Dense prediction (e.g., medical image segmentation via adapters or decoders)
- Radiomics feature extraction
- Representation transfer across datasets, modalities, and institutions
- Exploratory vision–language research (e.g., radiological image captioning)
Not intended for direct clinical deployment without task-specific validation.
License
This project and the released model weights are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Made with ❤️ by UNICA Visual Intelligence Lab