OmniRad: A General-Purpose Radiological Foundation Model


OmniRad is a self-supervised radiological foundation model designed to learn stable, transferable, and task-agnostic visual representations for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across classification, segmentation, and exploratory vision–language tasks without task-specific pretraining.

This repository provides the OmniRad-small variant, a compact Vision Transformer encoder that offers an excellent trade-off between computational efficiency and representational power.


Key Features

  • Radiology-focused foundation model pretrained on >1M radiological images
  • Self-supervised learning based on a customized DINOv2 framework
  • Task-agnostic encoder reusable across classification, segmentation, and multimodal pipelines
  • Strong transferability across modalities (CT, MRI, X-ray, ultrasound)
  • Radiomics-oriented design, emphasizing representation stability and reuse

Example Usage: Feature Extraction

from PIL import Image
from torchvision import transforms
import timm
import torch

# Load OmniRad-small from Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-small",
    pretrained=True,
    num_classes=0  # return embeddings
)

model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Load image
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

# Extract features
with torch.no_grad():
    embedding = model(x)  # shape: [1, 384]
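The pooled embeddings can be used directly, for example for image retrieval by cosine similarity. A minimal sketch, with random tensors standing in for real OmniRad embeddings (in practice these would come from `model(x)` as above):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for OmniRad embeddings: one query and a small gallery
query = torch.randn(1, 384)
gallery = torch.randn(10, 384)

# L2-normalize so the dot product equals cosine similarity
query_n = F.normalize(query, dim=-1)
gallery_n = F.normalize(gallery, dim=-1)

scores = query_n @ gallery_n.T   # shape: [1, 10], values in [-1, 1]
best = scores.argmax(dim=-1)     # index of the most similar gallery image
```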


Available Downstream Code

The official OmniRad repository provides end-to-end implementations for all evaluated downstream tasks:

👉 https://github.com/unica-visual-intelligence-lab/OmniRad

These include:

  • Image-level classification (MedMNIST v2 benchmarks)
  • Dense medical image segmentation (MedSegBench, frozen encoder + lightweight decoders)
  • Radiological image captioning (BART-based vision–language framework)
  • Full training, evaluation, and ablation scripts
  • Reproducible experimental configurations matching the paper

Model Details

  • Architecture: Vision Transformer (ViT-S)
  • Patch size: 14
  • Embedding dimension: 384
  • Pretraining framework: Modified DINOv2 (global crops only)
  • Pretraining dataset: RadImageNet (~1.2M radiological images)
  • Input resolution: 224 × 224
  • Backbone type: Encoder-only (no task-specific heads)

Pretraining Notes

  • Local crops are removed to improve training stability and downstream transferability
  • No feature collapse observed during training
  • Same hyperparameter configuration used across small and base variants
  • Designed to support frozen-backbone adaptation and lightweight fine-tuning
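Frozen-backbone adaptation typically means training only a lightweight head on top of the 384-dim embeddings. A hypothetical linear-probe sketch, with random tensors standing in for embeddings extracted by the frozen encoder and a made-up 5-class task:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for frozen-encoder embeddings and their labels
features = torch.randn(256, 384)       # 256 samples, 384-dim embeddings
labels = torch.randint(0, 5, (256,))   # hypothetical 5-class task

# Linear probe: the only trainable parameters
probe = nn.Linear(384, 5)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(probe(features), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```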

Intended Use

OmniRad is intended as a general-purpose radiological image encoder for:

  • Image-level classification (e.g., disease or organ recognition)
  • Dense prediction (e.g., medical image segmentation via adapters or decoders)
  • Radiomics feature extraction
  • Representation transfer across datasets, modalities, and institutions
  • Exploratory vision–language research (e.g., radiological image captioning)

Not intended for direct clinical deployment without task-specific validation.
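For dense prediction, the patch tokens, rather than the pooled embedding, are usually what feeds a decoder. In timm, `model.forward_features(x)` returns the full token sequence; with a 224 × 224 input and patch size 14, that is (assuming a single class-token prefix) one class token plus a 16 × 16 grid of patch tokens. A sketch under those assumptions, with a random token tensor standing in for the encoder output and a hypothetical 1×1-conv decoder head:

```python
import torch
import torch.nn as nn

B, D, G = 1, 384, 16   # batch, embed dim, 16x16 patch grid (224 / 14)

# Stand-in for model.forward_features(x): class token + 256 patch tokens
tokens = torch.randn(B, 1 + G * G, D)

# Drop the class token and reshape patch tokens into a 2D feature map
patches = tokens[:, 1:, :]                           # [B, 256, 384]
fmap = patches.transpose(1, 2).reshape(B, D, G, G)   # [B, 384, 16, 16]

# Hypothetical lightweight decoder: 1x1 conv to per-pixel logits + upsampling
decoder = nn.Sequential(
    nn.Conv2d(D, 2, kernel_size=1),   # 2-class segmentation head
    nn.Upsample(size=(224, 224), mode="bilinear", align_corners=False),
)
logits = decoder(fmap)   # [B, 2, 224, 224]
```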


License

This project and the released model weights are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Made with ❤️ by UNICA Visual Intelligence Lab
