---
library_name: timm
license: cc-by-4.0
pipeline_tag: image-feature-extraction
tags:
- radiology
- medical-imaging
- xray
- ct
- mri
- ultrasound
- foundation-model
- vision-transformer
- self-supervised
- dino
- dinov2
model-index:
- name: OmniRad-base
  results:
  - task:
      type: image-feature-extraction
    dataset:
      name: RadImageNet
      type: radimagenet
    metrics:
    - name: Representation learning
      type: other
      value: "Self-supervised pretrained encoder"
---
# OmniRad: A General-Purpose Radiological Foundation Model
<!--
[📄 Paper](https://arxiv.org/abs/XXXX.XXXXX) |
-->
[💻 Code](https://github.com/unica-visual-intelligence-lab/OmniRad)
**OmniRad** is a **self-supervised radiological foundation model** designed to learn **stable, transferable, and task-agnostic visual representations** for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across **classification**, **segmentation**, and **exploratory vision–language** tasks without task-specific pretraining.
This repository provides the **OmniRad-base** variant, a ViT-B Vision Transformer encoder that balances computational efficiency and representational capacity.
---
## Key Features
- **Radiology-focused foundation model** pretrained on RadImageNet (~1.2M radiological images)
- **Self-supervised learning** based on a customized DINOv2 framework
- **Task-agnostic encoder** reusable across classification, segmentation, and multimodal pipelines
- **Strong transferability** across modalities (CT, MRI, X-ray, ultrasound)
- **Radiomics-oriented design**, emphasizing representation stability and reuse
---
## Example Usage: Feature Extraction
```python
from PIL import Image
from torchvision import transforms
import timm
import torch
# Load OmniRad-base from Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-base",
    pretrained=True,
    num_classes=0,  # no classifier head: return pooled embeddings
)
model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
# Load image
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)
# Extract features
with torch.no_grad():
    embedding = model(x)  # shape: [1, 768] (ViT-B embedding dimension)
```
---
## Available Downstream Code
The **official OmniRad repository** provides **end-to-end implementations** for all evaluated downstream tasks:
👉 **https://github.com/unica-visual-intelligence-lab/OmniRad**
The repository includes:
- **Image-level classification** (MedMNIST v2 benchmarks)
- **Dense medical image segmentation** (MedSegBench, frozen encoder + lightweight decoders)
- **Radiological image captioning** (BART-based vision–language framework)
- Full training, evaluation, and ablation scripts
- Reproducible experimental configurations matching the paper
---
## Model Details
- **Architecture:** Vision Transformer (ViT-B)
- **Patch size:** 14
- **Embedding dimension:** 768
- **Pretraining framework:** Modified DINOv2 (global crops only)
- **Pretraining dataset:** RadImageNet (~1.2M radiological images)
- **Input resolution:** 224 × 224
- **Backbone type:** Encoder-only (no task-specific heads)
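
The patch size and input resolution listed above determine how many patch tokens the encoder processes per image. The short sketch below works through that arithmetic. The helper name `patch_grid` is illustrative only and is not part of timm or the OmniRad code.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side * side


# Values taken from the Model Details list: 224 x 224 input, patch size 14.
side, num_patches = patch_grid(image_size=224, patch_size=14)
print(side, num_patches)  # 16 256
```

Each 224 × 224 image therefore yields a 16 × 16 grid of patch tokens, in addition to any class or other prefix tokens the architecture adds.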
### Pretraining Notes
- Local crops are removed to improve training stability and downstream transferability
- No feature collapse observed during training
- Same hyperparameter configuration used across small and base variants
- Designed to support frozen-backbone adaptation and lightweight fine-tuning
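
As an illustration of frozen-backbone adaptation, the following is a minimal linear-probe sketch rather than the authors' training code. To stay runnable without downloading weights it uses a small stand-in encoder; in practice that would be the timm model created with `num_classes=0`. The class count, learning rate, and random batch are hypothetical placeholders.

```python
import torch
from torch import nn

EMBED_DIM = 768   # embedding dimension listed in Model Details
NUM_CLASSES = 5   # hypothetical number of downstream classes

# Stand-in encoder (assumption): replace with the pretrained OmniRad model.
encoder = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(3, EMBED_DIM),
)

# Freeze the backbone: disable gradients and switch to eval mode.
for param in encoder.parameters():
    param.requires_grad = False
encoder.eval()

# Only the linear head is trained.
head = nn.Linear(EMBED_DIM, NUM_CLASSES)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)  # lr is a placeholder
criterion = nn.CrossEntropyLoss()

# One training step on a random batch (placeholder data).
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (4,))

with torch.no_grad():
    features = encoder(images)   # [4, 768]
logits = head(features)          # [4, 5]
loss = criterion(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the encoder runs under `torch.no_grad()`, embeddings could also be computed once and cached, which keeps probe training cheap.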
---
## Intended Use
OmniRad is intended as a **general-purpose radiological image encoder** for:
- Image-level classification (e.g., disease or organ recognition)
- Dense prediction (e.g., medical image segmentation via adapters or decoders)
- Radiomics feature extraction
- Representation transfer across datasets, modalities, and institutions
- Exploratory vision–language research (e.g., radiological image captioning)
**Not intended for direct clinical deployment without task-specific validation.**
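
For dense prediction, a decoder typically consumes a spatial feature map rather than a pooled embedding. The sketch below shows one way to reshape a ViT token sequence into such a map. It assumes a token tensor of shape `[B, prefix + H*W, C]`, which is what timm ViT models generally return from `forward_features`, and a single prefix (class) token; check the loaded model's `num_prefix_tokens` before relying on that. The helper `tokens_to_feature_map` is hypothetical and a random tensor stands in for real encoder output.

```python
import math

import torch


def tokens_to_feature_map(tokens: torch.Tensor, num_prefix_tokens: int = 1) -> torch.Tensor:
    """Reshape [B, prefix + H*W, C] ViT tokens into a [B, C, H, W] feature map."""
    patch_tokens = tokens[:, num_prefix_tokens:, :]
    batch, num_patches, channels = patch_tokens.shape
    side = math.isqrt(num_patches)
    if side * side != num_patches:
        raise ValueError("patch tokens do not form a square grid")
    # [B, N, C] -> [B, C, N] -> [B, C, H, W]
    return patch_tokens.transpose(1, 2).reshape(batch, channels, side, side)


# Stand-in for encoder output: 1 class token + 16 * 16 patch tokens, 768 channels.
tokens = torch.randn(2, 1 + 256, 768)
feature_map = tokens_to_feature_map(tokens)
print(feature_map.shape)  # torch.Size([2, 768, 16, 16])
```

A lightweight decoder can then upsample this 16 × 16 map back to the input resolution.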
---
## License
This project and the released model weights are licensed under the Creative Commons
Attribution 4.0 International (CC BY 4.0) license.
<div align="center">
**Made with ❤️ by [UNICA Visual Intelligence Lab](https://github.com/unica-visual-intelligence-lab)**
</div>