---
library_name: timm
license: cc-by-4.0
pipeline_tag: image-feature-extraction
tags:
- radiology
- medical-imaging
- xray
- ct
- mri
- ultrasound
- foundation-model
- vision-transformer
- self-supervised
- dino
- dinov2

model-index:
- name: OmniRad-base
  results:
  - task:
      type: image-feature-extraction
    dataset:
      name: RadImageNet
      type: radimagenet
    metrics:
    - name: Representation learning
      type: other
      value: "Self-supervised pretrained encoder"
---

# OmniRad: A General-Purpose Radiological Foundation Model
<!--
[📄 Paper](https://arxiv.org/abs/XXXX.XXXXX) |
-->
 [💻 Code](https://github.com/unica-visual-intelligence-lab/OmniRad)

**OmniRad** is a **self-supervised radiological foundation model** designed to learn **stable, transferable, and task-agnostic visual representations** for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across **classification**, **segmentation**, and **exploratory vision–language** tasks without task-specific pretraining.

This repository provides the **OmniRad-base** variant, a ViT-B encoder with 768-dimensional embeddings that balances computational cost against representational capacity.

---

## Key Features

- **Radiology-focused foundation model** pretrained on ~1.2M radiological images from RadImageNet
- **Self-supervised learning** based on a customized DINOv2 framework
- **Task-agnostic encoder** reusable across classification, segmentation, and multimodal pipelines
- **Strong transferability** across modalities (CT, MRI, X-ray, ultrasound)
- **Radiomics-oriented design**, emphasizing representation stability and reuse

---


## Example Usage: Feature Extraction

```python
from PIL import Image
from torchvision import transforms
import timm
import torch

# Load OmniRad-base from Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-base",
    pretrained=True,
    num_classes=0  # return embeddings
)

model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Load image
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

# Extract features
with torch.no_grad():
    embedding = model(x)  # shape: [1, 768] (ViT-B embedding dimension)
```
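
Once embeddings are extracted, a common next step is to compare them, for example to retrieve the most similar images. The sketch below assumes cosine similarity as the metric, which is an illustrative choice and not something prescribed by the OmniRad authors. `rank_by_similarity` is a hypothetical helper, and random tensors stand in for real embeddings so the snippet runs without downloading weights.

```python
import torch
import torch.nn.functional as F


def rank_by_similarity(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """Return gallery indices ordered from most to least similar to `query`.

    query:   tensor of shape [D]
    gallery: tensor of shape [N, D]
    """
    q = F.normalize(query.unsqueeze(0), dim=-1)  # [1, D], unit length
    g = F.normalize(gallery, dim=-1)             # [N, D], unit length per row
    sims = (g @ q.T).squeeze(1)                  # [N] cosine similarities
    return torch.argsort(sims, descending=True)


# Stand-in data (assumption): in practice these would be model outputs,
# e.g. `embedding.squeeze(0)` for the query and a stack of stored embeddings.
torch.manual_seed(0)
query_embedding = torch.randn(768)
gallery_embeddings = torch.randn(10, 768)

ranking = rank_by_similarity(query_embedding, gallery_embeddings)
print(ranking[:3])  # indices of the three closest gallery embeddings
```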
---

## Available Downstream Code

The **official OmniRad repository** provides **end-to-end implementations** for all evaluated downstream tasks:

👉 **https://github.com/unica-visual-intelligence-lab/OmniRad**

Including:
- **Image-level classification** (MedMNIST v2 benchmarks)
- **Dense medical image segmentation** (MedSegBench, frozen encoder + lightweight decoders)
- **Radiological image captioning** (BART-based vision–language framework)
- Full training, evaluation, and ablation scripts
- Reproducible experimental configurations matching the paper

---

## Model Details

- **Architecture:** Vision Transformer (ViT-B)
- **Patch size:** 14
- **Embedding dimension:** 768
- **Pretraining framework:** Modified DINOv2 (global crops only)
- **Pretraining dataset:** RadImageNet (~1.2M radiological images)
- **Input resolution:** 224 × 224
- **Backbone type:** Encoder-only (no task-specific heads)
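
As a quick check on these numbers, a 224 × 224 input cut into 14 × 14 patches gives a 16 × 16 grid, i.e. 256 patch tokens, each of width 768. The helper below is purely illustrative (`patch_grid` is not part of timm or the OmniRad code), and it does not count any class or register tokens the checkpoint may prepend.

```python
def patch_grid(resolution: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square input."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    per_side = resolution // patch_size
    return per_side, per_side * per_side


# Values taken from the Model Details list.
per_side, total = patch_grid(resolution=224, patch_size=14)
print(per_side, total)  # 16 256
```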

### Pretraining Notes

- Local crops are removed to improve training stability and downstream transferability
- No feature collapse observed during training
- Same hyperparameter configuration used across small and base variants
- Designed to support frozen-backbone adaptation and lightweight fine-tuning
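
A minimal sketch of frozen-backbone adaptation, assuming a linear classification head on top of the pooled embedding. `LinearProbe` and the stand-in encoder are hypothetical names used so the snippet runs without downloading weights; in practice the encoder would be the `timm` model loaded with `num_classes=0`. This is not necessarily the adaptation recipe used in the official repository.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """Frozen encoder followed by a trainable linear head."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for param in self.encoder.parameters():
            param.requires_grad = False  # freeze the backbone
        self.head = nn.Linear(embed_dim, num_classes)

    def train(self, mode: bool = True):
        super().train(mode)
        self.encoder.eval()  # keep the frozen encoder in eval mode
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            features = self.encoder(x)  # [B, embed_dim]
        return self.head(features)      # [B, num_classes]


# Stand-in encoder (assumption): maps [B, 3, H, W] to [B, 768] like the real model.
stand_in_encoder = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 768)
)
probe = LinearProbe(stand_in_encoder, embed_dim=768, num_classes=5)
optimizer = torch.optim.AdamW(probe.head.parameters(), lr=1e-3)

# One illustrative training step on random data.
probe.train()
images = torch.randn(2, 3, 224, 224)
labels = torch.tensor([0, 3])
loss = nn.functional.cross_entropy(probe(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only the head's weight and bias receive gradients here, so the optimizer state stays small regardless of encoder size.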

---


## Intended Use

OmniRad is intended as a **general-purpose radiological image encoder** for:

- Image-level classification (e.g., disease or organ recognition)
- Dense prediction (e.g., medical image segmentation via adapters or decoders)
- Radiomics feature extraction
- Representation transfer across datasets, modalities, and institutions
- Exploratory vision–language research (e.g., radiological image captioning)

**Not intended for direct clinical deployment without task-specific validation.**

---



## License

This project and the released model weights are licensed under the Creative Commons
Attribution 4.0 International (CC BY 4.0) license.

<div align="center">

**Made with ❤️ by [UNICA Visual Intelligence Lab](https://github.com/unica-visual-intelligence-lab)**

</div>