---
library_name: timm
license: cc-by-4.0
pipeline_tag: image-feature-extraction
tags:
- radiology
- medical-imaging
- xray
- ct
- mri
- ultrasound
- foundation-model
- vision-transformer
- self-supervised
- dino
- dinov2
model-index:
- name: OmniRad-base
  results:
  - task:
      type: image-feature-extraction
    dataset:
      name: RadImageNet
      type: radimagenet
    metrics:
    - name: Representation learning
      type: other
      value: "Self-supervised pretrained encoder"
---

# OmniRad: A General-Purpose Radiological Foundation Model

<!--
[📄 Paper](https://arxiv.org/abs/XXXX.XXXXX) |
-->

[💻 Code](https://github.com/unica-visual-intelligence-lab/OmniRad)

**OmniRad** is a **self-supervised radiological foundation model** designed to learn **stable, transferable, and task-agnostic visual representations** for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across **classification**, **segmentation**, and **exploratory vision–language** tasks without task-specific retraining.

This repository provides the **OmniRad-base** variant, a Vision Transformer encoder that offers a strong trade-off between computational efficiency and representational power.

---

## Key Features

- **Radiology-focused foundation model** pretrained on >1M radiological images
- **Self-supervised learning** based on a customized DINOv2 framework
- **Task-agnostic encoder** reusable across classification, segmentation, and multimodal pipelines
- **Strong transferability** across modalities (CT, MRI, X-ray, ultrasound)
- **Radiomics-oriented design**, emphasizing representation stability and reuse

---

## Example Usage: Feature Extraction

```python
import timm
import torch
from PIL import Image
from torchvision import transforms

# Load OmniRad-base from the Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-base",
    pretrained=True,
    num_classes=0,  # return embeddings instead of classification logits
)

model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocessing: resize to the pretraining resolution and normalize
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Load image (grayscale radiographs are converted to 3-channel RGB)
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

# Extract features
with torch.no_grad():
    embedding = model(x)  # shape: [1, 768] for the ViT-B backbone
```

---

## Available Downstream Code

The **official OmniRad repository** provides **end-to-end implementations** for all evaluated downstream tasks:

👉 **https://github.com/unica-visual-intelligence-lab/OmniRad**

It includes:

- **Image-level classification** (MedMNIST v2 benchmarks)
- **Dense medical image segmentation** (MedSegBench, frozen encoder + lightweight decoders)
- **Radiological image captioning** (BART-based vision–language framework)
- Full training, evaluation, and ablation scripts
- Reproducible experimental configurations matching the paper

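
As a minimal illustration of the frozen-encoder protocol used for classification, the sketch below fits a nearest-centroid classifier on precomputed embeddings. The helper names and the synthetic 768-dimensional vectors (standing in for OmniRad-base features) are illustrative, not part of the official repository:

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Compute one mean embedding (centroid) per class from frozen features."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def predict(embeddings, classes, centroids):
    """Assign each embedding to the class of its nearest centroid."""
    # Pairwise Euclidean distances: [n_samples, n_classes]
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy demo with synthetic 768-dim embeddings (the dimension OmniRad-base produces)
rng = np.random.default_rng(0)
train_x = np.concatenate([rng.normal(0.0, 0.1, (20, 768)),
                          rng.normal(1.0, 0.1, (20, 768))])
train_y = np.array([0] * 20 + [1] * 20)
classes, centroids = fit_centroids(train_x, train_y)
preds = predict(train_x, classes, centroids)
accuracy = (preds == train_y).mean()
```

In practice, a linear probe (e.g., logistic regression) over the same frozen embeddings is the more common evaluation; the nearest-centroid variant is shown here only because it needs no extra dependencies.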
---

## Model Details

- **Architecture:** Vision Transformer (ViT-B)
- **Patch size:** 14
- **Embedding dimension:** 768
- **Pretraining framework:** Modified DINOv2 (global crops only)
- **Pretraining dataset:** RadImageNet (~1.2M radiological images)
- **Input resolution:** 224 × 224
- **Backbone type:** Encoder-only (no task-specific heads)

### Pretraining Notes

- Local crops are removed to improve training stability and downstream transferability
- No feature collapse observed during training
- Same hyperparameter configuration used across small and base variants
- Designed to support frozen-backbone adaptation and lightweight fine-tuning

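
For frozen-backbone dense prediction, decoders typically consume the ViT's patch tokens rather than the pooled embedding: with patch size 14 and 224 × 224 inputs, the encoder produces a 16 × 16 grid of 768-dim tokens. The sketch below (plain NumPy, with any prefix/class tokens assumed already dropped; in `timm`, the token sequence can be obtained via `model.forward_features(x)`, whose exact layout varies by version) reshapes that sequence into a spatial feature map:

```python
import numpy as np

def tokens_to_grid(tokens, patch_size=14, image_size=224):
    """Reshape a ViT patch-token sequence [B, N, D] into a [B, D, H, W] feature map."""
    b, n, d = tokens.shape
    h = w = image_size // patch_size  # 224 // 14 = 16 patches per side
    assert n == h * w, f"expected {h * w} patch tokens, got {n}"
    return tokens.reshape(b, h, w, d).transpose(0, 3, 1, 2)

# One image worth of dummy patch tokens (class token already removed)
tokens = np.zeros((1, 256, 768))
grid = tokens_to_grid(tokens)
print(grid.shape)  # (1, 768, 16, 16)
```

A lightweight segmentation decoder, as in the MedSegBench experiments, would then upsample this grid back to the input resolution.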
--- |
|
|
|
|
|
|
|
|
## Intended Use |
|
|
|
|
|
OmniRad is intended as a **general-purpose radiological image encoder** for: |
|
|
|
|
|
- Image-level classification (e.g., disease or organ recognition) |
|
|
- Dense prediction (e.g., medical image segmentation via adapters or decoders) |
|
|
- Radiomics feature extraction |
|
|
- Representation transfer across datasets, modalities, and institutions |
|
|
- Exploratory vision–language research (e.g., radiological image captioning) |
|
|
|
|
|
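
One common way to exploit such embeddings without any training is similarity search, e.g., retrieving the most similar studies for a query image. A minimal cosine-similarity sketch, using random vectors in place of real OmniRad-base embeddings (function name and data are illustrative):

```python
import numpy as np

def cosine_retrieve(query, gallery, k=3):
    """Return indices of the k gallery embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of the query against every gallery item
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(1)
gallery = rng.normal(size=(100, 768))                     # 100 stored embeddings
query = gallery[42] + rng.normal(scale=0.01, size=768)    # near-duplicate of item 42
top = cosine_retrieve(query, gallery, k=3)
print(top[0])  # 42 — the near-duplicate is ranked first
```
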
**Not intended for direct clinical deployment without task-specific validation.**

---

## License

This project and the released model weights are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

<div align="center">

**Made with ❤️ by [UNICA Visual Intelligence Lab](https://github.com/unica-visual-intelligence-lab)**

</div>