---
library_name: timm
license: cc-by-4.0
pipeline_tag: image-feature-extraction
tags:
- radiology
- medical-imaging
- xray
- ct
- mri
- ultrasound
- foundation-model
- vision-transformer
- self-supervised
- dino
- dinov2
model-index:
- name: OmniRad-small
  results:
  - task:
      type: image-feature-extraction
    dataset:
      name: RadImageNet
      type: radimagenet
    metrics:
    - name: Representation learning
      type: other
      value: "Self-supervised pretrained encoder"
---

# OmniRad: A General-Purpose Radiological Foundation Model
<!--
[📄 Paper](https://arxiv.org/abs/XXXX.XXXXX) |
-->
[💻 Code](https://github.com/unica-visual-intelligence-lab/OmniRad)

**OmniRad** is a **self-supervised radiological foundation model** designed to learn **stable, transferable, and task-agnostic visual representations** for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and is intended for reuse across **classification**, **segmentation**, and **exploratory vision–language** tasks without task-specific pretraining.

This repository provides **OmniRad-small**, a compact Vision Transformer encoder that offers a favorable trade-off between computational efficiency and representational power.
---

## Key Features

- **Radiology-focused foundation model** pretrained on over 1M radiological images
- **Self-supervised learning** based on a customized DINOv2 framework
- **Task-agnostic encoder** reusable across classification, segmentation, and multimodal pipelines
- **Strong transferability** across modalities (CT, MRI, X-ray, ultrasound)
- **Radiomics-oriented design**, emphasizing representation stability and reuse

---
## Example Usage: Feature Extraction

```python
from PIL import Image
from torchvision import transforms
import timm
import torch

# Load OmniRad-small from the Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-small",
    pretrained=True,
    num_classes=0,  # return embeddings instead of logits
)

model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocessing: resize to the pretraining resolution and
# normalize with ImageNet statistics
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Load image (grayscale inputs are replicated to 3 channels by convert("RGB"))
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

# Extract a global image embedding
with torch.no_grad():
    embedding = model(x)  # shape: [1, 384]
```
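The global embedding can be used directly for similarity search or radiomics-style analysis. Below is a minimal sketch of ranking a gallery of images against a query by cosine similarity; the random NumPy vectors are stand-ins for real 384-dimensional OmniRad embeddings extracted as shown above.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-in vectors; in practice these would be OmniRad embeddings
rng = np.random.default_rng(0)
query = rng.normal(size=384)
gallery = rng.normal(size=(5, 384))

# Rank gallery images by similarity to the query (most similar first)
scores = [cosine_similarity(query, g) for g in gallery]
ranking = np.argsort(scores)[::-1]
```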
---

## Available Downstream Code

The **official OmniRad repository** provides **end-to-end implementations** for all evaluated downstream tasks:

👉 **https://github.com/unica-visual-intelligence-lab/OmniRad**

This includes:
- **Image-level classification** (MedMNIST v2 benchmarks)
- **Dense medical image segmentation** (MedSegBench, frozen encoder + lightweight decoders)
- **Radiological image captioning** (BART-based vision–language framework)
- Full training, evaluation, and ablation scripts
- Reproducible experimental configurations matching the paper
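Frozen-encoder evaluations such as these typically begin by extracting embeddings over an entire dataset. The sketch below illustrates that pattern under stand-in assumptions: a small linear module plays the role of the OmniRad encoder, and random tensors play the role of preprocessed image batches.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def extract_embeddings(encoder: nn.Module, loader: DataLoader, device: str = "cpu"):
    """Run a frozen encoder over a dataset and stack the embeddings."""
    encoder.eval().to(device)
    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            feats.append(encoder(x.to(device)).cpu())
            labels.append(y)
    return torch.cat(feats), torch.cat(labels)

# Dummy stand-ins: a linear layer plays the role of the OmniRad encoder,
# and random tensors play the role of preprocessed 224x224 RGB images
dummy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 384))
dataset = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))
loader = DataLoader(dataset, batch_size=4)

embeddings, targets = extract_embeddings(dummy_encoder, loader)
print(embeddings.shape)  # torch.Size([8, 384])
```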
---

## Model Details

- **Architecture:** Vision Transformer (ViT-S)
- **Patch size:** 14
- **Embedding dimension:** 384
- **Pretraining framework:** Modified DINOv2 (global crops only)
- **Pretraining dataset:** RadImageNet (~1.2M radiological images)
- **Input resolution:** 224 × 224
- **Backbone type:** Encoder-only (no task-specific heads)

### Pretraining Notes

- Local crops are removed to improve training stability and downstream transferability
- No feature collapse was observed during training
- The same hyperparameter configuration is used across the small and base variants
- Designed to support frozen-backbone adaptation and lightweight fine-tuning
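Frozen-backbone adaptation often reduces to a linear probe: a single linear layer trained on precomputed embeddings while the encoder stays fixed. A minimal sketch, with random tensors standing in for precomputed 384-dim OmniRad features and a hypothetical 3-class task:

```python
import torch
from torch import nn

# Stand-in data: precomputed frozen-encoder features and labels
embeddings = torch.randn(64, 384)    # would come from the frozen OmniRad encoder
labels = torch.randint(0, 3, (64,))  # 3 hypothetical classes

# Linear probe: only this head is trained
head = nn.Linear(384, 3)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for _ in range(20):  # a few probe-training steps
    optimizer.zero_grad()
    loss = criterion(head(embeddings), labels)
    loss.backward()
    optimizer.step()
```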
---

## Intended Use

OmniRad is intended as a **general-purpose radiological image encoder** for:

- Image-level classification (e.g., disease or organ recognition)
- Dense prediction (e.g., medical image segmentation via adapters or decoders)
- Radiomics feature extraction
- Representation transfer across datasets, modalities, and institutions
- Exploratory vision–language research (e.g., radiological image captioning)

**Not intended for direct clinical deployment without task-specific validation.**

---
## License

This project and the released model weights are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

<div align="center">

**Made with ❤️ by [UNICA Visual Intelligence Lab](https://github.com/unica-visual-intelligence-lab)**

</div>