---
library_name: timm
license: cc-by-4.0
pipeline_tag: image-feature-extraction
tags:
- radiology
- medical-imaging
- xray
- ct
- mri
- ultrasound
- foundation-model
- vision-transformer
- self-supervised
- dino
- dinov2
model-index:
- name: OmniRad-base
  results:
  - task:
      type: image-feature-extraction
    dataset:
      name: RadImageNet
      type: radimagenet
    metrics:
    - name: Representation learning
      type: other
      value: "Self-supervised pretrained encoder"
---

# OmniRad: A General-Purpose Radiological Foundation Model

<!--
[📄 Paper](https://arxiv.org/abs/XXXX.XXXXX) |
-->

[💻 Code](https://github.com/unica-visual-intelligence-lab/OmniRad)

**OmniRad** is a **self-supervised radiological foundation model** designed to learn **stable, transferable, and task-agnostic visual representations** for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across **classification**, **segmentation**, and **exploratory vision–language** tasks without task-specific retraining.

This repository provides the **OmniRad-base** variant, a Vision Transformer encoder that offers a strong trade-off between computational efficiency and representational power.

---

## Key Features

- **Radiology-focused foundation model** pretrained on >1M radiological images
- **Self-supervised learning** based on a customized DINOv2 framework
- **Task-agnostic encoder** reusable across classification, segmentation, and multimodal pipelines
- **Strong transferability** across modalities (CT, MRI, X-ray, ultrasound)
- **Radiomics-oriented design**, emphasizing representation stability and reuse

---

## Example Usage: Feature Extraction

```python
import timm
import torch
from PIL import Image
from torchvision import transforms

# Load OmniRad-base from the Hugging Face Hub
model = timm.create_model(
    "hf_hub:Snarcy/OmniRad-base",
    pretrained=True,
    num_classes=0,  # return embeddings instead of classification logits
)

model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Preprocessing: resize to the pretraining resolution and normalize
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Load image (grayscale radiographs are converted to 3-channel RGB)
image = Image.open("path/to/radiology_image.png").convert("RGB")
x = transform(image).unsqueeze(0).to(device)

# Extract features
with torch.no_grad():
    embedding = model(x)  # shape: [1, 768] for the ViT-B backbone
```

---

## Available Downstream Code

The **official OmniRad repository** provides **end-to-end implementations** for all evaluated downstream tasks:

👉 **https://github.com/unica-visual-intelligence-lab/OmniRad**

It includes:

- **Image-level classification** (MedMNIST v2 benchmarks)
- **Dense medical image segmentation** (MedSegBench, frozen encoder + lightweight decoders)
- **Radiological image captioning** (BART-based vision–language framework)
- Full training, evaluation, and ablation scripts
- Reproducible experimental configurations matching the paper

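
As a minimal illustration of the frozen-encoder protocol used for classification, the sketch below fits a nearest-centroid classifier on precomputed embeddings. The helper names and the synthetic 768-dimensional vectors (standing in for OmniRad-base features) are illustrative, not part of the official repository:

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Compute one mean embedding (centroid) per class from frozen features."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def predict(embeddings, classes, centroids):
    """Assign each embedding to the class of its nearest centroid."""
    # Pairwise Euclidean distances: [n_samples, n_classes]
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy demo with synthetic 768-dim embeddings (the dimension OmniRad-base produces)
rng = np.random.default_rng(0)
train_x = np.concatenate([rng.normal(0.0, 0.1, (20, 768)),
                          rng.normal(1.0, 0.1, (20, 768))])
train_y = np.array([0] * 20 + [1] * 20)
classes, centroids = fit_centroids(train_x, train_y)
preds = predict(train_x, classes, centroids)
accuracy = (preds == train_y).mean()
```

In practice, a linear probe (e.g., logistic regression) over the same frozen embeddings is the more common evaluation; the nearest-centroid variant is shown here only because it needs no extra dependencies.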
---

## Model Details

- **Architecture:** Vision Transformer (ViT-B)
- **Patch size:** 14
- **Embedding dimension:** 768
- **Pretraining framework:** Modified DINOv2 (global crops only)
- **Pretraining dataset:** RadImageNet (~1.2M radiological images)
- **Input resolution:** 224 × 224
- **Backbone type:** Encoder-only (no task-specific heads)

### Pretraining Notes

- Local crops are removed to improve training stability and downstream transferability
- No feature collapse observed during training
- Same hyperparameter configuration used across small and base variants
- Designed to support frozen-backbone adaptation and lightweight fine-tuning

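
For frozen-backbone dense prediction, decoders typically consume the ViT's patch tokens rather than the pooled embedding: with patch size 14 and 224 × 224 inputs, the encoder produces a 16 × 16 grid of 768-dim tokens. The sketch below (plain NumPy, with any prefix/class tokens assumed already dropped; in `timm`, the token sequence can be obtained via `model.forward_features(x)`, whose exact layout varies by version) reshapes that sequence into a spatial feature map:

```python
import numpy as np

def tokens_to_grid(tokens, patch_size=14, image_size=224):
    """Reshape a ViT patch-token sequence [B, N, D] into a [B, D, H, W] feature map."""
    b, n, d = tokens.shape
    h = w = image_size // patch_size  # 224 // 14 = 16 patches per side
    assert n == h * w, f"expected {h * w} patch tokens, got {n}"
    return tokens.reshape(b, h, w, d).transpose(0, 3, 1, 2)

# One image worth of dummy patch tokens (class token already removed)
tokens = np.zeros((1, 256, 768))
grid = tokens_to_grid(tokens)
print(grid.shape)  # (1, 768, 16, 16)
```

A lightweight segmentation decoder, as in the MedSegBench experiments, would then upsample this grid back to the input resolution.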
--- |
|
|
|
|
|
|
|
|
## Intended Use |
|
|
|
|
|
OmniRad is intended as a **general-purpose radiological image encoder** for: |
|
|
|
|
|
- Image-level classification (e.g., disease or organ recognition) |
|
|
- Dense prediction (e.g., medical image segmentation via adapters or decoders) |
|
|
- Radiomics feature extraction |
|
|
- Representation transfer across datasets, modalities, and institutions |
|
|
- Exploratory vision–language research (e.g., radiological image captioning) |
|
|
|
|
|
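
One common way to exploit such embeddings without any training is similarity search, e.g., retrieving the most similar studies for a query image. A minimal cosine-similarity sketch, using random vectors in place of real OmniRad-base embeddings (function name and data are illustrative):

```python
import numpy as np

def cosine_retrieve(query, gallery, k=3):
    """Return indices of the k gallery embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of the query against every gallery item
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(1)
gallery = rng.normal(size=(100, 768))                     # 100 stored embeddings
query = gallery[42] + rng.normal(scale=0.01, size=768)    # near-duplicate of item 42
top = cosine_retrieve(query, gallery, k=3)
print(top[0])  # 42 — the near-duplicate is ranked first
```
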
**Not intended for direct clinical deployment without task-specific validation.**

---

## License

This project and the released model weights are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

<div align="center">

**Made with ❤️ by [UNICA Visual Intelligence Lab](https://github.com/unica-visual-intelligence-lab)**

</div>