|
|
--- |
|
|
license: mit |
|
|
tags: |
|
|
- multimodal |
|
|
- medical |
|
|
- cardiac |
|
|
- cmr |
|
|
- clip |
|
|
- contrastive-learning |
|
|
- vision-transformer |
|
|
- clinical-bert |
|
|
library_name: pytorch |
|
|
pipeline_tag: feature-extraction |
|
|
datasets: |
|
|
- medical |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# CMRCLIP |
|
|
|
|
|
> A contrastive model that aligns cardiac magnetic resonance (CMR) images with clinical reports, pairing a Vision Transformer video encoder with a pretrained clinical text encoder. |
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
**CMRCLIP** encodes cardiac magnetic resonance (CMR) images and their clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It uses: |
|
|
|
|
|
* A pretrained text encoder (`Bio_ClinicalBERT`) |
|
|
* A video encoder built on Vision Transformers (`SpaceTimeTransformer`) |
|
|
* A lightweight projection head to map both modalities into a common vector space |
|
|
|
|
|
This repository contains only the trained weights and minimal configuration needed to load and run the model. |
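
For intuition, the sketch below illustrates the CLIP-style recipe this architecture follows: each modality is encoded, projected into the shared space, and compared by cosine similarity. It is a minimal illustration only; the 768-dimensional features and single-linear-layer projections are assumptions for the sketch, not the model's exact internals.

```python
# Minimal sketch of CLIP-style alignment -- illustrative only, not CMRCLIP's
# actual code. Feature dimensions and projection layers are assumptions.
import torch
import torch.nn.functional as F

text_feats = torch.randn(4, 768)   # e.g. pooled Bio_ClinicalBERT features
video_feats = torch.randn(4, 768)  # e.g. pooled SpaceTimeTransformer features

# A "minimal" projection: one linear layer per modality into the shared space.
text_proj = torch.nn.Linear(768, 512)
video_proj = torch.nn.Linear(768, 512)

t = F.normalize(text_proj(text_feats), dim=-1)
v = F.normalize(video_proj(video_feats), dim=-1)

# Pairwise cosine similarities; diagonal entries are matched report/scan pairs.
similarity = t @ v.T  # shape (4, 4)
```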
|
|
|
|
|
--- |
|
|
|
|
|
## Files |
|
|
|
|
|
* `config.json` — Model hyperparameters & architecture settings |
|
|
* `pytorch_model.bin` — Saved PyTorch `state_dict` of the trained model |
|
|
|
|
|
--- |
|
|
|
|
|
## Usage Example |
|
|
|
|
|
Below is a minimal example of how to download and load the model using the Hugging Face Hub: |
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
# Clone the repository |
|
|
git clone git@github.com:Makiya11/CMRCLIP.git |
|
|
cd CMRCLIP |
|
|
|
|
|
# Install dependencies |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
```python |
|
|
import json |
|
|
import torch |
|
|
from huggingface_hub import hf_hub_download |
|
|
# Requires the cloned CMRCLIP repository to be on the Python path
from model.cmrclip import CMRCLIP
|
|
|
|
|
# 1. Download artifacts |
|
|
def _download_file(filename): |
|
|
return hf_hub_download( |
|
|
repo_id="makiyeah/CMRCLIP", |
|
|
filename=filename |
|
|
) |
|
|
config_file = _download_file("config.json") |
|
|
weights_file = _download_file("pytorch_model.bin") |
|
|
|
|
|
# 2. Load config & model |
|
|
with open(config_file, "r") as f: |
|
|
cfg = json.load(f) |
|
|
|
|
|
model = CMRCLIP( |
|
|
video_params=cfg["video_params"], |
|
|
text_params=cfg["text_params"], |
|
|
projection_dim=cfg.get("projection_dim", 512), |
|
|
load_checkpoint=cfg.get("load_checkpoint"), |
|
|
projection=cfg.get("projection", "minimal"), |
|
|
) |
|
|
# map_location keeps loading device-agnostic (works without a GPU)
state_dict = torch.load(weights_file, map_location="cpu")
|
|
model.load_state_dict(state_dict) |
|
|
model.eval() |
|
|
``` |
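
Continuing from the example above, the sketch below shows one way the loaded model might be used for report-to-image similarity scoring. The `compute_text` / `compute_video` method names, the token format, and the `(batch, frames, channels, height, width)` video layout are assumptions here; check `model/cmrclip.py` in the repository for the actual interface.

```python
# Hypothetical inference sketch. The compute_text / compute_video method names
# and the video tensor layout are assumptions -- verify against model/cmrclip.py.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

report = "Severely reduced left ventricular systolic function."
tokens = tokenizer(report, return_tensors="pt", padding=True, truncation=True)

# Dummy CMR clip: (batch, frames, channels, height, width) per the config.
video = torch.randn(1, cfg["video_params"]["num_frames"], 3, 224, 224)

with torch.no_grad():
    text_emb = model.compute_text(tokens)    # assumed method name
    video_emb = model.compute_video(video)   # assumed method name
    score = torch.nn.functional.cosine_similarity(text_emb, video_emb)
print(score.item())
```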
|
|
|
|
|
--- |
|
|
|
|
|
## Configuration (`config.json`) |
|
|
|
|
|
```json |
|
|
{ |
|
|
"video_params": { |
|
|
"model": "SpaceTimeTransformer", |
|
|
"arch_config": "base_patch16_224", |
|
|
"num_frames": 64, |
|
|
"pretrained": true, |
|
|
"time_init": "zeros" |
|
|
}, |
|
|
"text_params": { |
|
|
"model": "emilyalsentzer/Bio_ClinicalBERT", |
|
|
"pretrained": true, |
|
|
"input": "text" |
|
|
}, |
|
|
"projection": "minimal", |
|
|
"projection_dim": 512, |
|
|
"load_checkpoint": "" |
|
|
}
```
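
The `arch_config` string encodes the encoder's geometry: `base_patch16_224` means 16×16 patches at 224×224 input resolution, and `num_frames` sets the temporal length of each clip. As a sanity check, a small helper (an illustration, not part of the repository) can derive the expected input shape directly from the config:

```python
# Derive the expected video input shape from config.json. This helper is an
# illustration and an assumption, not shipped with the repository.
import json
import re

with open("config.json") as f:  # or the config_file path downloaded earlier
    cfg = json.load(f)

vp = cfg["video_params"]
# "base_patch16_224" encodes a 16x16 patch size and a 224x224 input resolution.
patch_size, image_size = map(
    int, re.match(r".*patch(\d+)_(\d+)$", vp["arch_config"]).groups()
)

# Batch of one clip: (batch, frames, channels, height, width)
expected_shape = (1, vp["num_frames"], 3, image_size, image_size)
print(expected_shape)  # (1, 64, 3, 224, 224)
```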
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the **MIT** license. See [LICENSE](LICENSE) for details. |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your work, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{cmrclip2025, |
|
|
title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities}, |
|
|
author={Makiya Nakashima and Jielin Qiu and Peide Huang and Po-Hao Chen and Richard Grimm and Christopher Nguyen and Byung-Hak Kim and Ding Zhao and Deborah Kwon and David Chen}, |
|
|
year={2025}, |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|