---
license: mit
tags:
- multimodal
- medical
- cardiac
- cmr
- clip
- contrastive-learning
- vision-transformer
- clinical-bert
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- medical
language:
- en
---
# CMRCLIP
> A CMR-report contrastive model combining Vision Transformers and pretrained text encoders.

---
## Model Overview
**CMRCLIP** encodes cardiac magnetic resonance (CMR) images and clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It combines:

* A pretrained clinical text encoder (`Bio_ClinicalBERT`)
* A video encoder built on Vision Transformers (`SpaceTimeTransformer`)
* A lightweight projection head that maps both modalities into a common vector space

This repository contains only the trained weights and the minimal configuration needed to load and run the model.
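Dual-encoder models of this kind are typically trained with a symmetric contrastive (CLIP-style) objective, where matching video–report pairs in a batch are pulled together and mismatched pairs pushed apart. A minimal sketch of that loss on dummy embeddings (the exact objective and temperature used for CMRCLIP are assumptions here, not taken from this repository):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    # L2-normalize so dot products become cosine similarities
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # logits[i, j] = similarity between video i and report j
    logits = video_emb @ text_emb.t() / temperature
    # Matching pairs lie on the diagonal
    targets = torch.arange(logits.size(0))
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return (loss_v2t + loss_t2v) / 2

# Dummy batch of 8 paired 512-d embeddings
video_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
loss = clip_contrastive_loss(video_emb, text_emb)
```

The symmetric form (video→text plus text→video) keeps retrieval quality balanced in both directions.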
---
## Files
* `config.json` — Model hyperparameters & architecture settings
* `pytorch_model.bin` — Saved PyTorch `state_dict` of the trained model
---
## Usage Example
Below is a minimal example of how to download and load the model using the Hugging Face Hub:
```bash
# Clone the repository
git clone git@github.com:Makiya11/CMRCLIP.git
cd CMRCLIP
# Install dependencies
pip install -r requirements.txt
```
```python
import json

import torch
from huggingface_hub import hf_hub_download

from model.cmrclip import CMRCLIP

# 1. Download artifacts
def _download_file(filename):
    return hf_hub_download(
        repo_id="makiyeah/CMRCLIP",
        filename=filename,
    )

config_file = _download_file("config.json")
weights_file = _download_file("pytorch_model.bin")

# 2. Load config & model
with open(config_file, "r") as f:
    cfg = json.load(f)

model = CMRCLIP(
    video_params=cfg["video_params"],
    text_params=cfg["text_params"],
    projection_dim=cfg.get("projection_dim", 512),
    load_checkpoint=cfg.get("load_checkpoint"),
    projection=cfg.get("projection", "minimal"),
)

# map_location="cpu" lets the weights load on machines without a GPU
state_dict = torch.load(weights_file, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
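Once both modalities are projected into the shared space, retrieval reduces to cosine similarity between embeddings. A minimal sketch using dummy tensors in place of real model outputs (the encoder calls that would produce these embeddings depend on the repository's API and are not shown):

```python
import torch
import torch.nn.functional as F

# Dummy projected embeddings standing in for model outputs:
# one query video embedding and a small gallery of report embeddings.
video_emb = torch.randn(1, 512)
report_embs = torch.randn(4, 512)

# Cosine similarity in the shared embedding space (broadcasts over the gallery)
scores = F.cosine_similarity(video_emb, report_embs, dim=-1)

# Rank reports by similarity to the video, best match first
ranking = scores.argsort(descending=True)
```

The same pattern works in the other direction (report query against a gallery of video embeddings).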
---
## Configuration (`config.json`)
```json
{
  "video_params": {
    "model": "SpaceTimeTransformer",
    "arch_config": "base_patch16_224",
    "num_frames": 64,
    "pretrained": true,
    "time_init": "zeros"
  },
  "text_params": {
    "model": "emilyalsentzer/Bio_ClinicalBERT",
    "pretrained": true,
    "input": "text"
  },
  "projection": "minimal",
  "projection_dim": 512,
  "load_checkpoint": ""
}
```
---
## License
This model is released under the **MIT** license. See [LICENSE](LICENSE) for details.
---
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{cmrclip2025,
  title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities},
  author={Makiya Nakashima and Jielin Qiu and Peide Huang and Po-Hao Chen and Richard Grimm and Christopher Nguyen and Byung-Hak Kim and Ding Zhao and Deborah Kwon and David Chen},
  year={2025},
}
```
---