---
license: mit
tags:
- multimodal
- medical
- cardiac
- cmr
- clip
- contrastive-learning
- vision-transformer
- clinical-bert
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- medical
language:
- en
---

# CMRCLIP

> A CMR-report contrastive model combining Vision Transformers and pretrained text encoders.

![CMRCLIP Model Overview](figs/overview.png)

---

## Model Overview

**CMRCLIP** encodes cardiac magnetic resonance (CMR) images and their clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It uses:

* A pretrained text encoder (`Bio_ClinicalBERT`)
* A video encoder built on Vision Transformers (`SpaceTimeTransformer`)
* A lightweight projection head to map both modalities into a common vector space

This repository contains only the trained weights and minimal configuration needed to load and run the model.
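To make the architecture concrete, here is a minimal, self-contained sketch of the CLIP-style setup described above: two lightweight projection heads map encoder outputs into a shared, L2-normalized space where dot products are cosine similarities. The dimensions and class name are illustrative stand-ins, not the repository's actual code; the real encoders are `SpaceTimeTransformer` and `Bio_ClinicalBERT`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalProjection(nn.Module):
    """Linear head mapping an encoder output into the shared space (illustrative)."""
    def __init__(self, in_dim: int, projection_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(in_dim, projection_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so that dot products between modalities are cosine similarities
        return F.normalize(self.proj(x), dim=-1)

video_head = MinimalProjection(in_dim=768)  # 768 = ViT-base hidden size
text_head = MinimalProjection(in_dim=768)   # 768 = BERT-base hidden size

# Stand-in encoder outputs for a batch of 4 paired studies/reports
video_feats = torch.randn(4, 768)
text_feats = torch.randn(4, 768)

v = video_head(video_feats)
t = text_head(text_feats)
similarity = v @ t.T  # (4, 4) image-report similarity matrix
```

In a contrastive objective, the diagonal of `similarity` (matched pairs) is pushed up while off-diagonal entries are pushed down.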

---

## Files

* `config.json` — Model hyperparameters & architecture settings
* `pytorch_model.bin` — Saved PyTorch `state_dict` of the trained model

---

## Usage Example

Below is a minimal example: first clone the code repository (which provides the `CMRCLIP` class), then download the config and trained weights from the Hugging Face Hub.

```bash
# Clone the repository
git clone git@github.com:Makiya11/CMRCLIP.git
cd CMRCLIP

# Install dependencies
pip install -r requirements.txt
```

```python
import json
import torch
from huggingface_hub import hf_hub_download
from model.cmrclip import CMRCLIP

# 1. Download artifacts
def _download_file(filename):
    return hf_hub_download(
        repo_id="makiyeah/CMRCLIP",
        filename=filename
    )
config_file = _download_file("config.json")
weights_file = _download_file("pytorch_model.bin")

# 2. Load config & model
with open(config_file, "r") as f:
    cfg = json.load(f)

model = CMRCLIP(
    video_params=cfg["video_params"],
    text_params=cfg["text_params"],
    projection_dim=cfg.get("projection_dim", 512),
    load_checkpoint=cfg.get("load_checkpoint"),
    projection=cfg.get("projection", "minimal"),
)
state_dict = torch.load(weights_file, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
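Once both modalities are embedded in the shared space, retrieval reduces to a matrix multiply over normalized embeddings. The snippet below is a hypothetical sketch with random stand-in embeddings (in practice they would come from the model loaded above); the shapes and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: 3 CMR studies, 5 candidate reports, shared 512-dim space
video_emb = F.normalize(torch.randn(3, 512), dim=-1)
report_emb = F.normalize(torch.randn(5, 512), dim=-1)

scores = video_emb @ report_emb.T   # cosine similarities, shape (3, 5)
best_report = scores.argmax(dim=1)  # index of the best-matching report per study
```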

---

## Configuration (`config.json`)

```json
{
  "video_params": {
    "model": "SpaceTimeTransformer",
    "arch_config": "base_patch16_224",
    "num_frames": 64,
    "pretrained": true,
    "time_init": "zeros"
  },
  "text_params": {
    "model": "emilyalsentzer/Bio_ClinicalBERT",
    "pretrained": true,
    "input": "text"
  },
  "projection": "minimal",
  "projection_dim": 512,
  "load_checkpoint": ""
}
```

---

## License

This model is released under the **MIT** license. See [LICENSE](LICENSE) for details.


---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{cmrclip2025,
  title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities},
  author={Nakashima, Makiya and Qiu, Jielin and Huang, Peide and Chen, Po-Hao and Grimm, Richard and Nguyen, Christopher and Kim, Byung-Hak and Zhao, Ding and Kwon, Deborah and Chen, David},
  year={2025},
}
```

---