|
|
--- |
|
|
license: mit |
|
|
tags: |
|
|
- multimodal |
|
|
- medical |
|
|
- cardiac |
|
|
- cmr |
|
|
- clip |
|
|
- contrastive-learning |
|
|
- vision-transformer |
|
|
- clinical-bert |
|
|
library_name: pytorch |
|
|
pipeline_tag: feature-extraction |
|
|
datasets: |
|
|
- medical |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# CMRCLIP |
|
|
|
|
|
> A contrastive model that aligns cardiac magnetic resonance (CMR) images with clinical reports, pairing a Vision Transformer video encoder with a pretrained clinical text encoder. |
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
**CMRCLIP** encodes cardiac magnetic resonance (CMR) images and their clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It uses: |
|
|
|
|
|
* A pretrained text encoder (`Bio_ClinicalBERT`) |
|
|
* A video encoder built on Vision Transformers (`SpaceTimeTransformer`) |
|
|
* A lightweight projection head to map both modalities into a common vector space |
|
|
|
|
|
This repository contains only the trained weights and minimal configuration needed to load and run the model. |
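
For intuition, the sketch below illustrates the CLIP-style recipe this architecture follows: each modality is encoded, projected into the shared space, and compared by cosine similarity. It is a minimal illustration only; the 768-dimensional features and single-linear-layer projections are assumptions for the sketch, not the model's exact internals.

```python
# Minimal sketch of CLIP-style alignment -- illustrative only, not CMRCLIP's
# actual code. Feature dimensions and projection layers are assumptions.
import torch
import torch.nn.functional as F

text_feats = torch.randn(4, 768)   # e.g. pooled Bio_ClinicalBERT features
video_feats = torch.randn(4, 768)  # e.g. pooled SpaceTimeTransformer features

# A "minimal" projection: one linear layer per modality into the shared space.
text_proj = torch.nn.Linear(768, 512)
video_proj = torch.nn.Linear(768, 512)

t = F.normalize(text_proj(text_feats), dim=-1)
v = F.normalize(video_proj(video_feats), dim=-1)

# Pairwise cosine similarities; diagonal entries are matched report/scan pairs.
similarity = t @ v.T  # shape (4, 4)
```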
|
|
|
|
|
--- |
|
|
|
|
|
## Files |
|
|
|
|
|
* `config.json` — Model hyperparameters & architecture settings |
|
|
* `pytorch_model.bin` — Saved PyTorch `state_dict` of the trained model |
|
|
|
|
|
--- |
|
|
|
|
|
## Usage Example |
|
|
|
|
|
Below is a minimal example of how to download and load the model using the Hugging Face Hub: |
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
# Clone the repository |
|
|
git clone git@github.com:Makiya11/CMRCLIP.git |
|
|
cd CMRCLIP |
|
|
|
|
|
# Install dependencies |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
```python |
|
|
import json |
|
|
import torch |
|
|
from huggingface_hub import hf_hub_download |
|
|
# Requires the cloned CMRCLIP repository to be on the Python path
from model.cmrclip import CMRCLIP
|
|
|
|
|
# 1. Download artifacts |
|
|
def _download_file(filename): |
|
|
return hf_hub_download( |
|
|
repo_id="makiyeah/CMRCLIP", |
|
|
filename=filename |
|
|
) |
|
|
config_file = _download_file("config.json") |
|
|
weights_file = _download_file("pytorch_model.bin") |
|
|
|
|
|
# 2. Load config & model |
|
|
with open(config_file, "r") as f: |
|
|
cfg = json.load(f) |
|
|
|
|
|
model = CMRCLIP( |
|
|
video_params=cfg["video_params"], |
|
|
text_params=cfg["text_params"], |
|
|
projection_dim=cfg.get("projection_dim", 512), |
|
|
load_checkpoint=cfg.get("load_checkpoint"), |
|
|
projection=cfg.get("projection", "minimal"), |
|
|
) |
|
|
# map_location keeps loading device-agnostic (works without a GPU)
state_dict = torch.load(weights_file, map_location="cpu")
|
|
model.load_state_dict(state_dict) |
|
|
model.eval() |
|
|
``` |
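
Continuing from the example above, the sketch below shows one way the loaded model might be used for report-to-image similarity scoring. The `compute_text` / `compute_video` method names, the token format, and the `(batch, frames, channels, height, width)` video layout are assumptions here; check `model/cmrclip.py` in the repository for the actual interface.

```python
# Hypothetical inference sketch. The compute_text / compute_video method names
# and the video tensor layout are assumptions -- verify against model/cmrclip.py.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

report = "Severely reduced left ventricular systolic function."
tokens = tokenizer(report, return_tensors="pt", padding=True, truncation=True)

# Dummy CMR clip: (batch, frames, channels, height, width) per the config.
video = torch.randn(1, cfg["video_params"]["num_frames"], 3, 224, 224)

with torch.no_grad():
    text_emb = model.compute_text(tokens)    # assumed method name
    video_emb = model.compute_video(video)   # assumed method name
    score = torch.nn.functional.cosine_similarity(text_emb, video_emb)
print(score.item())
```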
|
|
|
|
|
--- |
|
|
|
|
|
## Configuration (`config.json`) |
|
|
|
|
|
```json |
|
|
{ |
|
|
"video_params": { |
|
|
"model": "SpaceTimeTransformer", |
|
|
"arch_config": "base_patch16_224", |
|
|
"num_frames": 64, |
|
|
"pretrained": true, |
|
|
"time_init": "zeros" |
|
|
}, |
|
|
"text_params": { |
|
|
"model": "emilyalsentzer/Bio_ClinicalBERT", |
|
|
"pretrained": true, |
|
|
"input": "text" |
|
|
}, |
|
|
"projection": "minimal", |
|
|
"projection_dim": 512, |
|
|
"load_checkpoint": "" |
|
|
}
```
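
The `arch_config` string encodes the encoder's geometry: `base_patch16_224` means 16×16 patches at 224×224 input resolution, and `num_frames` sets the temporal length of each clip. As a sanity check, a small helper (an illustration, not part of the repository) can derive the expected input shape directly from the config:

```python
# Derive the expected video input shape from config.json. This helper is an
# illustration and an assumption, not shipped with the repository.
import json
import re

with open("config.json") as f:  # or the config_file path downloaded earlier
    cfg = json.load(f)

vp = cfg["video_params"]
# "base_patch16_224" encodes a 16x16 patch size and a 224x224 input resolution.
patch_size, image_size = map(
    int, re.match(r".*patch(\d+)_(\d+)$", vp["arch_config"]).groups()
)

# Batch of one clip: (batch, frames, channels, height, width)
expected_shape = (1, vp["num_frames"], 3, image_size, image_size)
print(expected_shape)  # (1, 64, 3, 224, 224)
```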
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the **MIT** license. See [LICENSE](LICENSE) for details. |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your work, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{cmrclip2025, |
|
|
title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities}, |
|
|
author={Makiya Nakashima and Jielin Qiu and Peide Huang and Po-Hao Chen and Richard Grimm and Christopher Nguyen and Byung-Hak Kim and Ding Zhao and Deborah Kwon and David Chen}, |
|
|
year={2025}, |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|