---
license: mit
---

# CMRCLIP

> A CMR-report contrastive model combining a Vision Transformer video encoder with a pretrained clinical text encoder.

---

## Model Overview

**CMRCLIP** encodes cardiac magnetic resonance (CMR) images and their clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It combines:

* A pretrained clinical text encoder (`emilyalsentzer/Bio_ClinicalBERT`)
* A video encoder built on Vision Transformers (`SpaceTimeTransformer`)
* A lightweight projection head that maps both modalities into a common 512-dimensional vector space

This repository contains only the trained weights and the minimal configuration needed to load and run the model.

---
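
Conceptually, the shared space works the standard CLIP way: each modality is projected to a fixed-size vector (512-dimensional here, per `projection_dim`), L2-normalized, and compared by cosine similarity. A minimal sketch using random stand-in tensors in place of the real encoder outputs:

```python
import torch
import torch.nn.functional as F

# Stand-ins for encoder outputs; the real vectors would come from
# CMRCLIP's video and text branches after the projection head.
video_emb = torch.randn(4, 512)   # batch of 4 CMR clip embeddings
text_emb = torch.randn(4, 512)    # batch of 4 report embeddings

# L2-normalize so dot products become cosine similarities
video_emb = F.normalize(video_emb, dim=-1)
text_emb = F.normalize(text_emb, dim=-1)

# Pairwise similarity matrix: sim[i, j] = cos(video_i, text_j)
sim = video_emb @ text_emb.T      # shape (4, 4)

# Retrieval: index of the best-matching report for each clip
best_report = sim.argmax(dim=1)
```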

## Files

* `config.json` — model hyperparameters and architecture settings
* `pytorch_model.bin` — saved PyTorch `state_dict` of the trained model

---
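
`pytorch_model.bin` is a plain PyTorch `state_dict`: an ordered mapping from parameter names to tensors, which `load_state_dict` consumes. A quick sketch of that round-trip with a small stand-in module (the real file holds CMRCLIP's parameters, not this toy's):

```python
import io
import torch
import torch.nn as nn

# Any nn.Module's state_dict serializes the same way pytorch_model.bin does
module = nn.Linear(8, 4)
buf = io.BytesIO()
torch.save(module.state_dict(), buf)
buf.seek(0)

# Loading gives back name -> tensor pairs
state_dict = torch.load(buf, map_location="cpu")
print(list(state_dict.keys()))   # ['weight', 'bias']
```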

## Usage Example

Below is a minimal example of downloading and loading the model via the Hugging Face Hub:

```python
import json

import torch
from huggingface_hub import hf_hub_download

from model.cmrclip import CMRCLIP

# 1. Download artifacts
def _download_file(filename):
    return hf_hub_download(
        repo_id="makiyeah/CMRCLIP",
        filename=filename,
    )

config_file = _download_file("config.json")
weights_file = _download_file("pytorch_model.bin")

# 2. Load config & build model
with open(config_file, "r") as f:
    cfg = json.load(f)

# The constructor arguments are nested under "arch" -> "args" in config.json
args = cfg["arch"]["args"]

model = CMRCLIP(
    video_params=args["video_params"],
    text_params=args["text_params"],
    projection_dim=args.get("projection_dim", 512),
    load_checkpoint=args.get("load_checkpoint"),
    projection=args.get("projection", "minimal"),
)

# 3. Load weights
state_dict = torch.load(weights_file, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```

---

## Configuration (`config.json`)

```json
{
  "arch": {
    "type": "CMRCLIP",
    "args": {
      "video_params": {
        "model": "SpaceTimeTransformer",
        "arch_config": "base_patch16_224",
        "num_frames": 64,
        "pretrained": true,
        "time_init": "zeros"
      },
      "text_params": {
        "model": "emilyalsentzer/Bio_ClinicalBERT",
        "pretrained": true,
        "input": "text"
      },
      "projection": "minimal",
      "projection_dim": 512,
      "load_checkpoint": ""
    }
  }
}
```

---
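
Note that the constructor arguments are nested under `arch` → `args`, so a loader has to index through that nesting rather than read keys like `video_params` off the top level. A small self-contained sketch of reading them out (config reproduced inline for illustration):

```python
import json

# Inline copy of config.json, for illustration only
config_text = """
{
  "arch": {
    "type": "CMRCLIP",
    "args": {
      "video_params": {
        "model": "SpaceTimeTransformer",
        "arch_config": "base_patch16_224",
        "num_frames": 64,
        "pretrained": true,
        "time_init": "zeros"
      },
      "text_params": {
        "model": "emilyalsentzer/Bio_ClinicalBERT",
        "pretrained": true,
        "input": "text"
      },
      "projection": "minimal",
      "projection_dim": 512,
      "load_checkpoint": ""
    }
  }
}
"""

cfg = json.loads(config_text)
args = cfg["arch"]["args"]        # constructor kwargs live here

video_params = args["video_params"]
text_params = args["text_params"]
print(video_params["model"])      # SpaceTimeTransformer
print(args["projection_dim"])     # 512
```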

## License

This model is released under the **MIT** license. See [LICENSE](LICENSE) for details.

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{cmrclip2025,
  title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities},
  year={2025},
}
```

---