metadata
license: mit
Merlin: Vision Language Foundation Model for 3D Computed Tomography
Merlin is a 3D vision language model (VLM) for computed tomography that is pretrained on both structured electronic health record (EHR) data and unstructured radiology reports.
⚡️ Installation
To install Merlin, you can simply run:
pip install merlin-vlm
For an editable installation, use the following commands to clone and install this repository.
git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
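To confirm that either installation path succeeded, the installed version can be queried from Python. The following is a minimal sketch that uses only the standard library; it assumes the distribution is registered under the name `merlin-vlm` used in the pip command above, and `installed_version` is a hypothetical helper name, not part of Merlin.

```python
from importlib import metadata


def installed_version(dist_name="merlin-vlm"):
    """Return the installed version of a distribution, or None if it is absent.

    The default name is an assumption taken from the pip command in this README.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("merlin-vlm is not installed in this environment")
    else:
        print(f"merlin-vlm {version} is installed")
```

If the helper returns `None`, the package is not visible to the current interpreter, which often means pip installed it into a different environment.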
For usage instructions, please visit the GitHub repository (https://github.com/StanfordMIMI/Merlin).
Project Structure:
.
├── README.md
├── i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt <Merlin weights>
└── image1.nii.gz <Sample Image>
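After downloading this repository, it may be useful to check that the files listed above are present before using them. The sketch below is one way to do that with the standard library; `EXPECTED_FILES` and `missing_files` are hypothetical names introduced for illustration, and the file names are copied from the listing above.

```python
from pathlib import Path

# File names copied from the project structure listing in this README.
EXPECTED_FILES = (
    "README.md",
    "i3_resnet_clinical_longformer_best_clip_04-02-2024_23-21-36_epoch_99.pt",
    "image1.nii.gz",
)


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected file names that are not regular files under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Check the current directory; an empty list means everything was found.
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present")
```

The check only confirms that the files exist; it does not validate the checkpoint or the image contents.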
Citation
If you find this repository useful for your work, please cite the original paper:
@article{blankemeier2024merlin,
title={Merlin: A vision language foundation model for 3d computed tomography},
author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
journal={Research Square},
pages={rs--3},
year={2024}
}