---
license: mit
language:
- en
base_model:
- OpenGVLab/InternVL-Chat-V1-2
tags:
- medical
- vision-language
- multimodal
- radiology
- pathology
- dermatology
- retinography
- endoscopy
- healthcare
pipeline_tag: image-text-to-text
---

# MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

*A generalist foundation model for healthcare capable of handling diverse medical data modalities.*

<img src="https://github.com/sunanhe/MedDr/raw/main/examples/logo.jpg" width="200"/>

[arXiv](https://arxiv.org/abs/2404.15127) · [Project Page](https://smart-meddr.github.io/) · [GitHub](https://github.com/sunanhe/MedDr)

**Authors:** [Sunan He*](https://sunanhe.github.io/), [Yuxiang Nie*](https://jerrrynie.github.io/), [Zhixuan Chen](https://zhi-xuan-chen.github.io/homepage/), [Zhiyuan Cai](https://github.com/Davidczy), Hongmei Wang, [Shu Yang](https://github.com/isyangshu), [Hao Chen**](https://cse.hkust.edu.hk/~jhc/)
(*Equal Contribution, **Corresponding Author)

**Institution:** SMART Lab, Hong Kong University of Science and Technology


---

## Model Summary

MedDr is a large-scale generalist vision-language model for healthcare. It is built upon [InternVL](https://github.com/OpenGVLab/InternVL) and trained using a **diagnosis-guided bootstrapping** strategy that leverages both image and label information to construct high-quality vision-language datasets.
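
The following is a minimal sketch of how label information can guide data construction. It assumes that each image's diagnosis label is injected into the prompt of a captioning model and that outputs which never mention the label are filtered out. The `LabeledImage` record, the prompt wording, the `caption_fn` callable (a stand-in for any vision-language model call), and the label-mention filter are all hypothetical. None of them are taken from the MedDr code base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical signature for a VLM call: (image_path, prompt) -> generated text.
CaptionFn = Callable[[str, str], str]


@dataclass
class LabeledImage:
    """Hypothetical sample: an image reference, its modality, and its diagnosis label."""
    image_path: str
    modality: str
    diagnosis: str


def build_guided_prompt(sample: LabeledImage) -> str:
    """Inject the known diagnosis into the captioning prompt (illustrative wording)."""
    return (
        f"This {sample.modality} image has a confirmed diagnosis of {sample.diagnosis}. "
        "Describe the visual findings that support this diagnosis."
    )


def bootstrap_record(sample: LabeledImage, caption_fn: CaptionFn) -> Optional[Dict[str, str]]:
    """Generate a label-guided description; keep it only if it mentions the label."""
    description = caption_fn(sample.image_path, build_guided_prompt(sample))
    # Naive consistency filter (an assumption): reject text that never names the diagnosis.
    if sample.diagnosis.lower() not in description.lower():
        return None
    return {
        "image": sample.image_path,
        "question": f"What are the findings in this {sample.modality} image?",
        "answer": description,
    }


def bootstrap_dataset(samples: List[LabeledImage], caption_fn: CaptionFn) -> List[Dict[str, str]]:
    """Apply bootstrap_record to every sample and drop the rejected ones."""
    records = (bootstrap_record(s, caption_fn) for s in samples)
    return [r for r in records if r is not None]


# Usage with a stub in place of a real model.
def fake_vlm(image_path: str, prompt: str) -> str:
    if image_path.endswith("001.png"):
        return "Patchy opacity in the right lower lobe, consistent with pneumonia."
    return "No abnormal findings."


samples = [
    LabeledImage("cxr_001.png", "chest X-ray", "pneumonia"),
    LabeledImage("cxr_002.png", "chest X-ray", "pneumothorax"),
]
dataset = bootstrap_dataset(samples, fake_vlm)
print(len(dataset))  # 1
```

In this sketch, conditioning generation on the known label keeps the synthetic text anchored to the ground-truth diagnosis. A real pipeline would likely need a stronger consistency check than a substring match.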

MedDr supports diverse medical imaging modalities:

- 🫁 **Radiology** (X-ray, CT, MRI)
- 🔬 **Pathology**
- 🧴 **Dermatology**
- 👁️ **Retinography**
- 🔭 **Endoscopy**

During inference, MedDr employs a **retrieval-augmented medical diagnosis** strategy to enhance the model's generalization ability.
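
Retrieval-augmented diagnosis can be sketched as a nearest-neighbour lookup followed by prompt construction. The snippet below is a generic illustration and not MedDr's implementation. It assumes image embeddings are already available (for example, from a vision encoder), ranks a hypothetical labelled reference bank by cosine similarity, and prepends the top-k labels to the question. The function names, the bank format and contents, `k`, and the prompt wording are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A reference bank entry: (image embedding, diagnosis label). Hypothetical format.
BankEntry = Tuple[Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: Sequence[float], bank: List[BankEntry], k: int = 3) -> List[Tuple[str, float]]:
    """Return (label, similarity) for the k bank entries closest to the query embedding."""
    scored = [(label, cosine(query, emb)) for emb, label in bank]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_rad_prompt(question: str, neighbors: List[Tuple[str, float]]) -> str:
    """Prepend the retrieved labels to the question as reference context (illustrative wording)."""
    hints = ", ".join(f"{label} (similarity {score:.2f})" for label, score in neighbors)
    return f"Similar reference cases were diagnosed as: {hints}.\n{question}"


# Usage with toy 3-dimensional embeddings.
bank: List[BankEntry] = [
    ([0.9, 0.1, 0.0], "melanoma"),
    ([0.8, 0.2, 0.1], "melanoma"),
    ([0.1, 0.9, 0.2], "basal cell carcinoma"),
    ([0.0, 0.2, 0.9], "benign nevus"),
]
query = [0.85, 0.15, 0.05]
neighbors = retrieve_top_k(query, bank, k=2)
print([label for label, _ in neighbors])  # ['melanoma', 'melanoma']
print(build_rad_prompt("What is the most likely diagnosis for this skin lesion?", neighbors))
```

In this sketch, retrieved labels are passed as context, so the reference bank can be extended with new cases without retraining. A larger bank would normally replace the linear scan with an approximate nearest-neighbour index.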

---

## Capabilities

- Visual Question Answering (VQA) for medical images
- Medical report generation
- Medical image diagnosis across multiple modalities

---

## Usage

### Environment Setup

This model is built on [InternVL](https://github.com/OpenGVLab/InternVL). Follow [INSTALLATION.md](https://github.com/OpenGVLab/InternVL/blob/main/INSTALLATION.md) in the InternVL repository to set up the environment.

### Quick Demo

```bash
# Clone the GitHub repository
git clone https://github.com/sunanhe/MedDr.git
cd MedDr

# Edit demo.py and set model_path to your local checkpoint directory, then run:
python3 demo.py
```

See [`demo.py`](https://github.com/sunanhe/MedDr/blob/main/demo.py) in the GitHub repository for a full example.

---

## Citation

If you find MedDr useful in your research, please consider citing:

```bibtex
@article{he2024meddr,
  title={MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning},
  author={He, Sunan and Nie, Yuxiang and Chen, Zhixuan and Cai, Zhiyuan and Wang, Hongmei and Yang, Shu and Chen, Hao},
  journal={arXiv preprint arXiv:2404.15127},
  year={2024}
}
```

---

## Acknowledgements

This work builds upon [InternVL](https://github.com/OpenGVLab/InternVL). We thank the InternVL team for their outstanding contributions to the open-source VLM community.