---
license: mit
language:
- en
base_model:
- OpenGVLab/InternVL-Chat-V1-2
tags:
- medical
- vision-language
- multimodal
- radiology
- pathology
- dermatology
- retinography
- endoscopy
- healthcare
pipeline_tag: image-text-to-text
---
# MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning
*A generalist foundation model for healthcare capable of handling diverse medical data modalities.*
<img src="https://github.com/sunanhe/MedDr/raw/main/examples/logo.jpg" width="200"/>
[![arXiv](https://img.shields.io/badge/arXiv-2404.15127-b31b1b.svg)](https://arxiv.org/abs/2404.15127)
[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://smart-meddr.github.io/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/sunanhe/MedDr)
**Authors:** [Sunan He*](https://sunanhe.github.io/), [Yuxiang Nie*](https://jerrrynie.github.io/), [Zhixuan Chen](https://zhi-xuan-chen.github.io/homepage/), [Zhiyuan Cai](https://github.com/Davidczy), Hongmei Wang, [Shu Yang](https://github.com/isyangshu), [Hao Chen**](https://cse.hkust.edu.hk/~jhc/)
(*Equal Contribution, **Corresponding Author)
**Institution:** SMART Lab, Hong Kong University of Science and Technology
---
## Model Summary
MedDr is a large-scale generalist vision-language model for healthcare. It is built upon [InternVL](https://github.com/OpenGVLab/InternVL) and trained using a **diagnosis-guided bootstrapping** strategy that leverages both image and label information to construct high-quality vision-language datasets.
MedDr supports diverse medical imaging modalities:
- 🫁 **Radiology** (X-ray, CT, MRI)
- 🔬 **Pathology**
- 🧴 **Dermatology**
- 👁️ **Retinography**
- 🔭 **Endoscopy**
During inference, MedDr employs a **retrieval-augmented medical diagnosis** strategy to improve its generalization to unseen data.
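At its core, retrieval-augmented diagnosis amounts to a nearest-neighbor lookup in an embedding space: the labels of the most similar support examples are retrieved and used to condition the model's prediction. The sketch below is purely schematic (function names, embedding shapes, and the cosine-similarity choice are assumptions, not the paper's exact formulation):

```python
import numpy as np

def retrieve_diagnoses(query_emb, support_embs, support_labels, k=3):
    """Return labels of the k support examples whose embeddings are most
    cosine-similar to the query embedding. A schematic of retrieval-augmented
    diagnosis; MedDr's actual embedding space and prompting differ."""
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    sims = s @ q
    # Indices of the k highest-similarity support examples.
    top = np.argsort(-sims)[:k]
    return [support_labels[i] for i in top]
```

The retrieved labels would then be inserted into the model's prompt as diagnostic hints.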
---
## Capabilities
- Visual Question Answering (VQA) for medical images
- Medical report generation
- Medical image diagnosis across multiple modalities
---
## Usage
### Environment Setup
This model is built on [InternVL](https://github.com/OpenGVLab/InternVL). Please follow the [INSTALLATION.md](https://github.com/OpenGVLab/InternVL/blob/main/INSTALLATION.md) to set up the environment.
### Quick Demo
```shell
# Clone the GitHub repository
git clone https://github.com/sunanhe/MedDr.git
cd MedDr

# Edit demo.py and set model_path to your local checkpoint directory, then run:
python3 demo.py
```
See [`demo.py`](https://github.com/sunanhe/MedDr/blob/main/demo.py) in the GitHub repository for a full example.
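InternVL-family vision encoders typically expect images resized to a fixed resolution (448×448 for InternVL-Chat-V1-2) and normalized with ImageNet statistics. As a rough sketch of that preprocessing (the 448 resolution and the normalization constants are assumptions here; verify them against the checkpoint's config and use the pipeline shipped with `demo.py` in practice):

```python
import numpy as np

# ImageNet normalization constants commonly used by InternVL-family encoders
# (an assumption; check the checkpoint's image-processor config).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: np.ndarray, size: int = 448) -> np.ndarray:
    """Resize an HxWx3 uint8 image to size x size (nearest-neighbor),
    scale to [0, 1], normalize per channel, and return a CxHxW float32 array."""
    h, w, _ = image.shape
    ys = np.arange(size) * h // size   # row indices for nearest-neighbor resize
    xs = np.arange(size) * w // size   # column indices
    resized = image[ys][:, xs].astype(np.float32) / 255.0
    normalized = (resized - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)  # HWC -> CHW
```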
---
## Citation
If you find MedDr useful in your research, please consider citing:
```bibtex
@article{he2024meddr,
title={MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning},
author={He, Sunan and Nie, Yuxiang and Chen, Zhixuan and Cai, Zhiyuan and Wang, Hongmei and Yang, Shu and Chen, Hao},
journal={arXiv preprint arXiv:2404.15127},
year={2024}
}
```
---
## Acknowledgements
This work builds upon [InternVL](https://github.com/OpenGVLab/InternVL). We thank the InternVL team for their outstanding contributions to the open-source VLM community.