	---
	license: mit
	language:
	- en
	base_model:
	- OpenGVLab/InternVL-Chat-V1-2
	tags:
	- medical
	- vision-language
	- multimodal
	- radiology
	- pathology
	- dermatology
	- retinography
	- endoscopy
	- healthcare
	pipeline_tag: image-text-to-text
	---

	# MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

	A generalist foundation model for healthcare capable of handling diverse medical data modalities.

	<img src="https://github.com/sunanhe/MedDr/raw/main/examples/logo.jpg" width="200"/>

	[![arXiv](https://img.shields.io/badge/arXiv-2404.15127-b31b1b.svg)](https://arxiv.org/abs/2404.15127)
	[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://smart-meddr.github.io/)
	[![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/sunanhe/MedDr)

	Authors: [Sunan He](https://sunanhe.github.io/), [Yuxiang Nie](https://jerrrynie.github.io/), [Zhixuan Chen](https://zhi-xuan-chen.github.io/homepage/), [Zhiyuan Cai](https://github.com/Davidczy), Hongmei Wang, [Shu Yang](https://github.com/isyangshu), [Hao Chen](https://cse.hkust.edu.hk/~jhc/)\*\*

	(\*Equal Contribution, \*\*Corresponding Author)

	Institution: SMART Lab, Hong Kong University of Science and Technology

	---

	## Model Summary

	MedDr is a large-scale generalist vision-language model for healthcare. It is built upon [InternVL](https://github.com/OpenGVLab/InternVL) and trained using a diagnosis-guided bootstrapping strategy that leverages both image and label information to construct high-quality vision-language datasets.
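
This README does not spell out the bootstrapping pipeline. As a rough illustration of the general idea only (a known diagnosis label steers the generated image description and is then used to check it), the sketch below conditions a captioning prompt on the label and keeps only outputs that mention that label. The `describe` callback, the prompt wording, and the keyword filter are all hypothetical assumptions, not MedDr's actual method.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class LabeledImage:
    image_path: str  # hypothetical: path to a labeled medical image
    diagnosis: str   # diagnosis label shipped with the source dataset
    modality: str    # e.g. "dermatology" or "retinography"


@dataclass
class VLPair:
    image_path: str
    text: str  # generated description kept as training text


def build_prompt(sample: LabeledImage) -> str:
    """Condition the description request on the known diagnosis label."""
    return (
        f"This {sample.modality} image is diagnosed as {sample.diagnosis}. "
        "Describe the visual findings that support this diagnosis."
    )


def bootstrap(
    samples: Iterable[LabeledImage],
    describe: Callable[[str, str], str],
) -> List[VLPair]:
    """Generate label-guided descriptions and keep only label-consistent ones.

    `describe(image_path, prompt)` is a hypothetical stand-in for any
    vision-language captioning model.
    """
    pairs: List[VLPair] = []
    for sample in samples:
        text = describe(sample.image_path, build_prompt(sample))
        # Placeholder consistency check (assumption): the output must mention the label.
        if sample.diagnosis.lower() in text.lower():
            pairs.append(VLPair(sample.image_path, text))
    return pairs
```

For example, a stub `describe` that returns "Findings consistent with melanoma." only when the prompt mentions melanoma, and "Normal." otherwise, keeps a melanoma-labeled sample and drops a glaucoma-labeled one. The substring test is only a placeholder for whatever consistency check a real pipeline would use.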

	MedDr supports diverse medical imaging modalities:
	- 🫁 Radiology (X-ray, CT, MRI)
	- 🔬 Pathology
	- 🧴 Dermatology
	- 👁️ Retinography
	- 🔭 Endoscopy

	At inference time, MedDr uses a retrieval-augmented medical diagnosis strategy to improve its generalization.
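
The README only names this strategy. One common way to realise retrieval-augmented diagnosis, shown here as an assumption rather than MedDr's documented procedure, is to embed the query image, find the most similar labeled reference images, and pass their diagnoses to the model as hints. The sketch assumes embeddings were already computed by some image encoder; `retrieve_candidates`, `build_rag_prompt`, and the prompt text are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(
    query: Embedding,
    bank: Sequence[Tuple[Embedding, str]],
    k: int = 3,
) -> List[Tuple[str, int]]:
    """Vote over the diagnosis labels of the k reference images most similar to the query."""
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common()  # [(label, count), ...], highest count first


def build_rag_prompt(question: str, candidates: List[Tuple[str, int]]) -> str:
    """Prepend the retrieved candidate diagnoses to the user's question."""
    hints = ", ".join(f"{label} (votes: {count})" for label, count in candidates)
    return f"Reference diagnoses from similar images: {hints}.\n{question}"
```

With a toy bank `[([1.0, 0.0], "pneumonia"), ([0.9, 0.1], "pneumonia"), ([0.0, 1.0], "normal"), ([0.1, 0.9], "normal")]`, the query `[1.0, 0.05]` with `k=3` yields `[("pneumonia", 2), ("normal", 1)]`. Majority voting is just one simple way to aggregate neighbours; similarity-weighted scores would be an equally plausible choice.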

	---

	## Capabilities

	- Visual Question Answering (VQA) for medical images
	- Medical report generation
	- Medical image diagnosis across multiple modalities

	---

	## Usage

	### Environment Setup

	This model is built on [InternVL](https://github.com/OpenGVLab/InternVL). Follow InternVL's [INSTALLATION.md](https://github.com/OpenGVLab/InternVL/blob/main/INSTALLATION.md) to set up the environment.

	### Quick Demo

	```bash
	# Clone the GitHub repository
	git clone https://github.com/sunanhe/MedDr.git
	cd MedDr

	# Edit demo.py and set model_path to your local checkpoint directory, then run:
	python3 demo.py
	```

	See [`demo.py`](https://github.com/sunanhe/MedDr/blob/main/demo.py) in the GitHub repository for a full example.

	---

	## Citation

	If you find MedDr useful in your research, please consider citing:

	```bibtex
	@article{he2024meddr,
	title={MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning},
	author={He, Sunan and Nie, Yuxiang and Chen, Zhixuan and Cai, Zhiyuan and Wang, Hongmei and Yang, Shu and Chen, Hao},
	journal={arXiv preprint arXiv:2404.15127},
	year={2024}
	}
	```

	---

	## Acknowledgements

	This work builds upon [InternVL](https://github.com/OpenGVLab/InternVL). We thank the InternVL team for their outstanding contributions to the open-source VLM community.