---
license: apache-2.0
language:
- zh
- en
base_model:
- lingshu-medical-mllm/Lingshu-7B
pipeline_tag: image-text-to-text
metrics:
- bertscore
- bleu
library_name: transformers
tags:
- medical
---

# EchoVLM (paper implementation)

Official PyTorch implementation of the model described in
**"[EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence](https://arxiv.org/abs/2509.14977)"**.

## 🤖 Model Details

| Item      | Value                                                     |
|-----------|-----------------------------------------------------------|
| Paper     | [arXiv:2509.14977](https://arxiv.org/abs/2509.14977)      |
| Authors   | Chaoyin She¹, Ruifang Lu²                                 |
| Code      | [GitHub repo](https://github.com/Asunatan/EchoVLM)        |
| Model Hub | [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM) |

## 🔄 Updates

- **Coming soon**: V2 with Chain-of-Thought reasoning and reinforcement-learning enhancements. The full training and inference code, plus the benchmark test set, will be fully open-sourced.
- **Dec 1, 2025**: To better promote development in this field, we have open-sourced our latest instruction fine-tuned model based on Lingshu-7B. Because it is built on Qwen2.5-VL, it benefits from that model's broader ecosystem; for example, it can seamlessly use vLLM for accelerated inference. Model weights are released on [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM_V2_lingshu_base_7b_instruct_preview).
- **Sep 21, 2025**: The full (uncleaned) model codebase is now open-sourced on [GitHub](https://github.com/Asunatan/EchoVLM)!
- **Sep 19, 2025**: Released model weights on [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM).
- **Sep 17, 2025**: Paper published on [arXiv](https://arxiv.org/abs/2509.14977).

## 🚀 Quick Start

For usage, refer to the [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) model card, loading the EchoVLM weights instead.

## 📌 Citation

If you use this model or code in your research, please cite:

```bibtex
@misc{she2025echovlmdynamicmixtureofexpertsvisionlanguage,
      title={EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence},
      author={Chaoyin She and Ruifang Lu and Lida Chen and Wei Wang and Qinghua Huang},
      year={2025},
      eprint={2509.14977},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.14977},
}
```