---
license: apache-2.0
language:
- zh
- en
base_model:
- lingshu-medical-mllm/Lingshu-7B
pipeline_tag: image-text-to-text
metrics:
- bertscore
- bleu
library_name: transformers
tags:
- medical
---


# EchoVLM (paper implementation)

Official PyTorch implementation of the model described in  
**"[EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence](https://arxiv.org/abs/2509.14977)"**.

## 🤖 Model Details

| Item        | Value                                           |
|-------------|-------------------------------------------------|
| Paper       | [arXiv:2509.14977](https://arxiv.org/abs/2509.14977) |
| Authors     | Chaoyin She, Ruifang Lu, Lida Chen, Wei Wang, Qinghua Huang |
| Code        | [GitHub repo](https://github.com/Asunatan/EchoVLM) |
| Model Hub   | [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM) |

## 🔄 Updates
- **Coming soon**: V2 with Chain-of-Thought reasoning and reinforcement-learning enhancements; the full training and inference code plus the benchmark test set will be fully open-sourced.
- **Dec 1, 2025**: To better promote development in this field, we have open-sourced our latest instruction-tuned model based on Lingshu-7B. Since Lingshu-7B is itself built on Qwen2.5-VL, the model inherits that ecosystem; for example, it can seamlessly leverage vLLM for accelerated inference (see the sketch after this list). The weights are released on [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM_V2_lingshu_base_7b_instruct_preview).
- **Sep 21, 2025**: The full model codebase (not yet cleaned up) is now open-sourced on GitHub!
- **Sep 19, 2025**: Released model weights on [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM).  
- **Sep 17, 2025**: Paper published on [arXiv](https://arxiv.org/abs/2509.14977).
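
Below is a minimal, untested sketch of vLLM-accelerated inference with the V2 preview weights, using vLLM's offline `LLM.chat` API. The image URL, prompt, and sampling settings are illustrative placeholders, not recommended defaults.

```python
from vllm import LLM, SamplingParams

# Sketch only: load the V2 preview weights with vLLM's offline API.
llm = LLM(
    model="chaoyinshe/EchoVLM_V2_lingshu_base_7b_instruct_preview",
    max_model_len=8192,
)

# Placeholder image URL and prompt; reading local image files additionally
# requires `allowed_local_media_path` on the LLM constructor.
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/ultrasound.png"}},
        {"type": "text",
         "text": "Describe the findings in this ultrasound image."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```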

## 🚀 Quick Start
Usage follows [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct); a minimal `transformers` example is sketched below.
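
The following is an untested sketch that follows the standard Qwen2.5-VL inference pattern with `transformers` and `qwen_vl_utils`. The repo id is shown for the V2 preview (swap in `chaoyinshe/EchoVLM` as appropriate); the image path, prompt, and `max_new_tokens` value are placeholder assumptions.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

repo_id = "chaoyinshe/EchoVLM_V2_lingshu_base_7b_instruct_preview"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo_id)

# Placeholder image path and prompt.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/ultrasound.png"},
        {"type": "text",
         "text": "Describe the findings in this ultrasound image."},
    ],
}]

# Build the chat prompt and collect the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly produced tokens.
generated_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

As with Qwen2.5-VL, `min_pixels`/`max_pixels` can be passed to `AutoProcessor.from_pretrained` to trade visual-token count against memory.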

## 📌 Citation

If you use this model or code in your research, please cite:

```bibtex
@misc{she2025echovlmdynamicmixtureofexpertsvisionlanguage,
      title={EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence}, 
      author={Chaoyin She and Ruifang Lu and Lida Chen and Wei Wang and Qinghua Huang},
      year={2025},
      eprint={2509.14977},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.14977}, 
}
```