---
license: apache-2.0
language:
- zh
- en
base_model:
- lingshu-medical-mllm/Lingshu-7B
pipeline_tag: image-text-to-text
metrics:
- bertscore
- bleu
library_name: transformers
tags:
- medical
---

# EchoVLM (paper implementation)

Official PyTorch implementation of the model described in
**"[EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence](https://arxiv.org/abs/2509.14977)"**.

## 🤖 Model Details

| Item      | Value                                                     |
|-----------|-----------------------------------------------------------|
| Paper     | [arXiv:2509.14977](https://arxiv.org/abs/2509.14977)      |
| Authors   | Chaoyin She¹, Ruifang Lu²                                 |
| Code      | [GitHub repo](https://github.com/Asunatan/EchoVLM)        |
| Model Hub | [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM) |

## 🔄 Updates

- **Coming soon**: V2 with Chain-of-Thought reasoning and reinforcement-learning enhancements. The full training and inference code, plus the benchmark test set, will be fully open-sourced.
- **Dec 1, 2025**: To better promote development in this field, we have open-sourced our latest instruction fine-tuned model based on Lingshu-7B. Because it is built on Qwen2.5-VL, it benefits from that model's broader ecosystem; for example, it can seamlessly use vLLM for accelerated inference. Model weights are released on [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM_V2_lingshu_base_7b_instruct_preview).
- **Sep 21, 2025**: The full (uncleaned) model codebase is now open-sourced on [GitHub](https://github.com/Asunatan/EchoVLM)!
- **Sep 19, 2025**: Released model weights on [Hugging Face](https://huggingface.co/chaoyinshe/EchoVLM).
- **Sep 17, 2025**: Paper published on [arXiv](https://arxiv.org/abs/2509.14977).

## 🚀 Quick Start

For usage, refer to the [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) model card, loading the EchoVLM weights instead.

## 📌 Citation

If you use this model or code in your research, please cite:

```bibtex
@misc{she2025echovlmdynamicmixtureofexpertsvisionlanguage,
      title={EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence},
      author={Chaoyin She and Ruifang Lu and Lida Chen and Wei Wang and Qinghua Huang},
      year={2025},
      eprint={2509.14977},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.14977},
}
```