---
datasets:
- custom
language:
- en
license: apache-2.0
metrics:
- wer
- bleu
- AIR-Bench
pipeline_tag: audio-to-audio
tags:
- audio-text-to-audio-text
- speech-understanding
- audio
- chat
library_name: transformers
---

# EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

🐈‍⬛ [GitHub](https://github.com/FreedomIntelligence/EchoX) | 📃 [Paper](https://arxiv.org/abs/2509.09174) | 🌐 Project Page | 🚀 Space

## Model Description

EchoX is a Speech-to-Speech large language model that addresses the acoustic-semantic gap. By introducing **Echo Training**, EchoX integrates semantic and acoustic learning, mitigating the degradation of reasoning ability observed in existing speech-based LLMs. It is trained on only 6k hours of data while delivering state-of-the-art results in knowledge-based question answering and speech interaction tasks.

### Key Features

## Sample Usage

To set up your environment and run inference, follow these steps from the [GitHub repository](https://github.com/FreedomIntelligence/EchoX):

First, clone the repository, set up the environment, and install dependencies:

```bash
git clone https://github.com/FreedomIntelligence/EchoX.git
cd EchoX
conda create -n echox python=3.10 pip=24.0
conda activate echox
pip install -r requirements.txt
```

Next, download the models:

```bash
pip install -U huggingface_hub
hf download --resume-download FreedomIntelligence/EchoX-8B --local-dir EchoX-8B
hf download --resume-download openai/whisper-large-v3 --local-dir whisper-large-v3
```

Finally, run inference on a test case, or start the Gradio web interface:

```bash
python demo.py
# Alternatively, start the Gradio web interface:
# python app.py
# To use a specific GPU:
# CUDA_VISIBLE_DEVICES=1 python app.py
```

## 📖 Citation

```
@misc{zhang2025echoxmitigatingacousticsemanticgap,
      title={EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs},
      author={Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li},
      year={2025},
      eprint={2509.09174},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.09174},
}
```
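
## Downloading the Models from Python

The download step under Sample Usage can also be scripted. The following is a minimal sketch, not part of the EchoX repository: it assumes `huggingface_hub` is installed and uses its `snapshot_download` function in place of the `hf download` commands. The repository IDs and directory names are copied from those commands; `MODELS`, `download_models`, and the `downloader` parameter are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Repo IDs and local directory names taken from the download commands in Sample Usage.
MODELS = {
    "FreedomIntelligence/EchoX-8B": "EchoX-8B",
    "openai/whisper-large-v3": "whisper-large-v3",
}


def download_models(root=".", downloader=None):
    """Download every repo in MODELS into root/<local_dir>; return the target paths.

    `downloader` is a hypothetical hook (useful for dry runs); by default the
    function calls huggingface_hub.snapshot_download.
    """
    if downloader is None:
        # Imported lazily so the module loads even without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download

    paths = {}
    for repo_id, local_dir in MODELS.items():
        target = Path(root) / local_dir
        # snapshot_download fetches the whole repository into `local_dir`.
        downloader(repo_id=repo_id, local_dir=str(target))
        paths[repo_id] = target
    return paths
```

Calling `download_models()` from inside the cloned `EchoX` directory should leave the same `EchoX-8B` and `whisper-large-v3` folders that the CLI commands create. Both repositories are large, so expect the first run to take a while.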