nielsr (HF Staff) committed
Commit 5de3136 · verified · 1 Parent(s): fdc175b

Improve model card: Add pipeline tag, library name, project page link, and sample usage


This PR improves the model card by:

- Adding `pipeline_tag: audio-to-audio` to the metadata, enabling easier discovery on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=audio-to-audio).
- Including `library_name: transformers` in the metadata, since `config.json` and the model's architecture suggest compatibility with the Hugging Face `transformers` library; this enables the automated "how to use" widget.
- Adding an explicit link to the project page (`https://freedomintelligence.github.io/EchoX/`) in the top section of the model card.
- Replacing the generic "Usage" section with a comprehensive "Sample Usage" section, including environment setup, model download, and inference commands directly from the official GitHub repository's README, allowing users to quickly get started.
- Correcting the training data size in the "Model Description" from 10k hours to 6k hours, aligning with the paper's abstract and the GitHub README.

These changes provide more comprehensive information and improve user experience on the Hugging Face Hub.
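Taken together, the metadata edits above produce a README front matter like the following (fields and ordering taken from the new side of the diff on this page):

```yaml
---
datasets:
- custom
language:
- en
license: apache-2.0
metrics:
- wer
- bleu
- AIR-Bench
pipeline_tag: audio-to-audio
tags:
- audio-text-to-audio-text
- speech-understanding
- audio
- chat
library_name: transformers
---
```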

Files changed (1)
  1. README.md +40 -12
README.md CHANGED

@@ -1,19 +1,22 @@
  ---
+ datasets:
+ - custom
  language:
  - en
- tags:
- - audio-text-to-audio-text
- - speech-understanding
- - audio
- - chat
  license: apache-2.0
- datasets:
- - custom
  metrics:
  - wer
  - bleu
  - AIR-Bench
+ pipeline_tag: audio-to-audio
+ tags:
+ - audio-text-to-audio-text
+ - speech-understanding
+ - audio
+ - chat
+ library_name: transformers
  ---
+
  <div align="center">
  <h1>
  EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
@@ -21,25 +24,50 @@ metrics:
  </div>

  <p align="center">
- <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ Github</a>&nbsp|&nbsp<a href="https://arxiv.org/abs/2509.09174">📃 Paper</a>&nbsp|&nbsp<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 Space</a>&nbsp</font>
+ <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ Github</a>&nbsp|&nbsp<a href="https://arxiv.org/abs/2509.09174">📃 Paper</a>&nbsp|&nbsp<a href="https://freedomintelligence.github.io/EchoX">🌐 Project Page</a>&nbsp|&nbsp<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">🚀 Space</a>&nbsp</font>
  </p>

  ## Model Description
- EchoX is a Speech-to-Speech large language model that addresses the acoustic-semantic gap. By introducing **Echo Training**, EchoX integrates semantic and acoustic learning, mitigating the degradation of reasoning ability observed in existing speech-based LLMs. It is trained on only 10k hours of data while delivering state-of-the-art results in knowledge-based question answering and speech interaction tasks.
+ EchoX is a Speech-to-Speech large language model that addresses the acoustic-semantic gap. By introducing **Echo Training**, EchoX integrates semantic and acoustic learning, mitigating the degradation of reasoning ability observed in existing speech-based LLMs. It is trained on only 6k hours of data while delivering state-of-the-art results in knowledge-based question answering and speech interaction tasks.

  ### Key Features
  <div>
  <ul>
  <font size="3"><li>Mitigates Acoustic-Semantic Gap in Speech-to-Speech LLMs</li></font>
  <font size="3"><li>Introduces Echo Training with a Novel Three-Stage Pipeline (S2T, T2C, Echo)</li></font>
- <font size="3"><li>Trained on Only 10k Hours of Curated Data, Ensuring Efficiency</li></font>
+ <font size="3"><li>Trained on Only 6k Hours of Curated Data, Ensuring Efficiency</li></font>
  <font size="3"><li>Achieves State-of-the-Art Performance in Knowledge-Based QA Benchmarks</li></font>
  <font size="3"><li>Preserves Reasoning and Knowledge Abilities for Interactive Speech Tasks</li></font>
  </ul>
  </div>

- ## Usage
- Load the EchoX model and run inference with your audio files as shown in the <a href="https://github.com/FreedomIntelligence/EchoX">GitHub repository</a>.
+ ## Sample Usage
+ To set up your environment and run inference, follow these steps from the [GitHub repository](https://github.com/FreedomIntelligence/EchoX):
+
+ First, clone the repository, set up the environment, and install dependencies:
+ ```bash
+ git clone https://github.com/FreedomIntelligence/EchoX.git
+ cd EchoX
+ conda create -n echox python=3.10 pip=24.0
+ conda activate echox
+ pip install -r requirements.txt
+ ```
+
+ Next, download the models:
+ ```bash
+ pip install -U huggingface_hub
+ hf download --resume-download FreedomIntelligence/EchoX-8B --local-dir EchoX-8B
+ hf download --resume-download openai/whisper-large-v3 --local-dir whisper-large-v3
+ ```
+
+ Finally, run inference on a test case, or start the Gradio web interface:
+ ```bash
+ python demo.py
+ # Alternatively, start the Gradio web interface:
+ # python app.py
+ # To use a specific GPU:
+ # CUDA_VISIBLE_DEVICES=1 python app.py
+ ```

  # <span>📖 Citation</span>
  ```