chtan committed · commit d532180 (verified) · parent: fa4f933

Update README.md

Files changed (1): README.md (+26 −13)
README.md CHANGED
@@ -1,23 +1,19 @@
 ---
 language:
-- en
-- zh
+- en
+- zh
 license: apache-2.0
 library_name: transformers
 tags:
-- audio
-- speech
-- audio-language-model
-- speech-to-text
-- speech-to-speech
-- voice-chat
-pipeline_tag: audio-text-to-text
+- audio-language-model
+- speech-to-speech
+pipeline_tag: any-to-any
 ---
 
 # Fun-Audio-Chat-8B
 
 <p align="right">
-<a href="README.md">English</a> | <a href="README_zh.md">中文</a>
+<a href="Fun-Audio-Chat-8B/blob/main/README.md">English</a> | <a href="Fun-Audio-Chat-8B/blob/main/README_zh.md">中文</a>
 </p>
 
 <div align="center">
@@ -36,12 +32,20 @@ pipeline_tag: audio-text-to-text
 
 Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions. It introduces **Dual-Resolution Speech Representations** (an efficient 5Hz shared backbone + a 25Hz refined head) to cut compute while keeping high speech quality, and **Core-Cocktail training** to preserve strong text LLM capabilities. It delivers top-tier results on spoken QA, audio understanding, speech function calling, speech instruction-following, and voice empathy benchmarks.
 
+<p align="center">
+<img width="95%" src="https://github.com/FunAudioLLM/Fun-Audio-Chat/blob/main/assets/Results.png?raw=true">
+</p>
+
 ### Key Features
 
 - **Dual-Resolution Speech Representations**: Efficient 5Hz frame rate (vs. 12.5Hz or 25Hz in other models), reducing GPU hours by nearly 50% while maintaining high speech quality
 - **State-of-the-Art Performance**: Ranks top among models of comparable size (~8B parameters) on OpenAudioBench, VoiceBench, UltraEval-Audio, MMAU, MMAU-Pro, MMSU, Speech-ACEBench, Speech-BFCL, Speech-SmartInteract, and VStyle
 - **Comprehensive Capabilities**: Supports spoken QA, audio understanding, speech function calling, speech instruction-following, and voice empathy
 
+<p align="center">
+<img width="95%" src="https://github.com/FunAudioLLM/Fun-Audio-Chat/blob/main/assets/Architecture.png?raw=true">
+</p>
+
 ## Model Details
 
 | Attribute | Value |
@@ -117,10 +121,20 @@ If you find this model useful, please cite our paper:
 
 ```bibtex
 @article{funaudiochat2025,
-  title={Fun-Audio-Chat: A Large Audio Language Model for Natural Voice Interactions},
+  title={Fun-Audio-Chat Technical Report},
   author={Tongyi Fun Team},
   year={2025}
 }
+
+@misc{tan2025drvoiceparallelspeechtextvoice,
+  title={DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations},
+  author={Chao-Hong Tan and Qian Chen and Wen Wang and Chong Deng and Qinglin Zhang and Luyao Cheng and Hai Yu and Xin Zhang and Xiang Lv and Tianyu Zhao and Chong Zhang and Yukun Ma and Yafeng Chen and Hui Wang and Jiaqing Liu and Xiangang Li and Jieping Ye},
+  year={2025},
+  eprint={2506.09349},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2506.09349},
+}
 ```
 
 ## License
@@ -139,5 +153,4 @@ This project is based on the following excellent open-source projects:
 ## Contact
 
 - 🐛 Submit an [Issue](https://github.com/FunAudioLLM/Fun-Audio-Chat/issues)
-- 💡 Submit a [Pull Request](https://github.com/FunAudioLLM/Fun-Audio-Chat/pulls)
-
+- 💡 Submit a [Pull Request](https://github.com/FunAudioLLM/Fun-Audio-Chat/pulls)
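
As a back-of-the-envelope check on the dual-resolution claim in the Key Features above (5Hz shared backbone vs. the 12.5Hz/25Hz rates the README attributes to other models), here is a small sketch of the resulting frame counts; the `frame_count` helper is illustrative only, not part of the model's API:

```python
# Illustrative only: frame (token) counts per clip at the frame rates quoted
# in the README. A 5 Hz backbone processes far fewer positions than a 12.5 Hz
# or 25 Hz model, consistent with the "nearly 50%" GPU-hour reduction claim;
# the 25 Hz rate applies only to the refined head.

def frame_count(duration_s: float, rate_hz: float) -> int:
    """Frames produced for `duration_s` seconds of audio at `rate_hz`."""
    return round(duration_s * rate_hz)

clip_s = 60.0  # one minute of audio
print(frame_count(clip_s, 5.0))   # 5 Hz shared backbone   -> 300 frames
print(frame_count(clip_s, 12.5))  # typical 12.5 Hz model  -> 750 frames
print(frame_count(clip_s, 25.0))  # 25 Hz refined head     -> 1500 frames
```

At 5Hz the backbone sees 0.4x the positions of a 12.5Hz model and 0.2x those of a 25Hz model for the same clip length.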