Improve model card: Add metadata and key links for Fun-ASR-Nano-2512

#6
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +16 -7
README.md CHANGED
@@ -1,11 +1,20 @@
- # Fun-ASR

  「[简体中文](README_zh.md)」|「English」

- Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab. It is trained on tens of millions of hours of real speech data, possessing powerful contextual understanding capabilities and industry adaptability. It supports low-latency real-time transcription and covers 31 languages. It excels in vertical domains such as education and finance, accurately recognizing professional terminology and industry expressions, effectively addressing challenges like "hallucination" generation and language confusion, achieving "clear hearing, understanding meaning, and accurate writing."

  <div align="center">
- <img src="images/funasr-v2.png">
  </div>

  <div align="center">
@@ -35,7 +44,7 @@ Online Experience:
  # What's New 🔥

  - 2025/12: [Fun-ASR-Nano-2512](https://modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) is an end-to-end speech recognition large model trained on tens of millions of hours real speech data. It supports low-latency real-time transcription and covers 31 languages.
- - 2024/7: [FunASR](https://github.com/modelscope/FunASR) is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization and multi-talker ASR.

  # Core Features 🎯

@@ -158,7 +167,7 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
  | **Model Size** | 1.5B | 1.5B | 1.6B | - | - | - | - | 1.1B | 0.8B | 7.7B |
  | **OpenSource** | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
  | AIShell1 | 1.81 | 2.17 | 4.72 | 0.68 | 1.63 | 0.71 | 0.63 | 0.54 | 1.80 | 1.22 |
- | AIShell2 | - | 3.47 | 4.68 | 2.27 | 2.76 | 2.86 | 2.10 | 2.58 | 2.75 | 2.39 |
  | Fleurs-zh | - | 3.65 | 5.18 | 3.43 | 3.23 | 3.11 | 2.68 | 4.81 | 2.56 | 2.53 |
  | Fleurs-en | 5.78 | 6.95 | 6.23 | 9.39 | 9.39 | 6.99 | 3.03 | 10.79 | 5.96 | 4.74 |
  | Librispeech-clean | 2.00 | 2.17 | 1.86 | 1.58 | 2.8 | 1.32 | 1.17 | 1.84 | 1.76 | 1.51 |
@@ -186,7 +195,7 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
  | **Average** | **26.13** | **33.39** | **15.95** | **22.63** | **31.00** | **23.49** | **16.72** | **12.70** |

  <div align="center">
- <img src="images/compare_en.png" width="800" />
  </div>

  ## Citations
@@ -198,4 +207,4 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
  journal={arXiv preprint arXiv:2509.12508},
  year={2025}
  }
- ```
 
+ ---
+ license: apache-2.0
+ pipeline_tag: automatic-speech-recognition
+ library_name: funasr
+ ---
+
+ # Fun-ASR-Nano-2512

  「[简体中文](README_zh.md)」|「English」

+ Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab, as described in the paper [Fun-ASR Technical Report](https://huggingface.co/papers/2509.12508). It is trained on tens of millions of hours of real speech data, possessing powerful contextual understanding capabilities and industry adaptability. It supports low-latency real-time transcription and covers 31 languages. It excels in vertical domains such as education and finance, accurately recognizing professional terminology and industry expressions, effectively addressing challenges like "hallucination" generation and language confusion, achieving "clear hearing, understanding meaning, and accurate writing."
+
+ Project Homepage: https://funaudiollm.github.io/funasr
+ Code Repository: https://github.com/FunAudioLLM/Fun-ASR

  <div align="center">
+ <img src="https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512/resolve/main/images/funasr-v2.png">
  </div>

  <div align="center">
 
  # What's New 🔥

  - 2025/12: [Fun-ASR-Nano-2512](https://modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) is an end-to-end speech recognition large model trained on tens of millions of hours real speech data. It supports low-latency real-time transcription and covers 31 languages.
+ - 2024/7: [FunASR](https://github.com/FunAudioLLM/Fun-ASR) is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization and multi-talker ASR.

  # Core Features 🎯

 
  | **Model Size** | 1.5B | 1.5B | 1.6B | - | - | - | - | 1.1B | 0.8B | 7.7B |
  | **OpenSource** | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
  | AIShell1 | 1.81 | 2.17 | 4.72 | 0.68 | 1.63 | 0.71 | 0.63 | 0.54 | 1.80 | 1.22 |
+ | AIShell2 | - | 3.47 | 4.68 | 2.27 | 2.76 | 2.86 | 2.10 | 2.58 | 2.75 | 2.39 |
  | Fleurs-zh | - | 3.65 | 5.18 | 3.43 | 3.23 | 3.11 | 2.68 | 4.81 | 2.56 | 2.53 |
  | Fleurs-en | 5.78 | 6.95 | 6.23 | 9.39 | 9.39 | 6.99 | 3.03 | 10.79 | 5.96 | 4.74 |
  | Librispeech-clean | 2.00 | 2.17 | 1.86 | 1.58 | 2.8 | 1.32 | 1.17 | 1.84 | 1.76 | 1.51 |
 
  | **Average** | **26.13** | **33.39** | **15.95** | **22.63** | **31.00** | **23.49** | **16.72** | **12.70** |

  <div align="center">
+ <img src="https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512/resolve/main/images/compare_en.png" width="800" />
  </div>

  ## Citations
 
  journal={arXiv preprint arXiv:2509.12508},
  year={2025}
  }
+ ```
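
The metadata block this PR adds at the top of README.md can be sanity-checked locally. The sketch below is only an illustration and not part of the PR: `parse_front_matter` is a hypothetical helper written with the standard library, and it assumes a flat `key: value` block delimited by `---` lines, as in this diff. A card with nested or multi-line YAML would need a real YAML parser instead.

```python
def parse_front_matter(text):
    """Return key/value pairs from a leading '---' delimited block.

    Hypothetical helper: handles flat 'key: value' lines only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no front matter


# The first lines of README.md as they appear on the "+" side of this diff.
README_HEAD = """---
license: apache-2.0
pipeline_tag: automatic-speech-recognition
library_name: funasr
---

# Fun-ASR-Nano-2512
"""

metadata = parse_front_matter(README_HEAD)
print(metadata)
```

Running the same helper on the old README head, which starts directly with `# Fun-ASR`, returns an empty dict, which matches the fact that the card had no metadata before this change.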