Improve model card: Add metadata and key links for Fun-ASR-Nano-2512
This PR enhances the model card for the `FunAudioLLM/Fun-ASR-Nano-2512` model by:
- Adding `pipeline_tag: automatic-speech-recognition` so the model is discoverable under that task and gets a direct inference widget.
- Adding `library_name: funasr` to reflect the model's compatibility with the `funasr` library, as evidenced by the usage examples.
- Adding `license: apache-2.0` as a suitable open-source license.
- Updating the main title to `# Fun-ASR-Nano-2512` for better specificity.
- Integrating the paper link [Fun-ASR Technical Report](https://huggingface.co/papers/2509.12508) into the introductory paragraph.
- Adding prominent links to the Project Homepage (`https://funaudiollm.github.io/funasr`) and the Code Repository (`https://github.com/FunAudioLLM/Fun-ASR`) at the top of the README.
- Updating the image sources to direct Hugging Face asset URLs so the images display reliably.
The existing comprehensive usage instructions and performance evaluations have been preserved.
Please review and merge if these improvements are satisfactory.
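The metadata additions listed above live in the YAML front matter at the top of the README. As a minimal sketch for reviewers, the snippet below checks that the three keys are present. The helper name `read_front_matter` and its naive `key: value` parsing are illustrative assumptions, not part of this PR or of any Hugging Face tooling, and it handles only flat scalar pairs.

```python
def read_front_matter(readme_text: str) -> dict[str, str]:
    """Parse a flat `key: value` front matter block delimited by `---` lines.

    Hypothetical helper for illustration only: it does not handle nested YAML,
    lists, or quoted values.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the file does not start with front matter
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no front matter


# Head of the README as proposed in this PR.
README_HEAD = """---
license: apache-2.0
pipeline_tag: automatic-speech-recognition
library_name: funasr
---

# Fun-ASR-Nano-2512
"""

metadata = read_front_matter(README_HEAD)
required_keys = ["license", "pipeline_tag", "library_name"]
missing = [key for key in required_keys if key not in metadata]
print("missing keys:", missing)
```

For anything beyond a quick check, a real YAML parser would be the safer choice, since front matter can contain lists and nested values.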
@@ -1,11 +1,20 @@
-
+---
+license: apache-2.0
+pipeline_tag: automatic-speech-recognition
+library_name: funasr
+---
+
+# Fun-ASR-Nano-2512
 
 「[简体中文](README_zh.md)」|「English」
 
-Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab. It is trained on tens of millions of hours of real speech data, possessing powerful contextual understanding capabilities and industry adaptability. It supports low-latency real-time transcription and covers 31 languages. It excels in vertical domains such as education and finance, accurately recognizing professional terminology and industry expressions, effectively addressing challenges like "hallucination" generation and language confusion, achieving "clear hearing, understanding meaning, and accurate writing."
+Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab, as described in the paper [Fun-ASR Technical Report](https://huggingface.co/papers/2509.12508). It is trained on tens of millions of hours of real speech data, possessing powerful contextual understanding capabilities and industry adaptability. It supports low-latency real-time transcription and covers 31 languages. It excels in vertical domains such as education and finance, accurately recognizing professional terminology and industry expressions, effectively addressing challenges like "hallucination" generation and language confusion, achieving "clear hearing, understanding meaning, and accurate writing."
+
+Project Homepage: https://funaudiollm.github.io/funasr
+Code Repository: https://github.com/FunAudioLLM/Fun-ASR
 
 <div align="center">
-<img src="images/funasr-v2.png">
+<img src="https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512/resolve/main/images/funasr-v2.png">
 </div>
 
 <div align="center">
@@ -35,7 +44,7 @@ Online Experience:
 # What's New 🔥
 
 - 2025/12: [Fun-ASR-Nano-2512](https://modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) is an end-to-end speech recognition large model trained on tens of millions of hours real speech data. It supports low-latency real-time transcription and covers 31 languages.
-- 2024/7: [FunASR](https://github.com/
+- 2024/7: [FunASR](https://github.com/FunAudioLLM/Fun-ASR) is a fundamental speech recognition toolkit that offers a variety of features, including speech recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization and multi-talker ASR.
 
 # Core Features 🎯
 
@@ -158,7 +167,7 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
 | **Model Size** | 1.5B | 1.5B | 1.6B | - | - | - | - | 1.1B | 0.8B | 7.7B |
 | **OpenSource** | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
 | AIShell1 | 1.81 | 2.17 | 4.72 | 0.68 | 1.63 | 0.71 | 0.63 | 0.54 | 1.80 | 1.22 |
-| AIShell2 | - | 3.47 | 4.68 | 2.27 | 2.76 | 2.86
+| AIShell2 | - | 3.47 | 4.68 | 2.27 | 2.76 | 2.86 | 2.10 | 2.58 | 2.75 | 2.39 |
 | Fleurs-zh | - | 3.65 | 5.18 | 3.43 | 3.23 | 3.11 | 2.68 | 4.81 | 2.56 | 2.53 |
 | Fleurs-en | 5.78 | 6.95 | 6.23 | 9.39 | 9.39 | 6.99 | 3.03 | 10.79 | 5.96 | 4.74 |
 | Librispeech-clean | 2.00 | 2.17 | 1.86 | 1.58 | 2.8 | 1.32 | 1.17 | 1.84 | 1.76 | 1.51 |
@@ -186,7 +195,7 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
 | **Average** | **26.13** | **33.39** | **15.95** | **22.63** | **31.00** | **23.49** | **16.72** | **12.70** |
 
 <div align="center">
-<img src="images/compare_en.png" width="800" />
+<img src="https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512/resolve/main/images/compare_en.png" width="800" />
 </div>
 
 ## Citations
@@ -198,4 +207,4 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
 journal={arXiv preprint arXiv:2509.12508},
 year={2025}
 }
-```
+```