Add paper link and pipeline tag metadata (#1)
Opened by nielsr (HF Staff)
README.md (changed)
@@ -1,15 +1,13 @@
 ---
-license: apache-2.0
 language:
-
-
+- en
+- zh
+license: apache-2.0
+pipeline_tag: voice-activity-detection
 tags:
 - voice-activity-detection
--
--
-- speech activity detection
-- Audio Event Detection
-- audio event detection
+- speech-activity-detection
+- audio-event-detection
 - vad
 - aed
 - streaming
@@ -19,7 +17,6 @@ tags:
 - asr
 ---
 
-
 <div align="center">
 <h1>
 FireRedVAD: A SOTA Industrial-Grade
@@ -29,17 +26,19 @@ Voice Activity Detection & Audio Event Detection
 
 </div>
 
+[[Paper]](https://huggingface.co/papers/2603.10420)
 [[Code]](https://github.com/FireRedTeam/FireRedVAD)
 [[HuggingFace]](https://huggingface.co/FireRedTeam/FireRedVAD)
 [[ModelScope]](https://www.modelscope.cn/models/xukaituo/FireRedVAD)
 
 
-FireRedVAD is a state-of-the-art (SOTA) industrial-grade Voice Activity Detection (VAD) and Audio Event Detection (AED) solution.
+FireRedVAD is a state-of-the-art (SOTA) industrial-grade Voice Activity Detection (VAD) and Audio Event Detection (AED) solution. It was introduced as part of [FireRedASR2S](https://huggingface.co/papers/2603.10420).
 
 FireRedVAD supports non-streaming/streaming VAD and non-streaming AED. It supports speech/singing/music detection in 100+ languages. Non-streaming VAD achieves 97.57% F1 on FLEURS-VAD-102, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD.
 
 
 ## 🔥 News
+- [2026.03.12] 🔥 We release FireRedASR2S technical report. See [arXiv](https://arxiv.org/abs/2603.10420).
 - [2026.03.03] We release FireRedVAD as a standalone repository, along with model weights and inference code.
 - [2026.02.12] We release [FireRedASR2S](https://github.com/FireRedTeam/FireRedASR2S) (FireRedASR2-AED, FireRedVAD, FireRedLID, and FireRedPunc) with model weights and inference code.
 
@@ -214,3 +213,13 @@ print(result)
 **Q: What audio format is supported?**
 
 16kHz 16-bit mono PCM wav. Use ffmpeg to convert other formats: `ffmpeg -i <input_audio_path> -ar 16000 -ac 1 -acodec pcm_s16le -f wav <output_wav_path>`
+
+## Citation
+```bibtex
+@article{xu2026fireredasr2s,
+title={FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System},
+author={Xu, Kaituo and Jia, Yan and Huang, Kai and Chen, Junjie and Li, Wenpeng and Liu, Kun and Xie, Feng-Long and Tang, Xu and Hu, Yao},
+journal={arXiv preprint arXiv:2603.10420},
+year={2026}
+}
+```
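
Note on the audio-format answer in the last hunk: the README states that input must be 16kHz 16-bit mono PCM wav. A minimal sketch of how a file could be checked before deciding whether the ffmpeg conversion is needed, assuming only Python's standard `wave` module. The helper name `check_wav_format` is hypothetical and is not part of FireRedVAD.

```python
import sys
import wave


def check_wav_format(path):
    """Return a list of mismatches against 16 kHz, 16-bit, mono PCM WAV.

    Hypothetical helper for illustration; an empty list means the file
    already matches the format described in the README.
    """
    problems = []
    # wave.open raises wave.Error for files it cannot parse
    # (for example, non-PCM WAV encodings or non-WAV containers).
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != 16000:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected 16000 Hz")
        if wf.getsampwidth() != 2:
            problems.append(f"sample width is {wf.getsampwidth() * 8} bits, expected 16 bits")
        if wf.getnchannels() != 1:
            problems.append(f"channel count is {wf.getnchannels()}, expected 1 (mono)")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    issues = check_wav_format(sys.argv[1])
    if issues:
        print("Needs conversion:")
        for issue in issues:
            print(" -", issue)
    else:
        print("Format OK")
```

If the list is non-empty, the ffmpeg command quoted in the README can be used to produce a conforming file.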