@@ -1,15 +1,13 @@
 ---
-license: apache-2.0
 language:
-  - en
-  - zh
 tags:
 - voice-activity-detection
-- Voice Acticity Detection
-- voice activity detection
-- speech activity detection
-- Audio Event Detection
-- audio event detection
 - vad
 - aed
 - streaming
@@ -19,7 +17,6 @@ tags:
 - asr
 ---
 <div align="center">
 <h1>
 FireRedVAD: A SOTA Industrial-Grade
@@ -29,17 +26,19 @@ Voice Activity Detection & Audio Event Detection
 </div>
 [[Code]](https://github.com/FireRedTeam/FireRedVAD)
 [[HuggingFace]](https://huggingface.co/FireRedTeam/FireRedVAD)
 [[ModelScope]](https://www.modelscope.cn/models/xukaituo/FireRedVAD)
-FireRedVAD is a state-of-the-art (SOTA) industrial-grade Voice Activity Detection (VAD) and Audio Event Detection (AED) solution.
 FireRedVAD supports non-streaming/streaming VAD and non-streaming AED. It supports speech/singing/music detection in 100+ languages. Non-streaming VAD achieves 97.57% F1 on FLEURS-VAD-102, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD.
 ## 🔥 News
 - [2026.03.03] We release FireRedVAD as a standalone repository, along with model weights and inference code.
 - [2026.02.12] We release [FireRedASR2S](https://github.com/FireRedTeam/FireRedASR2S) (FireRedASR2-AED, FireRedVAD, FireRedLID, and FireRedPunc) with model weights and inference code.
@@ -214,3 +213,13 @@ print(result)
 **Q: What audio format is supported?**
 16kHz 16-bit mono PCM wav. Use ffmpeg to convert other formats: `ffmpeg -i <input_audio_path> -ar 16000 -ac 1 -acodec pcm_s16le -f wav <output_wav_path>`

 ---
 language:
+- en
+- zh
+license: apache-2.0
+pipeline_tag: voice-activity-detection
 tags:
 - voice-activity-detection
+- speech-activity-detection
+- audio-event-detection
 - vad
 - aed
 - streaming
 - asr
 ---
 <div align="center">
 <h1>
 FireRedVAD: A SOTA Industrial-Grade
 </div>
+[[Paper]](https://huggingface.co/papers/2603.10420)
 [[Code]](https://github.com/FireRedTeam/FireRedVAD)
 [[HuggingFace]](https://huggingface.co/FireRedTeam/FireRedVAD)
 [[ModelScope]](https://www.modelscope.cn/models/xukaituo/FireRedVAD)
+FireRedVAD is a state-of-the-art (SOTA) industrial-grade Voice Activity Detection (VAD) and Audio Event Detection (AED) solution. It was introduced as part of [FireRedASR2S](https://huggingface.co/papers/2603.10420).
 FireRedVAD supports non-streaming/streaming VAD and non-streaming AED. It supports speech/singing/music detection in 100+ languages. Non-streaming VAD achieves 97.57% F1 on FLEURS-VAD-102, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD.
 ## 🔥 News
+- [2026.03.12] 🔥 We release the FireRedASR2S technical report. See [arXiv](https://arxiv.org/abs/2603.10420).
 - [2026.03.03] We release FireRedVAD as a standalone repository, along with model weights and inference code.
 - [2026.02.12] We release [FireRedASR2S](https://github.com/FireRedTeam/FireRedASR2S) (FireRedASR2-AED, FireRedVAD, FireRedLID, and FireRedPunc) with model weights and inference code.
 **Q: What audio format is supported?**
 16kHz 16-bit mono PCM wav. Use ffmpeg to convert other formats: `ffmpeg -i <input_audio_path> -ar 16000 -ac 1 -acodec pcm_s16le -f wav <output_wav_path>`
+## Citation
+```bibtex
+@article{xu2026fireredasr2s,
+  title={FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System},
+  author={Xu, Kaituo and Jia, Yan and Huang, Kai and Chen, Junjie and Li, Wenpeng and Liu, Kun and Xie, Feng-Long and Tang, Xu and Hu, Yao},
+  journal={arXiv preprint arXiv:2603.10420},
+  year={2026}
+}
+```
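
The FAQ entry in this diff states that input must be 16kHz 16-bit mono PCM wav. As a minimal sketch (not part of the FireRedVAD codebase), a file could be checked against that requirement with Python's standard `wave` module before running inference. The helper name `is_supported_wav` is hypothetical and introduced here only for illustration.

```python
import wave


def is_supported_wav(path):
    """Return True if `path` is a 16 kHz, 16-bit, mono, uncompressed PCM WAV file.

    Hypothetical helper for illustration; not an API of FireRedVAD.
    """
    try:
        with wave.open(path, "rb") as wf:
            return (
                wf.getframerate() == 16000      # 16 kHz sample rate
                and wf.getsampwidth() == 2      # 2 bytes per sample = 16-bit
                and wf.getnchannels() == 1      # mono
                and wf.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, EOFError):
        # Not a readable PCM WAV file (bad header, unsupported encoding, truncated).
        return False
```

If the check returns `False`, the file can be converted with the `ffmpeg` command given in the FAQ and checked again.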