FireRedASR2-AED / README.md
metadata
language:
  - en
  - zh
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
  - audio
  - asr

FireRedASR2S
A SOTA Industrial-Grade All-in-One ASR System

[Code] [Paper] [Model] [Blog] [Demo]

FireRedASR2S is a state-of-the-art (SOTA), industrial-grade, all-in-one ASR system presented in the paper FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System. It integrates four modules into a unified pipeline: ASR, Voice Activity Detection (VAD), Spoken Language Identification (LID), and Punctuation Prediction (Punc).

Key Features

  • FireRedASR2: Supports speech and singing transcription for Mandarin, Chinese dialects/accents, English, and code-switching.
  • FireRedVAD: Ultra-lightweight module (0.6M parameters) supporting streaming and multi-label VAD (speech/singing/music).
  • FireRedLID: Supports Spoken Language Identification for 100+ languages and 20+ Chinese dialects.
  • FireRedPunc: BERT-style punctuation prediction for Chinese and English.

Sample Usage

To use the system, first clone the official repository and install the dependencies. Then you can use the following Python API:

from fireredasr2s import FireRedAsr2System, FireRedAsr2SystemConfig

# Initialize the system with default config
asr_system_config = FireRedAsr2SystemConfig() 
asr_system = FireRedAsr2System(asr_system_config)

# Process an audio file (16 kHz, 16-bit, mono PCM)
result = asr_system.process("assets/hello_zh.wav")
print(result['text'])
# Output: 你好世界。
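The comment in the example above states that the input should be 16 kHz, 16-bit, mono PCM. The following is a minimal sketch, not part of the FireRedASR2S package, for checking a WAV file against that format before transcription. It uses only Python's standard `wave` module; the helper name `check_wav_format` is hypothetical.

```python
import wave


def check_wav_format(path, rate=16000, sample_width=2, channels=1):
    """Return a list of mismatches between a WAV header and the expected format.

    Hypothetical helper: the defaults mirror the "16 kHz, 16-bit, mono PCM"
    note in the usage example. An empty list means the header matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz"
            )
        # getsampwidth() returns bytes per sample, so 2 means 16-bit audio.
        if wav.getsampwidth() != sample_width:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8} bits, "
                f"expected {sample_width * 8} bits"
            )
        if wav.getnchannels() != channels:
            problems.append(
                f"found {wav.getnchannels()} channel(s), expected {channels}"
            )
    return problems
```

Calling `check_wav_format("assets/hello_zh.wav")` should return an empty list when the file matches. Files that do not match would need to be resampled or converted with an audio tool of your choice first.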

🔥 News

  • [2026.03.12] 🔥 We release the FireRedASR2S technical report. See arXiv.
  • [2026.02.25] 🔥 We release the FireRedASR2-LLM model weights. 🤗
  • [2026.02.12] 🔥 We release FireRedASR2S (FireRedASR2-AED, FireRedVAD, FireRedLID, and FireRedPunc) with model weights and inference code.

Evaluation

FireRedASR2-LLM achieves an average character error rate (CER) of 2.89% on 4 public Mandarin benchmarks and 11.55% on 19 public Chinese dialect and accent benchmarks, outperforming competitive baselines including Doubao-ASR, Qwen3-ASR, and Fun-ASR. FireRedASR2-AED follows closely at 3.05% and 11.67%.

| Model | Mandarin (Avg CER%) | Dialects (Avg CER%) |
|---|---|---|
| FireRedASR2-LLM | 2.89 | 11.55 |
| FireRedASR2-AED | 3.05 | 11.67 |
| Doubao-ASR | 3.69 | 15.39 |
| Qwen3-ASR | 3.76 | 11.85 |
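For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the recognized text and the reference, divided by the number of reference characters. The sketch below shows that standard definition in plain Python. It is not the authors' scoring script, and it assumes no text normalization (for example punctuation or case handling), which the source does not specify; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    # prev[j] holds the distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


# Toy data (made up for illustration): one substituted character out of ten.
refs = ["你好世界", "今天天气很好"]
hyps = ["你好世界", "今天天汽很好"]
print(cer(refs, hyps))  # 10.0
```

The toy numbers here only illustrate the calculation and are unrelated to the benchmark results in the table.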

Citation

@article{xu2026fireredasr2s,
  title={FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System},
  author={Xu, Kaituo and Jia, Yan and Huang, Kai and Chen, Junjie and Li, Wenpeng and Liu, Kun and Xie, Feng-Long and Tang, Xu and Hu, Yao},
  journal={arXiv preprint arXiv:2603.10420},
  year={2026}
}