---
language:
- en
- zh
license: apache-2.0
tags:
- audio
- automatic-speech-recognition
- asr
pipeline_tag: automatic-speech-recognition
---
<div align="center">
<h1>
FireRedASR2S
<br>
A SOTA Industrial-Grade All-in-One ASR System
</h1>
</div>
[[Code]](https://github.com/FireRedTeam/FireRedASR2S)
[[Paper]](https://huggingface.co/papers/2603.10420)
[[Model]](https://huggingface.co/FireRedTeam)
[[Blog]](https://fireredteam.github.io/demos/firered_asr/)
[[Demo]](https://huggingface.co/spaces/FireRedTeam/FireRedASR)
FireRedASR2-LLM is the 8B+ parameter variant of the FireRedASR2 system, designed for state-of-the-art accuracy and seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework that couples a speech encoder with a large language model through a lightweight adapter.
The model was introduced in the paper [FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System](https://huggingface.co/papers/2603.10420).
**Authors**: Kaituo Xu, Yan Jia, Kai Huang, Junjie Chen, Wenpeng Li, Kun Liu, Feng-Long Xie, Xu Tang, Yao Hu.
## 🔥 News
- [2026.03.12] 🔥 We release the FireRedASR2S technical report. See [arXiv](https://arxiv.org/abs/2603.10420).
- [2026.03.05] 🚀 [vLLM](https://github.com/vllm-project/vllm/pull/35727) supports FireRedASR2-LLM.
- [2026.02.25] 🔥 We release **FireRedASR2-LLM model weights**. [🤗](https://huggingface.co/FireRedTeam/FireRedASR2-LLM) [🤖](https://www.modelscope.cn/models/xukaituo/FireRedASR2-LLM/)
## Sample Usage
To use this model, please refer to the installation and setup instructions in the [official GitHub repository](https://github.com/FireRedTeam/FireRedASR2S).
```python
from fireredasr2s.fireredasr2 import FireRedAsr2, FireRedAsr2Config
batch_uttid = ["hello_zh", "hello_en"]
batch_wav_path = ["assets/hello_zh.wav", "assets/hello_en.wav"]
# FireRedASR2-LLM Configuration
asr_config = FireRedAsr2Config(
    use_gpu=True,
    decode_min_len=0,
    repetition_penalty=1.0,
    llm_length_penalty=0.0,
    temperature=1.0,
)
# Load the model
model = FireRedAsr2.from_pretrained("llm", "FireRedTeam/FireRedASR2-LLM", asr_config)
# Transcribe
results = model.transcribe(batch_uttid, batch_wav_path)
print(results)
# [{'uttid': 'hello_zh', 'text': '你好世界', 'rtf': '0.0681', 'wav': 'assets/hello_zh.wav'}, {'uttid': 'hello_en', 'text': 'hello speech', 'rtf': '0.0681', 'wav': 'assets/hello_en.wav'}]
```
## Evaluation
FireRedASR2-LLM achieves state-of-the-art accuracy on Mandarin and Chinese-dialect benchmarks. Results are average character error rate (CER, %); lower is better.
| Metric | FireRedASR2-LLM | Doubao-ASR | Qwen3-ASR | Fun-ASR |
|:---:|:---:|:---:|:---:|:---:|
| **Avg CER (Mandarin, 4 sets)** | **2.89** | 3.69 | 3.76 | 4.16 |
| **Avg CER (Dialects, 19 sets)** | **11.55** | 15.39 | 11.85 | 12.76 |
## FAQ
**Q: What audio format is supported?**
16kHz 16-bit mono PCM wav. You can convert files using ffmpeg:
`ffmpeg -i <input_audio_path> -ar 16000 -ac 1 -acodec pcm_s16le -f wav <output_wav_path>`
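If you want to validate files programmatically before transcription, the required properties can be checked with Python's standard-library `wave` module. This is a minimal sketch (the helper name `is_supported_wav` is our own, not part of the FireRedASR2S toolkit):

```python
import wave


def is_supported_wav(path: str) -> bool:
    """Return True if the file is 16 kHz, 16-bit (2-byte), mono PCM wav."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == 16000
            and wf.getsampwidth() == 2
            and wf.getnchannels() == 1
        )
```

Files that fail this check can be converted with the ffmpeg command above.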
**Q: What are the input length limitations?**
FireRedASR2-LLM supports audio input up to 40s.
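For recordings longer than 40s, one practical workaround is to segment the audio into windows of at most 40s and transcribe each window separately. Below is a minimal sketch of the bookkeeping; the naive fixed-window split is our assumption (a production pipeline would typically cut at silence boundaries instead):

```python
def split_into_chunks(duration_s: float, max_len_s: float = 40.0) -> list[tuple[float, float]]:
    """Split a recording of duration_s seconds into (start, end) windows,
    each at most max_len_s seconds long, covering the full recording."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

Each `(start, end)` window can then be extracted (e.g. with ffmpeg's `-ss`/`-to` options) and passed to `model.transcribe` as a separate utterance.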
## Citation
```bibtex
@article{xu2026fireredasr2s,
title={FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System},
author={Xu, Kaituo and Jia, Yan and Huang, Kai and Chen, Junjie and Li, Wenpeng and Liu, Kun and Xie, Feng-Long and Tang, Xu and Hu, Yao},
journal={arXiv preprint arXiv:2603.10420},
year={2026}
}
``` |