---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---
# whisper.axera
- [English](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README_EN.md)
- [中文](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README.md)
OpenAI Whisper on Axera Platform
## Overview
This project provides an optimized implementation of OpenAI's Whisper speech recognition model for Axera AI processors (AX650N/AX630C). It supports both C++ and Python interfaces for efficient on-device speech-to-text conversion.
## Features
- **Dual Language Support**: Both C++ and Python APIs available
- **Multiple Model Sizes**: Support for tiny, base, small, and turbo model variants
- **Multi-language Recognition**: Tested with English, Chinese, Japanese, and Korean
- **Optimized Performance**: Specially optimized for Axera NPU acceleration
- **Easy Deployment**: Pre-built packages and cross-compilation support
## Update
- 2026/01/14: The model architecture has been simplified: models are now split into an encoder and a decoder instead of `decoder_main` and `decoder_loop`. Exporting models from Hugging Face is now supported.
## Supported Platforms
- ✅ AX650N
- ✅ AX630C
## Pre-trained Models
Download pre-compiled models from:
- [Baidu Cloud](https://pan.baidu.com/s/1tOHVMZCin0A68T5HmKRJyg?pwd=axyz)
- [Huggingface](https://huggingface.co/AXERA-TECH/Whisper)
For custom model conversion, please refer to [Model Conversion Guide](./model_convert/README_EN.md).
## Model Conversion
Currently supported model sizes:
- tiny
- base
- small
- medium
- turbo
Tested languages:
- English
- Chinese
- Japanese
- Korean
- Malaysian
For other languages or custom model sizes, please refer to the [Model Conversion Guide](./model_convert/README_EN.md).
## Deployment on Target Devices
### Prerequisites
- AX650N/AX630C devices with Ubuntu 22.04 pre-installed
- Internet connection for `apt install` and `pip install`
- Verified hardware platforms:
  - [MaixIV M4nDock (AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator Card (AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
  - [Axera Pi 2 (AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM (AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit (AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
## Programming Language Support
### Python
Tested with Python 3.12. We recommend using [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh) for environment management.
#### Installation
```bash
cd python
pip3 install -r requirements.txt
```
#### pyaxengine
Install NPU Python API from: https://github.com/AXERA-TECH/pyaxengine
#### Usage
##### Command Line Interface
```
cd python
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_cli.py -w ../demo.wav -t tiny
[INFO] Available providers: ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.10313174677896837
```
Command line arguments:
| Argument | Description | Default |
| --- | --- | --- |
| --wav/-w | Input audio file | - |
| --model_type/-t | Model type, e.g. tiny/base/small/turbo | - |
| --model_path/-p | Model directory | ../models |
| --language/-l | Recognition language | zh |
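To call the Python CLI from another script (for example, to batch over many files), the following is a minimal sketch using `subprocess`. It only passes the flags listed in the table above; `build_cli_args`, `parse_asr_result`, and `transcribe` are hypothetical helpers, and the parsing assumes that the `ASR result:` line shown in the sample log is printed to stdout with the transcript on the next line.

```python
# Hypothetical wrapper around whisper_cli.py (not part of this repository).
import subprocess
import sys
from typing import List, Optional


def build_cli_args(wav: str, model_type: str = "tiny",
                   language: str = "zh",
                   model_path: Optional[str] = None) -> List[str]:
    """Assemble an argv list for whisper_cli.py from the documented flags."""
    args = [sys.executable, "whisper_cli.py", "--wav", wav,
            "--model_type", model_type, "--language", language]
    if model_path is not None:  # otherwise the script's own default applies
        args += ["--model_path", model_path]
    return args


def parse_asr_result(stdout: str) -> Optional[str]:
    """Return the line after 'ASR result:' (format taken from the sample log)."""
    lines = stdout.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "ASR result:" and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None


def transcribe(wav: str, **kwargs) -> Optional[str]:
    """Run the CLI (from the python/ directory) and extract the transcript."""
    proc = subprocess.run(build_cli_args(wav, **kwargs),
                          capture_output=True, text=True, check=True)
    return parse_asr_result(proc.stdout)
```

For example, `transcribe("../demo.wav", model_type="tiny")` run from `python/` would mirror the command shown above. If the script's log format changes, only `parse_asr_result` needs updating.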
##### Server Mode
```
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_svr.py
[INFO] Available providers: ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000
```
Test the server:
```
python test_svr.py
```
<h3 id="CPP">CPP</h3>
#### Usage on Target Device
```
cd cpp/ax650
./whisper_cli -w ../../demo.wav -t tiny
```
```
cd cpp/ax650
./whisper_cli --model_type small -w ../../demo.wav
```
Example Output:
```
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny
wav_file: ../../demo.wav
model_path: ../../models-ax650
model_type: tiny
language: zh
Init whisper success, take 0.3540seconds
Result: 甚至出现交易几乎停止的情况
RTF: 0.0968
```
#### Server Mode
```
cd cpp/ax650
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_svr -t tiny
port: 8080
model_path: ../../models-ax650
model_type: tiny
language: zh
[I][ main][ 60]: Initializing server...
[I][ main][ 65]: Init server success
[I][ start][ 32]: Start server at port 8080, POST binary stream to IP:8080/asr
```
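##### Client test using curl
`ffmpeg` decodes the audio to raw 32-bit little-endian float samples on stdout, and `curl` posts that stream to the `/asr` endpoint. Replace `10.126.33.192` with the IP address of your device.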
```
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
-H "Content-Type: application/octet-stream" \
--data-binary @-
```
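The `/asr` endpoint can also be called without `ffmpeg` or `curl`. The following standard-library Python sketch assumes a 16-bit PCM WAV that is already at the sample rate the server expects (Whisper models operate on 16 kHz audio), and that the server accepts the same raw float32 little-endian stream the `ffmpeg` command produces. Unlike that command, it downmixes multi-channel audio to mono, and it returns the response body as plain text because the response format is not documented here. `wav_to_f32le` and `post_asr` are hypothetical helper names.

```python
# Hypothetical client helpers for the C++ whisper_svr (not part of this repository).
import array
import sys
import urllib.request
import wave


def wav_to_f32le(path: str) -> bytes:
    """Convert a 16-bit PCM WAV into raw float32 little-endian samples in [-1, 1).

    Assumption: no resampling is done; multi-channel audio is averaged to mono.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM WAV")
        channels = wf.getnchannels()
        pcm = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    samples = array.array("f")
    for i in range(0, len(pcm), channels):
        frame = pcm[i:i + channels]
        samples.append(sum(frame) / channels / 32768.0)
    if sys.byteorder == "big":
        samples.byteswap()  # emit little-endian floats, like ffmpeg's f32le
    return samples.tobytes()


def post_asr(host: str, audio: bytes, port: int = 8080) -> str:
    """POST the raw float32 stream to /asr and return the response body as text."""
    req = urllib.request.Request(
        f"http://{host}:{port}/asr", data=audio,
        headers={"Content-Type": "application/octet-stream"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, `post_asr("10.126.33.192", wav_to_f32le("demo.wav"))`, with the IP replaced by your device's address. The per-sample Python loop is simple rather than fast; for long recordings, the `ffmpeg` pipeline above is the better choice.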
## Performance Benchmarks
### Latency
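RTF (Real-Time Factor) is processing time divided by audio duration. Lower is better; values below 1 mean faster than real time.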
C++:
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.08 | |
| Whisper-Base | 0.11 | 0.35 |
| Whisper-Small | 0.24 | |
| Whisper-Turbo | 0.48 | |
Python:
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.12 | |
| Whisper-Base | 0.16 | 0.35 |
| Whisper-Small | 0.50 | |
| Whisper-Turbo | 0.60 | |
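To compare your own runs against these numbers, the RTF can be computed as sketched below. This is a minimal sketch of the definition only. How the scripts in this repository time a run (for example, whether model initialization is included) is not stated here, so measured values may differ. `real_time_factor`, `wav_duration`, and `measure_rtf` are hypothetical helpers.

```python
# Hypothetical RTF helpers (not part of this repository).
import time
import wave
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (< 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def wav_duration(path: str) -> float:
    """Audio duration in seconds, read from the WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def measure_rtf(run: Callable[[], object], audio_seconds: float) -> float:
    """Time a single call of `run` (e.g. one transcription) and return its RTF."""
    start = time.perf_counter()
    run()
    return real_time_factor(time.perf_counter() - start, audio_seconds)
```

For example, a 10 s clip processed in 0.8 s gives `real_time_factor(0.8, 10.0)` ≈ 0.08, the value listed for Whisper-Tiny with C++ on AX650N.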
### Word Error Rate (AIShell dataset)
| Models | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny | 0.24 | |
| Whisper-Base | 0.18 | |
| Whisper-Small | 0.11 | |
| Whisper-Turbo | 0.06 | |
To reproduce the WER results, first download the dataset:
```
cd model_convert
bash download_dataset.sh
```
Then run the test script:
```
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../model_convert/datasets/ground_truth.txt --model_type tiny
```
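For reference, the standard error-rate definition is edit distance (substitutions + deletions + insertions) divided by the reference length. Chinese text such as AIShell has no word separators, so it is commonly scored per character. The following sketch implements that textbook definition. It is not necessarily what `test_wer.py` does (its text normalization, e.g. punctuation handling, is not described here), and `edit_distance` and `error_rate` are hypothetical helpers.

```python
# Hypothetical WER/CER helpers (standard Levenshtein-based definition).
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(reference: str, hypothesis: str, unit: str = "char") -> float:
    """Character-level (CER) or whitespace-word-level (WER) error rate."""
    if unit == "char":
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    else:
        ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

As an illustration only, treating the C++ transcript of `demo.wav` as the reference and the Python transcript as the hypothesis gives 2 substituted characters out of 13, so `error_rate(...)` ≈ 0.154.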
### Memory Usage
* CMM is the physical memory reserved for Axera hardware modules such as VDEC (video decoder), VENC (video encoder), and the NPU. OS is regular system memory managed by Linux.
Python:
| Models | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny | 332 | 512 |
| Whisper-Base | 533 | 644 |
| Whisper-Small | 1106 | 906 |
| Whisper-Turbo | 2065 | 2084 |
C++:
| Models | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny | 332 | 31 |
| Whisper-Base | 533 | 54 |
| Whisper-Small | 1106 | 146 |
| Whisper-Turbo | 2065 | 86 |
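To check which configurations fit a given board, you can compare the numbers above against its CMM/OS memory split. The following sketch copies the table values verbatim. The budgets you pass in are assumptions that you must look up for your own board, and the tables report only this program's usage, so leave headroom for anything else running on the device. `MEMORY_MB`, `fits`, and `models_that_fit` are hypothetical names.

```python
# Hypothetical helper; (CMM MB, OS MB) values copied from the tables above.
from typing import Dict, List, Tuple

MEMORY_MB: Dict[str, Dict[str, Tuple[int, int]]] = {
    "python": {"tiny": (332, 512), "base": (533, 644),
               "small": (1106, 906), "turbo": (2065, 2084)},
    "cpp": {"tiny": (332, 31), "base": (533, 54),
            "small": (1106, 146), "turbo": (2065, 86)},
}


def fits(runtime: str, model: str, cmm_budget_mb: int, os_budget_mb: int) -> bool:
    """True if the listed CMM and OS usage both fit within the given budgets."""
    cmm, os_mb = MEMORY_MB[runtime][model]
    return cmm <= cmm_budget_mb and os_mb <= os_budget_mb


def models_that_fit(runtime: str, cmm_budget_mb: int, os_budget_mb: int) -> List[str]:
    """Models (smallest first) whose listed usage fits the budgets."""
    return [m for m in MEMORY_MB[runtime]
            if fits(runtime, m, cmm_budget_mb, os_budget_mb)]
```

For example, with a hypothetical 2048 MB CMM budget, Whisper-Turbo (2065 MB CMM) does not fit under either runtime, while tiny, base, and small do.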
## Technical Discussion
- GitHub issues
- Tencent QQ Group: 139953715