---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---

# whisper.axera

 - [English](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README_EN.md)
 - [中文](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README.md)

OpenAI Whisper on Axera Platform

## Overview

This project provides an optimized implementation of OpenAI's Whisper speech recognition model for Axera AI processors (AX650N/AX630C). It supports both C++ and Python interfaces for efficient on-device speech-to-text conversion.

## Features

- **Dual Language Support**: Both C++ and Python APIs available
- **Multiple Model Sizes**: Support for tiny, base, small, and turbo model variants
- **Multi-language Recognition**: Tested with English, Chinese, Japanese, and Korean
- **Optimized Performance**: Specially optimized for Axera NPU acceleration
- **Easy Deployment**: Pre-built packages and cross-compilation support

## Update

 - 2026/01/14: The model architecture is now cleaner, with `encoder` and `decoder` replacing `decoder_main` and `decoder_loop`. Exporting models from Hugging Face is now supported.

## Supported Platforms

- ✅ AX650N
- ✅ AX630C

## Pre-trained Models

Download pre-compiled models from:
- [Baidu Cloud](https://pan.baidu.com/s/1tOHVMZCin0A68T5HmKRJyg?pwd=axyz)
- [Huggingface](https://huggingface.co/AXERA-TECH/Whisper)

For custom model conversion, please refer to [Model Conversion Guide](./model_convert/README_EN.md).

## Model Conversion

Currently supported model scales:
- tiny
- base  
- small
- medium
- turbo

Tested languages:
- English
- Chinese
- Japanese
- Korean
- Malaysian

For other languages or custom model sizes, please refer to the [Model Conversion Guide](./model_convert/README_EN.md).

## Deployment on Target Devices

### Prerequisites
- AX650N/AX630C devices with Ubuntu 22.04 pre-installed
- Internet connection for `apt install` and `pip install`
- Verified hardware platforms:
  - [MaixIV M4nDock (AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator Card (AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
  - [Axera Pi 2 (AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM (AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit (AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)

## Programming Language Support

### Python

Tested with Python 3.12. We recommend using [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh) for environment management.

#### Installation

```bash
cd python
pip3 install -r requirements.txt
```

#### pyaxengine

Install the NPU Python API from https://github.com/AXERA-TECH/pyaxengine

#### Usage

##### Command Line Interface

```
cd python  
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_cli.py -w ../demo.wav -t tiny
[INFO] Available providers:  ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.10313174677896837

```

Command line arguments:
| Argument | Description | Default |
| --- | --- | --- |
| --wav/-w | Input audio file | - |
| --model_type/-t | Model type: tiny/base/small | - |
| --model_path/-p | Model directory | ../models |
| --language/-l | Recognition language | zh |
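When the CLI is driven from another script, the transcript and RTF can be pulled out of its standard output. The following is a minimal sketch that assumes the output keeps the layout shown in the sample run above (an `ASR result:` line followed by the transcript, then an `RTF:` line). The helper name `parse_cli_output` is hypothetical and not part of this repository.

```python
def parse_cli_output(text: str) -> dict:
    """Extract the transcript and RTF from whisper_cli.py standard output.

    Assumes the layout of the sample run: the transcript is the line
    right after "ASR result:", and the RTF is on a line starting with "RTF:".
    """
    lines = text.splitlines()
    result = {"text": None, "rtf": None}
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped == "ASR result:" and i + 1 < len(lines):
            result["text"] = lines[i + 1].strip()
        elif stripped.startswith("RTF:"):
            result["rtf"] = float(stripped.split(":", 1)[1])
    return result


# Tail of the sample output shown above.
sample = """[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.10313174677896837
"""
parsed = parse_cli_output(sample)
print(parsed["text"], parsed["rtf"])
```

In practice the `text` argument could come from `subprocess.run([...], capture_output=True, text=True).stdout`.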


##### Server Mode

```
(whisper) root@ax650:/mnt/data/HF/Whisper/python# python whisper_svr.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000

```

Test the server:
```
python test_svr.py
```


<h3 id="CPP">CPP</h3>

#### Usage on Target Device
```
cd cpp/ax650
./whisper_cli -w ../demo.wav -t tiny
```

or

```
cd cpp/ax650
./whisper_cli --model_type small -w ../demo.wav
```

Example Output:

```
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny
wav_file: ../../demo.wav
model_path: ../../models-ax650
model_type: tiny
language: zh
Init whisper success, take 0.3540seconds
Result: 甚至出现交易几乎停止的情况
RTF: 0.0968

```

#### Server Mode

```
cd cpp/ax650
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_svr -t tiny
port: 8080
model_path: ../../models-ax650
model_type: tiny
language: zh
[I][                            main][  60]: Initializing server...
[I][                            main][  65]: Init server success
[I][                           start][  32]: Start server at port 8080, POST binary stream to IP:8080/asr

```

#### Client Test Using curl

```
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
  -H "Content-Type: application/octet-stream" \
  --data-binary @-
```
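The same request can be sent from Python without `ffmpeg` or `curl`. The sketch below uses only the standard library and mirrors the command above: it converts 16-bit PCM samples to little-endian float32 and POSTs them to `/asr`. The function names `wav_to_f32le` and `post_asr` are hypothetical. It also assumes the WAV file is 16-bit PCM and already has the channel count and sample rate the server expects, since, like the `ffmpeg` command, it does no resampling or downmixing. The response format is not documented here, so the body is returned as UTF-8 text, which is also an assumption.

```python
import array
import sys
import urllib.request
import wave


def wav_to_f32le(path: str) -> bytes:
    """Read a 16-bit PCM WAV file and return its samples as little-endian float32 bytes."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wf.readframes(wf.getnframes())
    pcm = array.array("h")
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale int16 samples to the range [-1.0, 1.0).
    floats = array.array("f", (s / 32768.0 for s in pcm))
    if sys.byteorder == "big":
        floats.byteswap()  # emit little-endian bytes on any host
    return floats.tobytes()


def post_asr(url: str, payload: bytes, timeout: float = 30.0) -> str:
    """POST raw float32 PCM to the server and return the response body as text."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumption: UTF-8 response body


# Example call (replace the address with your device's IP):
# print(post_asr("http://10.126.33.192:8080/asr", wav_to_f32le("demo.wav")))
```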

## Performance Benchmarks

### Latency

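RTF (Real-Time Factor) is the processing time divided by the audio duration. Lower is faster, and a value below 1 means faster than real time.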
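As a worked example of the metric, the sketch below computes RTF as processing time over audio duration. The timing values are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from this project.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical numbers: a 10 s clip transcribed in 0.8 s.
rtf = real_time_factor(0.8, 10.0)
print(f"RTF = {rtf:.2f}")
```

At this RTF, one minute of audio would take roughly 4.8 seconds to transcribe.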

C++:

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  | 0.08   |        |
| Whisper-Base  | 0.11   | 0.35   |
| Whisper-Small | 0.24   |        |
| Whisper-Turbo | 0.48   |        |

Python:  

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  | 0.12   |        |
| Whisper-Base  | 0.16   | 0.35   |
| Whisper-Small | 0.50   |        |
| Whisper-Turbo | 0.60   |        |

### Word Error Rate (tested on the AIShell dataset)

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  0.24  |        |
| Whisper-Base  |  0.18  |        |
| Whisper-Small |  0.11  |        |
| Whisper-Turbo |  0.06  |        |

To reproduce WER test results:  

Download dataset:  
```
cd model_convert
bash download_dataset.sh
```

Run test script:  
```
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../model_convert/datasets/ground_truth.txt --model_type tiny

```
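For readers unfamiliar with the metric, the error rate is the edit distance between the reference and the recognized text, divided by the reference length. The sketch below is not the logic of `test_wer.py`, which may tokenize or normalize text differently. It assumes character-level tokens, a common choice for Chinese, and treats the C++ sample output above as the reference purely for illustration.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Character-level error rate: edit distance / reference length."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustration only: the two sample transcripts shown earlier in this README.
reference = "甚至出现交易几乎停止的情况"
hypothesis = "擅职出现交易几乎停止的情况"
rate = error_rate(reference, hypothesis)
print(f"error rate = {rate:.4f}")
```

Here two of the thirteen reference characters are substituted, so the rate is 2/13.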

### Memory Usage

* CMM stands for the physical memory used by Axera modules such as VDEC (video decoder), VENC (video encoder), and the NPU.

Python:  

| Models        | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  332   |  512   |
| Whisper-Base  |  533   |  644   |
| Whisper-Small |  1106  |  906   |
| Whisper-Turbo |  2065  |  2084  |

C++:  

| Models        | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  332   |  31    |
| Whisper-Base  |  533   |  54    |
| Whisper-Small |  1106  |  146   |
| Whisper-Turbo |  2065  |  86    |


## Technical Discussion

- GitHub Issues
- Tencent QQ Group: 139953715