File size: 6,438 Bytes

---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---

# Whisper

 - [English](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README_EN.md)  
 - [中文](https://huggingface.co/AXERA-TECH/Whisper/blob/main/README.md)

OpenAI Whisper on Axera

- 目前支持 C++ 和 Python 两种语言
- 预编译模型下载
  - [Huggingface](https://huggingface.co/AXERA-TECH/Whisper)

- 如需自行转换请参考[模型转换](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)


## Update

 - 2026/01/14: 更简单的模型结构，现在只需要encoder和decoder，去掉原来的decoder_main和decoder_loop；支持来自HuggingFace的模型导出


## 支持平台

- [x] AX650N
- [x] AX630C

## 模型转换

目前支持的模型规模:
 - tiny
 - base
 - small
 - medium
 - turbo


目前测试过的语言:
 - English
 - Chinese
 - Japanese
 - Korean
 - Malaysian

[模型转换](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)

## 上板部署

- 基于 AX650N、AX630C 的设备已预装 Ubuntu22.04
- 链接互联网，确保设备能正常执行 `apt install`, `pip install` 等指令
- 已验证设备：
  - [爱芯派Pro(AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card(AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
  - [爱芯派2(AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM(AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit(AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
- 支持编程语言:
  - [Python](#Python)
  - [C++](#CPP)

<h3 id="Python">Python</h3>

#### Requirements

推荐在板上安装Miniconda管理虚拟环境，安装方法如下:
```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh

source ~/miniconda3/bin/activate

conda init --all
```

安装Whisper依赖
```
cd python

conda create -n whisper python=3.12
conda activate whisper
pip3 install -r requirements.txt
```

####  安装pyaxenigne

参考 https://github.com/AXERA-TECH/pyaxengine 安装 NPU Python API

在0.1.3rc2上测试通过，可通过
```
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
```
安装，或把版本号更改为你想使用的版本


#### 运行

登陆开发板后

输入命令

```
cd python  
conda activate whisper
python3 main.py --model_type small --model_path ../models-ax650 --wav ../demo.wav --language zh
```

输出结果

```
(whisper) root@ax650:/mnt/data/Github/whisper.axera/python# python whisper_cli.py -t tiny -w ../demo.wav
[INFO] Available providers:  ['AxEngineExecutionProvider']
{'wav': '../demo.wav', 'model_type': 'tiny', 'model_path': '../models-ax650', 'language': 'zh', 'task': 'transcribe'}
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
ASR result:
擅职出现交易几乎停止的情况
RTF: 0.11406774537746188

```

运行参数说明:  
| 参数名称 | 说明 | 默认值 |
| --- | --- | --- |
| --wav | 输入音频文件 | |
| --model_type/-t | 模型类型, tiny/base/small | |
| --model_path/-p | 模型所在目录 | ../models |
| --language/-l | 识别语言 | zh |


##### 服务端

```
(whisper) root@ax650:/mnt/data/Github/whisper.axera/python# python whisper_svr.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
Server started at http://0.0.0.0:8000

```

测试服务端
```
python test_svr.py
```


<h3 id="CPP">CPP</h3>

#### 运行

在 AX650N 设备上执行

```
cd cpp
./whisper_cli -w ../demo.wav -t tiny
```

或  

```
cd cpp
./whisper_cli --model_type small -w ../demo.wav
```

输出结果

```
(whisper) root@ax650:/mnt/data/HF/Whisper/cpp/ax650# ./whisper_cli -w ../../demo.wav -t tiny
wav_file: ../../demo.wav
model_path: ../../models-ax650
model_type: tiny
language: zh
Init whisper success, take 0.3540seconds
Result: 甚至出现交易几乎停止的情况
RTF: 0.0968

```

### 服务端

```
cd cpp/ax650
./whisper_srv --model_type tiny --language zh --port 8080
```

### 客户端

curl命令行测试(请自行替换IP和端口):  
```
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
  -H "Content-Type: application/octet-stream" \
  --data-binary @-
```

## 模型性能

### Latency

RTF: Real-Time Factor

CPP:

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  | 0.08   |        |
| Whisper-Base  | 0.11   | 0.35   |
| Whisper-Small | 0.24   |        |
| Whisper-Turbo | 0.48   |        |

Python:  

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  | 0.12   |        |
| Whisper-Base  | 0.16   | 0.35   |
| Whisper-Small | 0.50   |        |
| Whisper-Turbo | 0.60   |        |

### Word Error Rate(Test on AIShell dataset)

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  0.24  |        |
| Whisper-Base  |  0.18  |        |
| Whisper-Small |  0.11  |        |
| Whisper-Turbo |  0.06  |        |

若要复现测试结果，请按照以下步骤:

解压数据集:
```
unzip datasets.zip
```

运行测试脚本:
```
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../datasets/ground_truth.txt --model_type tiny

```

### MEM Usage

* CMM Stands for Physical memory used by Axera modules like VDEC(Video decoder), VENC(Video encoder), NPU, etc.

Python:  

| Models        | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  332   |  512   |
| Whisper-Base  |  533   |  644   |
| Whisper-Small |  1106  |  906   |
| Whisper-Turbo |  2065  |  2084  |

C++:  

| Models        | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  332   |  31    |
| Whisper-Base  |  533   |  54    |
| Whisper-Small |  1106  |  146   |
| Whisper-Turbo |  2065  |  86    |


## 技术讨论

- Github issues
- QQ 群: 139953715