---
license: bsd-3-clause
pipeline_tag: automatic-speech-recognition
---

# Whisper

OpenAI Whisper on Axera

- C++ and Python are currently supported
- Precompiled model downloads
  - [Hugging Face](https://huggingface.co/AXERA-TECH/Whisper)

- To convert the models yourself, see [Model conversion](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)

## Supported platforms

- [x] AX650N
- [x] AX630C

## Model conversion

[Model conversion](https://github.com/ml-inory/whisper.axera/blob/main/model_convert/README.md)

## On-board deployment

- AX650N- and AX630C-based devices ship with Ubuntu 22.04 preinstalled
- Connect the device to the internet and make sure that commands such as `apt install` and `pip install` run normally
- Verified devices:
  - [爱芯派Pro (AX650N)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card (AX650N)](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
  - [爱芯派2 (AX630C)](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM (AX630C)](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit (AX630C)](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
- Supported programming languages:
  - [Python](#Python)
  - [C++](#CPP)

<h3 id="Python">Python</h3>

#### Requirements

We recommend installing Miniconda on the board to manage virtual environments. Install it as follows:
```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh

source ~/miniconda3/bin/activate

conda init --all
```

Install the Whisper dependencies:
```
cd python

conda create -n whisper python=3.12
conda activate whisper
pip3 install -r requirements.txt
```

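#### Install pyaxengine

Install the NPU Python API by following https://github.com/AXERA-TECH/pyaxengine.

This project was tested with 0.1.3rc2, which can be installed with:
```
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
```
Alternatively, change the version number in the URL to the release you want to use.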
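
To confirm which version ended up in the active environment, a small check along these lines may help. It uses only the standard-library `importlib.metadata`. The distribution name `axengine` is an assumption inferred from the wheel file name above, and `installed_version` is a hypothetical helper, not part of pyaxengine.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "axengine" is assumed from the wheel name axengine-0.1.3-py3-none-any.whl.
    version = installed_version("axengine")
    if version is None:
        print("axengine is not installed in this environment")
    else:
        print(f"axengine {version} is installed")
```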


#### Run

After logging in to the development board, run:

```
cd python  
conda activate whisper
python3 main.py --model_type small --model_path ../models-ax650 --wav ../demo.wav --language zh
```

Output:

```
root@ax650:/mnt/qtang/whisper.axera/python# python3 main.py --wav ../demo.wav --model_type small --model_path ../models/ --language zh
[INFO] Available providers:  ['AxEngineExecutionProvider']
wav: ../demo.wav
model_type: small
model_path: ../models/
language: zh
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.10.1s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 3.2-patch1 117f5fd4
Load models take 2322.563409805298ms
Preprocess wav take 6971.68493270874ms
Run encoder take 211.52877807617188ms
Run decoder_main take 79.00094985961914ms
First token: 17556
Run decoder_loop take 101.91774368286133ms
Iter 0   Token: 20844
Run decoder_loop take 60.30416488647461ms
Iter 1   Token: 7781
Run decoder_loop take 60.22000312805176ms
Iter 2   Token: 20204
Run decoder_loop take 60.23716926574707ms
Iter 3   Token: 28455
Run decoder_loop take 60.214996337890625ms
Iter 4   Token: 31962
Run decoder_loop take 60.17565727233887ms
Iter 5   Token: 6336
Run decoder_loop take 60.94002723693848ms
Iter 6   Token: 254
Run decoder_loop take 60.71639060974121ms
Iter 7   Token: 2930
Run decoder_loop take 60.225725173950195ms
Iter 8   Token: 236
Run decoder_loop take 60.167789459228516ms
Iter 9   Token: 36135
Run decoder_loop take 60.29987335205078ms
Iter 10          Token: 15868
Run decoder_loop take 61.163902282714844ms
Iter 11          Token: 252
Run decoder_loop take 60.273170471191406ms
Iter 12          Token: 1546
Run decoder_loop take 60.23144721984863ms
Iter 13          Token: 46514
Run decoder_loop take 60.31966209411621ms
Iter 14          Token: 50257
Result: 甚至出现交易几乎停滞的情况
```
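
The per-stage timings in a log like the one above can be summarized with a short script. This is a minimal sketch that assumes the `Run <stage> take <t>ms` line format shown in the log; `summarize_stages` is a hypothetical helper and not part of the repository. Lines such as `Load models take ...` and `Iter ...` are ignored.

```python
import re
from collections import defaultdict

# Matches lines such as "Run decoder_loop take 60.30416488647461ms".
STAGE_LINE = re.compile(r"^Run (\w+) take ([0-9.]+)ms$")


def summarize_stages(log_text: str) -> dict:
    """Return {stage: (call_count, total_ms)} for every matching log line."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = STAGE_LINE.match(line.strip())
        if match:
            stage, millis = match.group(1), float(match.group(2))
            counts[stage] += 1
            totals[stage] += millis
    return {stage: (counts[stage], totals[stage]) for stage in counts}


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = """\
Run encoder take 211.52877807617188ms
Run decoder_main take 79.00094985961914ms
Run decoder_loop take 101.91774368286133ms
Run decoder_loop take 60.30416488647461ms
"""
    for stage, (count, total_ms) in summarize_stages(sample).items():
        print(f"{stage}: {count} call(s), {total_ms / count:.2f} ms on average")
```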

Command-line arguments:

| Argument | Description | Default |
| --- | --- | --- |
| --wav | Input audio file | |
| --model_type/-t | Model type: tiny/base/small | |
| --model_path/-p | Directory containing the models | ../models |
| --language/-l | Recognition language | zh |

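The argument table maps naturally onto `argparse`. The following is a hypothetical reconstruction for illustration, not the actual contents of `main.py`. Treating `--wav` and `--model_type` as required is an assumption based on their empty default column.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(description="Whisper on Axera (sketch)")
    # Assumption: arguments without a documented default are required.
    parser.add_argument("--wav", required=True, help="input audio file")
    parser.add_argument("--model_type", "-t", required=True,
                        choices=["tiny", "base", "small"], help="model type")
    parser.add_argument("--model_path", "-p", default="../models",
                        help="directory containing the models")
    parser.add_argument("--language", "-l", default="zh",
                        help="recognition language")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real CLI call.
    args = build_parser().parse_args(["--wav", "../demo.wav", "-t", "small"])
    print(args)
```
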

<h3 id="CPP">CPP</h3>

#### Run

On an AX650N device, run:

```
cd cpp
./whisper -w ../demo.wav
```

To select the model type and model directory explicitly:

```
cd cpp
./whisper --model_type small --model_path ../models -w ../demo.wav
```

Output:

```
root@ax650:/mnt/qtang/whisper.axera/cpp# ./install/whisper --wav ../demo.wav --model_type small --model_path ../models/ --language zh
wav_file: ../demo.wav
model_path: ../models/
model_type: small
language: zh
Encoder run take 188.30 ms
First token: 17556       take 81.88ms
Next Token: 20844        take 29.64ms
Next Token: 7781         take 29.70ms
Next Token: 20204        take 29.64ms
Next Token: 28455        take 29.65ms
Next Token: 31962        take 29.61ms
Next Token: 6336         take 29.67ms
Next Token: 254          take 29.63ms
Next Token: 2930         take 29.61ms
Next Token: 236          take 29.56ms
Next Token: 36135        take 29.64ms
Next Token: 15868        take 29.71ms
Next Token: 252          take 29.51ms
Next Token: 1546         take 29.63ms
Next Token: 46514        take 29.51ms
Next Token: 50257        take 29.69ms
All take 801.13 ms
Result: 甚至出现交易几乎停滞的情况
```

### Server

```
cd cpp
./whisper_srv --model_type tiny --model_path ../models-ax650 --language zh --port 8080
```

### Client

Test from the command line with curl (replace the IP address and port with your own):
```
ffmpeg -i demo.wav -f f32le -c:a pcm_f32le - 2>/dev/null | \
curl -X POST 10.126.33.192:8080/asr \
  -H "Content-Type: application/octet-stream" \
  --data-binary @-
```
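
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: the input is a 16-bit PCM WAV file, the server accepts the samples at the file's own sample rate and channel layout (the ffmpeg command above does not resample), and the response body is text. The helper names are hypothetical.

```python
import struct
import sys
import urllib.request
import wave


def pcm16_to_f32le(pcm16: bytes) -> bytes:
    """Convert little-endian 16-bit PCM bytes to little-endian float32 bytes."""
    count = len(pcm16) // 2
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    # Scale by 1/32768 so that samples fall in [-1.0, 1.0).
    return struct.pack(f"<{count}f", *(s / 32768.0 for s in samples))


def load_wav_as_f32le(path: str) -> bytes:
    """Read a 16-bit PCM WAV file and return its samples as raw float32 bytes."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        return pcm16_to_f32le(wav.readframes(wav.getnframes()))


def build_asr_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request with the same header and body as the curl command."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 client.py demo.wav http://10.126.33.192:8080/asr
    wav_path, url = sys.argv[1], sys.argv[2]
    request = build_asr_request(url, load_wav_as_f32le(wav_path))
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8", errors="replace"))
```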

## Model performance

### Latency

RTF: Real-Time Factor (processing time divided by audio duration; lower is better)

C++:

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  | 0.08   |        |
| Whisper-Base  | 0.11   | 0.35   |
| Whisper-Small | 0.24   |        |
| Whisper-Turbo | 0.48   |        |

Python:  

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  | 0.12   |        |
| Whisper-Base  | 0.16   | 0.35   |
| Whisper-Small | 0.50   |        |
| Whisper-Turbo | 0.60   |        |
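
For reference, RTF can be computed as in the sketch below. It is not the measurement code behind the tables above, and which stages those measurements include (model loading, preprocessing) is not stated here. `transcribe` stands for any callable that processes a WAV file and is hypothetical.

```python
import time
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, wav_path: str) -> float:
    """Time one call to transcribe(wav_path) and return the resulting RTF."""
    start = time.perf_counter()
    transcribe(wav_path)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(wav_path))
```

For example, 1 second of processing for 4 seconds of audio gives an RTF of 0.25.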

### Word Error Rate (tested on the AIShell dataset)

| Models        | AX650N | AX630C |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  0.24  |        |
| Whisper-Base  |  0.18  |        |
| Whisper-Small |  0.11  |        |
| Whisper-Turbo |  0.06  |        |

To reproduce the test results, follow these steps:

Unpack the dataset:
```
unzip datasets.zip
```

Run the test script:
```
cd python
conda activate whisper
python test_wer.py -d aishell --gt_path ../datasets/ground_truth.txt --model_type tiny
```
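
As background, an error rate of this kind is conventionally the edit distance between the reference and the recognized token sequences, divided by the reference length. The sketch below shows that standard definition. It is not the code of `test_wer.py`, and splitting Chinese text into single characters with `list(text)` is an assumption about tokenization.

```python
def edit_distance(reference, hypothesis) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis) -> float:
    """Edit distance normalized by the number of reference tokens."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Character-level comparison, assumed here for Chinese transcripts.
    ref = list("甚至出现交易几乎停滞的情况")
    hyp = list("甚至出现交易几乎停滞的情况")
    print(error_rate(ref, hyp))
```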

### Memory usage

* CMM stands for the physical memory used by Axera modules such as VDEC (video decoder), VENC (video encoder), and the NPU.

Python:  

| Models        | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  332   |  512   |
| Whisper-Base  |  533   |  644   |
| Whisper-Small |  1106  |  906   |
| Whisper-Turbo |  2065  |  2084  |

C++:  

| Models        | CMM(MB)| OS(MB) |
| ------------- | ------ | ------ |
| Whisper-Tiny  |  332   |  31    |
| Whisper-Base  |  533   |  54    |
| Whisper-Small |  1106  |  146   |
| Whisper-Turbo |  2065  |  86    |


## Technical discussion

- GitHub issues
- QQ group: 139953715