Yuekai Zhang committed on
Commit ·
e405a04
1
Parent(s): aeea4b4
add readme
README.md
ADDED
@@ -0,0 +1,48 @@
## Results

### Aishell training results (Fine-tuning Pretrained Models)
#### Whisper

##### Fine-tuning results on the Aishell test set with Whisper medium, large-v2, and large-v3

| model    | test (greedy search, before fine-tuning) | test (beam=10, after fine-tuning) | comment                                   |
|----------|------------------------------------------|-----------------------------------|-------------------------------------------|
| medium   | 7.23                                     | 3.27                              | --epoch 10 --avg 4, ddp                   |
| large-v2 | 6.56                                     | 2.47                              | --epoch 10 --avg 6, deepspeed zero stage1 |
| large-v3 | 6.06                                     | 2.84                              | --epoch 5 --avg 3, deepspeed zero stage1  |

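To put the table in perspective, the relative error reduction for each model can be computed from the two result columns. The snippet below is a minimal sketch that only copies the numbers from the table; the helper name `relative_reduction` is made up for illustration. Keep in mind that the "before" column uses greedy search and the "after" column uses beam search with beam=10, so the difference is not due to fine-tuning alone.

```python
# Values copied from the results table: (before fine-tuning, after fine-tuning).
RESULTS = {
    "medium": (7.23, 3.27),
    "large-v2": (6.56, 2.47),
    "large-v3": (6.06, 2.84),
}


def relative_reduction(before: float, after: float) -> float:
    """Return the relative error reduction as a percentage."""
    return (before - after) / before * 100.0


for name, (before, after) in RESULTS.items():
    reduction = relative_reduction(before, after)
    print(f"{name}: {before} -> {after} ({reduction:.1f}% relative reduction)")
```
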

The training commands are:
```bash
# Run only stage 30 of prepare.sh
./prepare.sh --stage 30 --stop_stage 30

# Fine-tuning with DeepSpeed ZeRO stage 1
torchrun --nproc-per-node 8 ./whisper/train.py \
  --max-duration 200 \
  --use-fp16 1 \
  --exp-dir whisper/exp_large_v2 \
  --model-name large-v2 \
  --deepspeed \
  --deepspeed_config ./whisper/ds_config_zero1.json

# Fine-tuning with DDP
torchrun --nproc-per-node 8 ./whisper/train.py \
  --max-duration 200 \
  --use-fp16 1 \
  --exp-dir whisper/exp_medium \
  --base-lr 1e-5 \
  --model-name medium
```

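The DeepSpeed command above points at `./whisper/ds_config_zero1.json`, whose contents are not shown here. As a rough illustration of what a ZeRO stage 1 configuration can look like, the sketch below builds a small config dictionary and serializes it to JSON. The keys are standard DeepSpeed configuration keys, but every value is an assumption chosen for illustration, not the settings used for the reported results.

```python
import json

# Hypothetical stand-in for ./whisper/ds_config_zero1.json. The values are
# illustrative assumptions, not the configuration behind the results table.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "fp16": {"enabled": True},            # mirrors --use-fp16 1
    "zero_optimization": {"stage": 1},    # stage 1 shards optimizer states
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

ZeRO stage 1 partitions only the optimizer states across data-parallel workers, which is why it is often the first option to try when plain DDP runs out of memory on larger models.
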

The decoding command is:
```bash
python3 ./whisper/decode.py \
  --exp-dir whisper/exp_large_v2 \
  --model-name large-v2 \
  --epoch 999 --avg 1 \
  --beam-size 10 --max-duration 50
```
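The `--epoch` and `--avg` options appear both in the decoding command and in the comment column of the results table. The sketch below assumes that `--avg N` means averaging the parameters of the N consecutive epoch checkpoints ending at `--epoch`; this README does not spell that out, so treat it as an assumption. The function names are hypothetical, and plain floats stand in for model tensors.

```python
from typing import Dict, List


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Return the epoch numbers covered by `--epoch epoch --avg avg`."""
    if avg < 1 or avg > epoch:
        raise ValueError("avg must be between 1 and epoch")
    return list(range(epoch - avg + 1, epoch + 1))


def average_state_dicts(state_dicts: List[Dict[str, float]]) -> Dict[str, float]:
    """Average parameters key by key (scalars stand in for tensors)."""
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in state_dicts[0]}


# Under the stated assumption, "--epoch 10 --avg 4" covers these epochs.
print(epochs_to_average(10, 4))

# Toy checkpoints with a single scalar parameter each.
toy_checkpoints = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}, {"w": 6.0}]
print(average_state_dicts(toy_checkpoints))
```
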

NOTE: To decode with the original Whisper models, pad the input features to 30 seconds. Otherwise, the model may not output the EOS token.

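The padding mentioned in the note can be sketched as follows. This is a minimal NumPy example, not the code used in `decode.py`: it assumes features shaped `(num_frames, num_mel_bins)`, 100 frames per second (so 30 seconds is 3000 frames), and zero as the padding value. The function name `pad_features` is hypothetical.

```python
import numpy as np

# Assumption: 100 feature frames per second, so 30 s corresponds to 3000 frames.
TARGET_FRAMES = 3000


def pad_features(features: np.ndarray, target_frames: int = TARGET_FRAMES) -> np.ndarray:
    """Zero-pad (or truncate) a (num_frames, num_mel_bins) array along time."""
    num_frames = features.shape[0]
    if num_frames >= target_frames:
        return features[:target_frames]
    pad = target_frames - num_frames
    # Pad only at the end of the time axis; leave the mel-bin axis untouched.
    return np.pad(features, ((0, pad), (0, 0)), mode="constant", constant_values=0.0)


short = np.ones((420, 80), dtype=np.float32)  # a hypothetical 4.2 s utterance
padded = pad_features(short)
print(padded.shape)
```
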

Pretrained models, training logs, decoding logs, TensorBoard logs, and decoding results are available at
<https://huggingface.co/yuekai/icefall_asr_aishell_whisper>