---
tags:
- speech-to-text
- vietnamese
- ai-model
- deep-learning
license: apache-2.0
library_name: pytorch
model_name: EfficientConformerVietnamese
language: vi
---

# Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

**Efficient Conformer [Paper](https://arxiv.org/abs/2109.01163)**

Install [ctcdecode](https://github.com/parlance/ctcdecode)

## Prepare dataset and training pipeline

Datasets used to train this mini version:
- Vivos
- Vietbud_500
- VLSP2020, VLSP2021, VLSP2022
- VietMed_labeled
- Google Fleurs

Steps:

- Prepare a dataset folder that includes the data domains you want to train on, for example: ASRDataset/VLSP2020, ASRDataset/VLSP2021. Inside each domain folder (e.g. VLSP2020), there should be matching .wav and .txt files.
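The layout above can be sanity-checked with a short script. This is a minimal sketch, assuming the .wav/.txt naming convention described in the step; the `collect_pairs` helper is illustrative, not part of this repository:

```python
# Sketch: pair each .wav file under the dataset root with its
# same-named .txt transcript, skipping audio without a transcript.
from pathlib import Path

def collect_pairs(root):
    """Return (wav_path, transcript) pairs for every .wav with a .txt beside it."""
    pairs = []
    for wav in sorted(Path(root).rglob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            pairs.append((str(wav), txt.read_text(encoding="utf-8").strip()))
    return pairs

# e.g. collect_pairs("ASRDataset") ->
#   [("ASRDataset/VLSP2020/utt_0001.wav", "<transcript>"), ...]
```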

## Performance

Results are reported as greedy / n-gram beam search decoding:

| Model | Gigaspeech_test | VLSP2023_pb_test | VLSP2023_pr_test |
| --- | --- | --- | --- |
| Efficient-Conformer-Small-CTC | 19.61 / 21.11 | 23.06 / 21.11 | 23.17 / 21.42 |

In the competition organized by VLSP, I used the Efficient Conformer Large architecture with approximately 127 million parameters. You can find the detailed results in the technical report below:
https://www.overleaf.com/read/nhqjtcpktjyc#3b472e