---
tags:
- speech-to-text
- vietnamese
- ai-model
- deep-learning
license: apache-2.0
library_name: pytorch
model_name: EfficientConformerVietnamese
language: vi
---

# Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

**Efficient Conformer [Paper](https://arxiv.org/abs/2109.01163)**

Install [ctcdecode](https://github.com/parlance/ctcdecode)

## Prepare dataset and training pipeline

Datasets used to train this mini version:
- Vivos
- Vietbud_500
- VLSP2020, VLSP2021, VLSP2022
- VietMed_labeled
- Google Fleurs

Steps:

- Prepare a dataset folder that includes the data domains you want to train on, for example: ASRDataset/VLSP2020, ASRDataset/VLSP2021. Inside each domain folder (e.g. VLSP2020), there should be matching .wav and .txt files.
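The layout above can be sanity-checked with a short script. This is a minimal sketch, assuming the .wav/.txt naming convention described in the step; the `collect_pairs` helper is illustrative, not part of this repository:

```python
# Sketch: pair each .wav file under the dataset root with its
# same-named .txt transcript, skipping audio without a transcript.
from pathlib import Path

def collect_pairs(root):
    """Return (wav_path, transcript) pairs for every .wav with a .txt beside it."""
    pairs = []
    for wav in sorted(Path(root).rglob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            pairs.append((str(wav), txt.read_text(encoding="utf-8").strip()))
    return pairs

# e.g. collect_pairs("ASRDataset") ->
#   [("ASRDataset/VLSP2020/utt_0001.wav", "<transcript>"), ...]
```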

## Performance

Results are reported as greedy / n-gram beam search decoding:

| Model | Gigaspeech_test | VLSP2023_pb_test | VLSP2023_pr_test |
| --- | --- | --- | --- |
| Efficient-Conformer-Small-CTC | 19.61 / 21.11 | 23.06 / 21.11 | 23.17 / 21.42 |

In the competition organized by VLSP, I used the Efficient Conformer Large architecture with approximately 127 million parameters. You can find the detailed results in the technical report below:
https://www.overleaf.com/read/nhqjtcpktjyc#3b472e