Update README.md
README.md
CHANGED
@@ -22,17 +22,19 @@ The training recipe was based on the wsj recipe in [espnet](https://github.com/espnet/espnet).
 
 <!-- Provide a longer summary of what this model is. -->
 
-This model is Hybrid CTC/Attention model with pre-trained HuBERT encoder.
 
-The model pre-trained on Thai-central and fine-tuned on Khummuang, Korat, and Pattani.
 
-you can demo on colab with [this link](https://colab.research.google.com/drive/1stltGdpG9OV-sCl9QgkvEXZV7fGB2Ixe?usp=sharing). (Please note that you cannot inference >4 seconds of audio with free Google colab)
 
 ## Evaluation
 
 <!-- This section describes the evaluation protocols and provides the results. -->
 
-For evaluation, the metrics are CER and WER.
 
 In this repository, we also provide the vocabulary for building the newmm tokenizer using this script:
 
@@ -55,7 +57,7 @@ custom_tokenizer = get_tokenizer(vocab)
 tokenized_sentence_list = custom_tokenizer.word_tokenize(<your_sentence>)
 ```
 
-The CER and WER results on test set are:
 
 |Micro CER|Macro CER|Survival CER|E-commerce CER|Micro WER|Macro WER|Survival WER|E-commerce WER|
 |---|---|---|---|---|---|---|---|
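The custom tokenizer above is built from the repository's vocabulary with PyThaiNLP's newmm engine. As a rough, stdlib-only illustration of what dictionary-driven segmentation does, here is a sketch; `greedy_tokenize` and its toy English vocabulary are hypothetical stand-ins, and the real newmm algorithm performs maximal matching rather than this simple greedy pass.

```python
# Conceptual sketch of dictionary-driven word segmentation, the idea behind
# building a custom newmm tokenizer from a vocabulary file. The function name
# and vocabulary are made up for illustration; the real tokenizer is PyThaiNLP's.

def greedy_tokenize(text, vocab):
    """Greedy longest-match segmentation against a custom vocabulary."""
    max_len = max(len(w) for w in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at position i.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary match: emit a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

vocab = {"hello", "world"}
print(greedy_tokenize("helloworld!", vocab))  # ['hello', 'world', '!']
```

With PyThaiNLP installed, the corresponding real call is along the lines of `Tokenizer(custom_dict=vocab, engine="newmm").word_tokenize(text)`, as in the script shown in this README.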
 
 <!-- Provide a longer summary of what this model is. -->
 
+This model is a Hybrid CTC/Attention model with a pre-trained HuBERT encoder.
 
+The model was pre-trained on Thai-Central and fine-tuned on Khummuang, Korat, and Pattani (Experiment 3 in the paper).
+
+We provide demo code for running inference with this model architecture on Colab [here](https://colab.research.google.com/drive/1stltGdpG9OV-sCl9QgkvEXZV7fGB2Ixe?usp=sharing). (Please note that you cannot run inference on more than 4 seconds of audio with the free Google Colab tier.)
+(The code is for Thai-Central; please select the correct model accordingly.)
 
 ## Evaluation
 
 <!-- This section describes the evaluation protocols and provides the results. -->
 
+For evaluation, the metrics are CER and WER. Before WER evaluation, transcriptions were re-tokenized using the newmm tokenizer from [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp).
 
 In this repository, we also provide the vocabulary for building the newmm tokenizer using this script:
 
 tokenized_sentence_list = custom_tokenizer.word_tokenize(<your_sentence>)
 ```
 
+The CER and WER results on the test set are:
 
 |Micro CER|Macro CER|Survival CER|E-commerce CER|Micro WER|Macro WER|Survival WER|E-commerce WER|
 |---|---|---|---|---|---|---|---|