artitsu committed · Commit b9b4a20 · 1 Parent(s): 176e41b

Update README.md

Files changed (1): README.md (+7 -5)
README.md CHANGED

@@ -22,17 +22,19 @@ The training recipe was based on the wsj recipe in [espnet](https://github.com/espnet/espnet)
 
 <!-- Provide a longer summary of what this model is. -->
 
-This model is Hybrid CTC/Attention model with pre-trained HuBERT encoder.
+This model is a Hybrid CTC/Attention model with a pre-trained HuBERT encoder.
 
-The model pre-trained on Thai-central and fine-tuned on Khummuang, Korat, and Pattani.
+The model was pre-trained on Thai-Central and fine-tuned on Khummuang, Korat, and Pattani (Experiment 3 in the paper).
 
-you can demo on colab with [this link](https://colab.research.google.com/drive/1stltGdpG9OV-sCl9QgkvEXZV7fGB2Ixe?usp=sharing). (Please note that you cannot inference >4 seconds of audio with free Google colab)
+We provide demo code for running inference with this model architecture on Colab [here](https://colab.research.google.com/drive/1stltGdpG9OV-sCl9QgkvEXZV7fGB2Ixe?usp=sharing). (Please note that you cannot run inference on more than 4 seconds of audio with the free tier of Google Colab.)
+(The demo code is for Thai-Central; please select the correct model accordingly.)
+
 
 ## Evaluation
 
 <!-- This section describes the evaluation protocols and provides the results. -->
 
-For evaluation, the metrics are CER and WER. before WER evaluation, transcriptions were re-tokenized using newmm tokenizer in [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp)
+For evaluation, the metrics are CER and WER. Before WER evaluation, transcriptions were re-tokenized using the newmm tokenizer in [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp).
 
 In this repository, we also provide the vocabulary for building the newmm tokenizer using this script:
 
@@ -55,7 +57,7 @@ custom_tokenizer = get_tokenizer(vocab)
 tokenized_sentence_list = custom_tokenizer.word_tokenize(<your_sentence>)
 ```
 
-The CER and WER results on test set are:
+The CER and WER results on the test set are:
 
 |Micro CER|Macro CER|Survival CER|E-commerce CER|Micro WER|Macro WER|Survival WER|E-commerce WER|
 |---|---|---|---|---|---|---|---|
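
The evaluation step described in the diff (re-tokenize transcriptions into words, then score WER on the token lists) can be sketched as follows. This is a minimal stand-in, not the repository's actual evaluation script: the `wer` function is a plain edit-distance implementation, and the PyThaiNLP call shown in the comment is an assumption that `pythainlp` is installed; the README's own `get_tokenizer(vocab)` helper is not reproduced here.

```python
# Sketch of WER after re-tokenization. In the real pipeline the token
# lists would come from PyThaiNLP's newmm tokenizer, e.g. (assuming
# pythainlp is installed):
#   from pythainlp.tokenize import word_tokenize
#   hyp_tokens = word_tokenize(hyp_text, engine="newmm")
# Here we only show the scoring step, on already-tokenized words.

def wer(ref_tokens, hyp_tokens):
    """Word error rate: edit distance between token lists / len(ref)."""
    m, n = len(ref_tokens), len(hyp_tokens)
    prev = list(range(n + 1))  # edit distances for the previous row
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref_tokens[i - 1] == hyp_tokens[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n] / m

# Toy example: one reference word deleted from the hypothesis
ref = ["สวัสดี", "ครับ", "ผม"]
hyp = ["สวัสดี", "ผม"]
print(wer(ref, hyp))  # → 0.333... (1 deletion over 3 reference words)
```

Because Thai is written without spaces, the WER figure depends entirely on the tokenizer used, which is why the transcriptions are re-tokenized with the same newmm tokenizer (and vocabulary) before scoring.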