niobures committed (verified)
Commit 0ff1e39 · Parent: 492304d

T5G2P (models, paper)
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ T5G2P-bomuV2/T5G2P-bomuV2.nemo filter=lfs diff=lfs merge=lfs -text
+ T5G2P.[[:space:]]Using[[:space:]]Text-to-Text[[:space:]]Transfer[[:space:]]Transformer[[:space:]]for[[:space:]]Grapheme-to-Phoneme[[:space:]]Conversion.pdf filter=lfs diff=lfs merge=lfs -text
T5G2P-bomuV2/.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ T5G2P-bomuV2.nemo filter=lfs diff=lfs merge=lfs -text
T5G2P-bomuV2/README.md ADDED
@@ -0,0 +1,201 @@
+ ---
+ library_name: nemo
+ license: apache-2.0
+ tags:
+ - pytorch
+ - NeMo
+ datasets:
+ - Panga-Azazia/BomuG2PDatasetV2
+ base_model:
+ - google/byt5-small
+ pipeline_tag: text-generation
+ ---
+
+ # T5G2P-bomuV2
+
+ <style>
+ img {
+ display: inline;
+ }
+ </style>
+
+ [![Model architecture](https://img.shields.io/badge/Model_Arch-PUT-YOUR-ARCHITECTURE-HERE-lightgrey#model-badge)](#model-architecture)
+ | [![Model size](https://img.shields.io/badge/Params-PUT-YOUR-MODEL-SIZE-HERE-lightgrey#model-badge)](#model-architecture)
+ | [![Language](https://img.shields.io/badge/Language-PUT-YOUR-LANGUAGE-HERE-lightgrey#model-badge)](#datasets)
+
+ **Put a short model description here.**
+
+ See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/index.html) for complete architecture details.
+
+
+ ## NVIDIA NeMo: Training
+
+ To train, fine-tune, or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version.
+ ```bash
+ pip install "nemo_toolkit['all']"
+ ```
+
+
+ ## How to Use this Model
+
+ The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+
+ ### Automatically instantiate the model
+
+ **NOTE**: Please update the model class below to match the class of the model being uploaded.
+
+ ```python
+ from nemo.core import ModelPT
+ model = ModelPT.from_pretrained("Panga-Azazia/T5G2P-bomuV2")
+ ```
+
+ ### NOTE
+
+ Add some information about how to use the model here. An example is provided for ASR inference below.
+
+ ### Transcribing using Python
+ First, let's get a sample:
+ ```bash
+ wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
+ ```
+ Then simply do:
+ ```python
+ asr_model.transcribe(['2086-149220-0033.wav'])
+ ```
+
+ ### Transcribing many audio files
+
+ ```shell
+ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="Panga-Azazia/T5G2P-bomuV2" audio_dir=""
+ ```
+
+ ### Input
+
+ **Add some information about what are the inputs to this model**
+
+ ### Output
+
+ **Add some information about what are the outputs of this model**
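Since the base model is google/byt5-small, the model's inputs (grapheme strings) and outputs (phoneme strings) are handled as byte-level token sequences rather than subword tokens. As a sketch of the ByT5 convention (ids 0-2 reserved for pad/eos/unk, so each UTF-8 byte maps to token id byte + 3 — an assumption to verify against the actual tokenizer):

```python
def byt5_encode(text: str) -> list[int]:
    # ByT5-style byte-level "tokenization": each UTF-8 byte maps to
    # token id = byte + 3, since ids 0-2 are reserved (pad, eos, unk).
    return [b + 3 for b in text.encode("utf-8")]

def byt5_decode(ids: list[int]) -> str:
    # Inverse mapping; reserved special ids (< 3) are dropped.
    return bytes(i - 3 for i in ids if i >= 3).decode("utf-8")

print(byt5_encode("ba"))                # bytes 98, 97 -> ids 101, 100
print(byt5_decode(byt5_encode("bɔ")))   # round-trips multi-byte characters
```

This is only an illustration of the encoding convention; in practice the Hugging Face ByT5 tokenizer (or NeMo's wrapper around it) should be used.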
+
+ ## Model Architecture
+
+ **Add information here discussing architectural details of the model or any comments to users about the model.**
+
+ ## Training
+
+ **Add information here about how the model was trained. It should be as detailed as possible, potentially including the link to the script used to train as well as the base config used to train the model. If extraneous scripts are used to prepare the components of the model, please include them here.**
+
+ ### NOTE
+
+ An example is provided below for ASR.
+
+ The NeMo toolkit [1] was used for training the models for several hundred epochs. These models are trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/fast-conformer_transducer_bpe.yaml).
+
+ The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
+
+
+ ### Datasets
+
+ **Try to provide as detailed a list of datasets as possible. If possible, provide links to the datasets on HF by adding it to the manifest section at the top of the README (marked by ---).**
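For a G2P dataset such as Panga-Azazia/BomuG2PDatasetV2, training data for NeMo-style G2P models is typically a JSON-lines manifest pairing each grapheme string with its phoneme transcription. The field names below follow the convention used in NeMo G2P examples and the sample pair is made up; both are hypothetical and should be checked against the NeMo G2P documentation:

```python
import json

# Hypothetical manifest entries: the "text_graphemes"/"text" field names
# and the "bomu" -> "b o m u" pair are illustrative assumptions only.
entries = [
    {"text_graphemes": "bomu", "text": "b o m u"},
]
with open("train_manifest.json", "w", encoding="utf-8") as f:
    for e in entries:
        # one JSON object per line; keep non-ASCII phoneme symbols readable
        f.write(json.dumps(e, ensure_ascii=False) + "\n")
```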
+
+ ### NOTE
+
+ An example for the manifest section is provided below for ASR datasets:
+
+ datasets:
+ - librispeech_asr
+ - fisher_corpus
+ - Switchboard-1
+ - WSJ-0
+ - WSJ-1
+ - National-Singapore-Corpus-Part-1
+ - National-Singapore-Corpus-Part-6
+ - vctk
+ - voxpopuli
+ - europarl
+ - multilingual_librispeech
+ - mozilla-foundation/common_voice_8_0
+ - MLCommons/peoples_speech
+
+ The corresponding text in this section for those datasets is stated below -
120
+
121
+ The model was trained on 64K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams.
122
+
123
+ The training dataset consists of private subset with 40K hours of English speech plus 24K hours from the following public datasets:
124
+
125
+ - Librispeech 960 hours of English speech
126
+ - Fisher Corpus
127
+ - Switchboard-1 Dataset
128
+ - WSJ-0 and WSJ-1
129
+ - National Speech Corpus (Part 1, Part 6)
130
+ - VCTK
131
+ - VoxPopuli (EN)
132
+ - Europarl-ASR (EN)
133
+ - Multilingual Librispeech (MLS EN) - 2,000 hour subset
134
+ - Mozilla Common Voice (v7.0)
135
+ - People's Speech - 12,000 hour subset
136
+
137
+
+ ## Performance
+
+ **Add information here about the performance of the model. Discuss the metric used to evaluate the model and, if there are external links explaining a custom metric, link to them.**
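The ASR examples in this template use word error rate (WER); for a G2P model the natural analogue is phoneme error rate (PER). Both are Levenshtein edit distance divided by reference length, just computed over different token types. A minimal self-contained sketch (not NeMo's implementation):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance between two token sequences, via the classic
    # dynamic program with a rolling row: O(len(ref) * len(hyp)).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def error_rate(references, hypotheses):
    # WER if tokens are words, PER if tokens are phonemes.
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / sum(len(r) for r in references)

# e.g. one substituted phoneme in a 4-phoneme reference -> PER = 0.25
print(error_rate([["b", "o", "m", "u"]], [["b", "ɔ", "m", "u"]]))
```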
+
+ ### NOTE
+
+ An example is provided below for an ASR metrics list that can be added to the top of the README:
+
+ model-index:
+ - name: PUT_MODEL_NAME
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: AMI (Meetings test)
+       type: edinburghcstr/ami
+       config: ihm
+       split: test
+       args:
+         language: en
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 17.10
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Earnings-22
+       type: revdotcom/earnings22
+       split: test
+       args:
+         language: en
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 14.11
+
+ **Provide any caveats about the results presented at the top of the discussion so that nuance is not lost.
+
+ It should ideally be in a tabular format (you can use the following website to make your tables in markdown format: https://www.tablesgenerator.com/markdown_tables)**
+
+ ## Limitations
+
+ **Discuss any practical limitations to the model when being used in real-world cases. These can also be legal disclaimers, or discussion regarding the safety of the model (particularly in the case of LLMs).**
+
+
+ ### Note
+
+ An example is provided below.
+
+ Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
+
+
+ ## License
+
+ License to use this model is covered by the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license declared in the metadata above. By downloading the public and release version of the model, you accept the terms and conditions of that license.
+
+ ## References
+
+ **Provide appropriate references in the markdown link format below. Please order them numerically.**
+
+ [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
T5G2P-bomuV2/T5G2P-bomuV2.nemo ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a5a79f90530ab20c4b2e4c1c38db9729e2f4bdeb3faaa375eaec4412fcc4657
+ size 1198643200
T5G2P-bomuV2/events.out.tfevents.1755901454.cs-01k39x6rzc9hkn2engfmpy7k9q.9801.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9bd1f0b78eb7391be8eb16309b03fdb146ac4f11008e91dca4eae2de73a9fc83
+ size 32386
T5G2P-bomuV2/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/Panga-Azazia/T5G2P-bomuV2
T5G2P. Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9c5e379f974839edd0d3a609c54f1dab88dd1633f0e836fdd5d36ddcd877f41
+ size 171695