Bartelds committed on
Commit 63bbd61 · 1 Parent(s): 68f5728

Upload checkpoint, sanitized config, and transcripts for ctc-baseline_mms_set_4

Files changed (5)
  1. README.md +41 -0
  2. config.yaml +343 -0
  3. hyp.trn +0 -0
  4. ref.trn +0 -0
  5. valid.loss.best.pth +3 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ title: "CTC-Baseline MMS-based ASR model - set 4"
+ language: multilingual
+ tags:
+ - asr
+ - ctc-dro
+ - MMS
+ license: cc-by-nc-4.0
+ ---
+
+ # CTC-Baseline MMS-based ASR model - set 4
+
+ This repository contains a CTC-Baseline MMS-based automatic speech recognition (ASR) model trained with ESPnet.
+ The model was trained on balanced training data from set 4.
+
+ ## Intended Use
+
+ This model is intended for ASR. Users can run inference using the provided checkpoint (`valid.loss.best.pth`) and configuration file (`config.yaml`):
+ ```python
+ import soundfile as sf
+ from espnet2.bin.asr_inference import Speech2Text
+
+ # Paths to the files from this repository (adjust to your local clone).
+ asr_train_config = "ctc-baseline_mms_set_4/config.yaml"
+ asr_model_file = "ctc-baseline_mms_set_4/valid.loss.best.pth"
+
+ # Build the inference wrapper from the local config and checkpoint.
+ model = Speech2Text.from_pretrained(
+     asr_train_config=asr_train_config,
+     asr_model_file=asr_model_file,
+ )
+
+ # The frontend is configured for 16 kHz audio (`fs: 16k` in config.yaml).
+ speech, _ = sf.read("input.wav")
+ # Keep only the text of the top hypothesis.
+ text, *_ = model(speech)[0]
+
+ print("Recognized text:", text)
+ ```
+
+ ## How to Use
+
+ 1. Clone this repository.
+ 2. Use ESPnet’s inference scripts with the provided `config.yaml` and checkpoint file.
+ 3. Ensure any external resources referenced in `config.yaml` are available at the indicated relative paths.
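Step 3 of the README list above can be checked mechanically. The following is a minimal sketch and not part of the repository: it scans the config text for values that look like relative paths (starting with `./`) and reports those that do not exist. The helper name `find_missing_paths` and the line-based scan are assumptions made for illustration, and the scan deliberately avoids a YAML parser so that it needs only the standard library.

```python
import re
from pathlib import Path

# Hypothetical helper (not part of ESPnet): match lines of the form
# "key: ./some/path", allowing leading indentation for nested keys.
_PATH_VALUE = re.compile(r"^\s*([\w.]+):\s*(\./\S*)\s*$")


def find_missing_paths(config_text, base_dir="."):
    """Return {key: path} for './'-style values that are absent under base_dir."""
    missing = {}
    for line in config_text.splitlines():
        match = _PATH_VALUE.match(line)
        if match is None:
            continue
        key, value = match.groups()
        if not (Path(base_dir) / value).exists():
            missing[key] = value
    return missing


if __name__ == "__main__":
    config_path = Path("ctc-baseline_mms_set_4/config.yaml")
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        for key, value in find_missing_paths(text).items():
            print(f"missing: {key} -> {value}")
```

In this config the scan would look at `download_dir`, `non_linguistic_symbols`, and `output_dir`. A missing directory is not necessarily a problem, since directory entries may simply not have been created yet, so the report is a prompt for inspection rather than a verdict.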
config.yaml ADDED
@@ -0,0 +1,343 @@
+ accum_grad: 16
+ adapter: lora
+ adapter_conf: {}
+ allow_multi_rates: false
+ allow_variable_data_keys: false
+ aux_ctc_tasks: []
+ batch_bins: 1000000
+ batch_size: 4
+ batch_type: duration_language
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ bpemodel: null
+ chunk_default_fs: null
+ chunk_excluded_key_prefixes: []
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ cleaner: null
+ collect_stats: false
+ create_graph_in_tensorboard: false
+ ctc_conf:
+   ctc_type: builtin
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ cudnn_enabled: true
+ decoder: null
+ decoder_conf: {}
+ detect_anomaly: false
+ distributed: false
+ drop_last_iter: false
+ dry_run: false
+ duration_batch_length: -1
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ encoder: transformer
+ encoder_conf:
+   attention_dropout_rate: 0.1
+   attention_heads: 8
+   dropout_rate: 0.1
+   input_layer: conv2d2
+   linear_units: 1024
+   normalize_before: true
+   num_blocks: 2
+   output_size: 256
+   positional_dropout_rate: 0.1
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ fold_length:
+ - 80000
+ - 150
+ freeze_param: []
+ frontend: s3prl
+ frontend_conf:
+   download_dir: ./hub
+   frontend_conf:
+     path_or_url: facebook/mms-300m
+     upstream: hf_wav2vec2_custom
+   fs: 16k
+   multilayer_feature: true
+ g2p: null
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ ignore_init_mismatch: false
+ init: xavier_uniform
+ init_param: []
+ input_size: null
+ iterator_type: sequence
+ joint_net_conf: null
+ keep_nbest_models: 2
+ log_interval: null
+ log_level: INFO
+ max_cache_fd: 32
+ max_cache_size: 0.0
+ max_epoch: 40
+ model: espnet
+ model_conf:
+   ctc_weight: 1.0
+ multiple_iterator: false
+ multiprocessing_distributed: false
+ nbest_averaging_interval: 0
+ ngpu: 1
+ no_forward_run: false
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ noise_scp: null
+ non_linguistic_symbols: ./nlsyms.txt
+ normalize: utterance_mvn
+ normalize_conf: {}
+ num_att_plot: 3
+ num_cache_chunks: 1024
+ num_iters_per_epoch: 140
+ num_workers: 4
+ optim: adam
+ optim_conf:
+   lr: 0.0001
+   weight_decay: 1.0e-06
+ output_dir: ./inference_results
+ patience: null
+ postencoder: null
+ postencoder_conf: {}
+ preencoder: linear
+ preencoder_conf:
+   input_size: 1024
+   output_size: 80
+ preprocessor: default
+ preprocessor_conf: {}
+ pretrain_path: null
+ print_config: false
+ required:
+ - output_dir
+ - token_list
+ resume: true
+ rir_apply_prob: 1.0
+ rir_scp: null
+ save_strategy: all
+ scheduler: null
+ scheduler_conf: {}
+ seed: 0
+ sharded_ddp: false
+ short_noise_thres: 0.5
+ shuffle_within_batch: false
+ sort_batch: descending
+ sort_in_batch: descending
+ specaug: specaug
+ specaug_conf:
+   apply_freq_mask: true
+   apply_time_mask: true
+   apply_time_warp: true
+   freq_mask_width_range:
+   - 0
+   - 27
+   num_freq_mask: 2
+   num_time_mask: 10
+   time_mask_width_ratio_range:
+   - 0.0
+   - 0.05
+   time_warp_mode: bicubic
+   time_warp_window: 5
+ speech_volume_normalize: null
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - A
+ - O
+ - N
+ - S
+ - I
+ - ا
+ - L
+ - T
+ - R
+ - و
+ - D
+ - ن
+ - ر
+ - ی
+ - ي
+ - M
+ - U
+ - H
+ - P
+ - ک
+ - م
+ - C
+ - А
+ - Ӹ
+ - Н
+ - B
+ - ت
+ - س
+ - ل
+ - J
+ - K
+ - ہ
+ - Т
+ - ے
+ - G
+ - Ш
+ - К
+ - Е
+ - Л
+ - Ы
+ - V
+ - М
+ - ج
+ - Ӓ
+ - ه
+ - ب
+ - د
+ - О
+ - Y
+ - '[slv]'
+ - Р
+ - ڪ
+ - پ
+ - Z
+ - '[mrj]'
+ - F
+ - گ
+ - И
+ - В
+ - ئ
+ - Д
+ - '[sot]'
+ - ں
+ - '[spa]'
+ - W
+ - Q
+ - П
+ - Г
+ - ف
+ - ق
+ - С
+ - ع
+ - ش
+ - Ж
+ - ز
+ - ھ
+ - آ
+ - Č
+ - Í
+ - У
+ - ح
+ - '[urd]'
+ - Š
+ - ٹ
+ - چ
+ - Ь
+ - ٽ
+ - '[snd]'
+ - ڻ
+ - Й
+ - ط
+ - ص
+ - ٿ
+ - Ц
+ - خ
+ - Ó
+ - Я
+ - Á
+ - É
+ - Ч
+ - ۾
+ - '0'
+ - Ž
+ - З
+ - '1'
+ - ۽
+ - –
+ - ڏ
+ - Э
+ - ڊ
+ - —
+ - ڈ
+ - ء
+ - Ñ
+ - ڙ
+ - ِ
+ - '2'
+ - ٻ
+ - Х
+ - Ӱ
+ - ظ
+ - ض
+ - ث
+ - ڳ
+ - ،
+ - X
+ - ¡
+ - غ
+ - ڑ
+ - Ӧ
+ - ذ
+ - ¿
+ - '5'
+ - ڌ
+ - '3'
+ - ڀ
+ - ُ
+ - '9'
+ - Ú
+ - '4'
+ - '8'
+ - ۔
+ - '6'
+ - ٺ
+ - Ю
+ - »
+ - Б
+ - «
+ - ڇ
+ - ً
+ - ڃ
+ - '7'
+ - ڄ
+ - ؤ
+ - ڍ
+ - Ф
+ - َ
+ - ٰ
+ - ّ
+ - ڱ
+ - ”
+ - ژ
+ - ڦ
+ - Ё
+ - ؛
+ - ٍ
+ - Щ
+ - ؟
+ - ’
+ - ‘
+ - °
+ - ۃ
+ - إ
+ - Ć
+ - <sos/eos>
+ token_type: char
+ train_dtype: float32
+ unused_parameters: true
+ use_adapter: false
+ use_amp: false
+ use_lang_prompt: false
+ use_matplotlib: true
+ use_nlp_prompt: false
+ use_preprocessor: true
+ use_tensorboard: true
+ val_scheduler_criterion:
+ - valid
+ - loss
+ valid_batch_bins: null
+ valid_batch_size: null
+ valid_batch_type: null
+ valid_iterator_type: null
+ valid_max_cache_size: null
+ version: '202402'
+ write_collected_feats: false
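The config above sets `ctc_weight: 1.0` with `decoder: null` and `token_type: char`, so the transcript is a character sequence read off per-frame CTC predictions. As a rough illustration, the sketch below applies the standard greedy CTC rule (merge consecutive repeats, then drop blanks) to a short prefix of the `token_list`. It is not ESPnet's decoding code, which may use a different search. It assumes index 0 is the blank, and the function names and the example frame sequence are hypothetical.

```python
# A short prefix of the token_list from config.yaml; index 0 is assumed
# to be the CTC blank.
TOKENS = ["<blank>", "<unk>", "<space>", "E", "A", "O", "N", "S", "I"]
BLANK_ID = 0


def ctc_greedy_collapse(frame_ids):
    """Merge consecutive repeats, then drop blanks (standard CTC greedy rule)."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        if token_id != previous and token_id != BLANK_ID:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


def ids_to_text(token_ids):
    """Map ids to characters, rendering the <space> token as a real space."""
    chars = [TOKENS[i] for i in token_ids]
    return "".join(" " if c == "<space>" else c for c in chars)


if __name__ == "__main__":
    # Hypothetical per-frame argmax output; the blanks between the two runs
    # of E (id 3) are what keep both E characters in the result.
    frames = [7, 7, 0, 3, 3, 0, 0, 3, 2, 2, 6, 5, 5]
    print(ids_to_text(ctc_greedy_collapse(frames)))  # prints "SEE NO"
```

The bracketed entries in the full list, such as `[spa]` or `[urd]`, are single tokens rather than character sequences, so a complete mapping would treat them as one id each.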
hyp.trn ADDED
The diff for this file is too large to render. See raw diff
 
ref.trn ADDED
The diff for this file is too large to render. See raw diff
 
valid.loss.best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d643d36c1cacfdbe4162ffe5755c790e7a195544ddd4d7b23b320475dd852c83
+ size 1280866892
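The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of the checkpoint, and the actual file is fetched separately. A downloaded checkpoint can be checked against the pointer; the sketch below is one way to do so with the standard library. The function names and the local checkpoint path are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(file_path, pointer):
    """Compare the file size first (cheap), then the streamed digest."""
    path = Path(file_path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so a large checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d643d36c1cacfdbe4162ffe5755c790e7a195544ddd4d7b23b320475dd852c83
size 1280866892
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    # Assumed local path; adjust to wherever the repository was cloned.
    checkpoint = Path("ctc-baseline_mms_set_4/valid.loss.best.pth")
    if checkpoint.exists():
        print("checkpoint OK:", matches_pointer(checkpoint, pointer))
```

A size mismatch usually means the clone contains the pointer file itself rather than the checkpoint, which happens when Git LFS is not installed at clone time.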