Automatic Speech Recognition
ESPnet
audio
Sujay Suresh Kumar committed on
Commit be5e579 · 1 Parent(s): 5fbd928

Update model

README.md ADDED
@@ -0,0 +1,445 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: noinfo
+ datasets:
+ - mr_openslr64
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/marathi_openslr64`
+
+ This model was trained by Sujay Suresh Kumar using the mr_openslr64 recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 91325a1e58ca0b13494b94bf79b186b095fe0b58
+ pip install -e .
+ cd egs2/mr_openslr64/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/marathi_openslr64
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Mar 21 16:06:03 UTC 2022`
+ - python version: `3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.7a1`
+ - pytorch version: `pytorch 1.11.0+cu102`
+ - Git hash: `91325a1e58ca0b13494b94bf79b186b095fe0b58`
+ - Commit date: `Mon Mar 21 00:40:52 2022 +0000`
+
+ ## asr_train_asr_conformer_xlsr_raw_bpe150_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|3625|72.9|22.5|4.7|1.7|28.9|88.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|20557|91.4|3.1|5.5|1.9|10.5|88.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|13562|86.5|6.3|7.1|1.4|14.9|88.6|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_conformer_xlsr.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 60
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 5
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 3
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 10000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_bpe150_sp/train/speech_shape
+ - exp/asr_stats_raw_bpe150_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_bpe150_sp/valid/speech_shape
+ - exp/asr_stats_raw_bpe150_sp/valid/text_shape.bpe
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/marathi_train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/marathi_train_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/marathi_dev/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/marathi_dev/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0005
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 20000
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁
+ - ा
+ - ी
+ - े
+ - त
+ - र
+ - ं
+ - न
+ - क
+ - ्
+ - व
+ - ि
+ - ल
+ - ▁म
+ - स
+ - ो
+ - श
+ - द
+ - च
+ - म
+ - ▁अ
+ - ▁आ
+ - ण
+ - ु
+ - ला
+ - ह
+ - ▁आहे
+ - य
+ - ▁स
+ - ग
+ - ▁ह
+ - ्या
+ - चा
+ - ▁प
+ - ड
+ - ▁क
+ - प
+ - ट
+ - ▁ब
+ - ज
+ - र्
+ - ्र
+ - ▁?
+ - ▁ज
+ - ब
+ - ून
+ - वा
+ - ▁एक
+ - ▁या
+ - ळ
+ - ात
+ - ख
+ - ध
+ - ▁ति
+ - ठ
+ - ल्या
+ - ले
+ - ू
+ - ▁तुम्हाला
+ - ां
+ - ार
+ - घ
+ - ची
+ - ▁अस
+ - थ
+ - ▁का
+ - ने
+ - णि
+ - ॅ
+ - ▁त
+ - ▁परवा
+ - ▁ते
+ - ली
+ - ▁गेल
+ - ळा
+ - ष
+ - ▁कर
+ - .
+ - च्या
+ - ▁न
+ - वर
+ - ▁त्या
+ - ▁प्र
+ - ▁करू
+ - ▁ग
+ - ्ट
+ - ई
+ - झ
+ - ▁फ
+ - ाय
+ - क्ष
+ - ▁काय
+ - पूर
+ - ▁होती
+ - मध
+ - ▁तिथ
+ - ▁काही
+ - ए
+ - ▁वि
+ - ▁दोन
+ - ▁महिन्या
+ - व्हा
+ - तील
+ - जार
+ - ▁नाही
+ - ँ
+ - ▁पुत
+ - ॉ
+ - ▁झाला
+ - ▁दिसल
+ - ▁साल
+ - ▁रस्त्यावर
+ - स्त
+ - जवळ
+ - न्म
+ - मध्य
+ - ऊ
+ - ▁इथे
+ - ▁तुमच
+ - ▁शकते
+ - मान
+ - ▁उद्
+ - फ
+ - ै
+ - ढ
+ - ','
+ - इ
+ - ौ
+ - ‍
+ - ृ
+ - ओ
+ - ः
+ - ॲ
+ - आ
+ - '-'
+ - ञ
+ - औ
+ - '!'
+ - ऑ
+ - ऱ
+ - ऐ
+ - छ
+ - उ
+ - '?'
+ - भ
+ - अ
+ - ऋ
+ - <sos/eos>
+ init: xavier_uniform
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wav2vec2_xlsr
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: conformer
+ encoder_conf:
+     output_size: 512
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 3
+     dropout_rate: 0.3
+     positional_dropout_rate: 0.3
+     attention_dropout_rate: 0.3
+     input_layer: conv2d
+     normalize_before: true
+     macaron_style: false
+     pos_enc_layer_type: rel_pos
+     selfattention_layer_type: rel_selfattn
+     activation_type: swish
+     use_cnn_module: true
+     cnn_module_kernel: 17
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 3
+     dropout_rate: 0.3
+     positional_dropout_rate: 0.3
+     self_attention_dropout_rate: 0.3
+     src_attention_dropout_rate: 0.3
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.7a1
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
data/token_list/bpe_unigram150/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39b89f399f16c97fb4c27aa34b1c92bbb6990769002aac0eda38395195eac9a7
+ size 239987
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Mar 21 16:06:03 UTC 2022`
+ - python version: `3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.7a1`
+ - pytorch version: `pytorch 1.11.0+cu102`
+ - Git hash: `91325a1e58ca0b13494b94bf79b186b095fe0b58`
+ - Commit date: `Mon Mar 21 00:40:52 2022 +0000`
+
+ ## asr_train_asr_conformer_xlsr_raw_bpe150_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|3625|72.9|22.5|4.7|1.7|28.9|88.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|20557|91.4|3.1|5.5|1.9|10.5|88.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_batch_size1_asr_model_valid.acc.ave/marathi_test|299|13562|86.5|6.3|7.1|1.4|14.9|88.6|
+
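A quick sanity check on the three tables above: assuming the usual sclite convention, Err should equal Sub + Del + Ins, and Corr + Sub + Del should account for every reference token (100%). The sketch below re-types the reported percentages by hand and checks both identities. The dictionary layout and the `check_row` helper are illustrative names, not part of ESPnet, and the tolerance allows for each figure being rounded to one decimal place.

```python
# Percentages copied by hand from the WER, CER and TER rows above.
REPORTED = {
    "WER": {"corr": 72.9, "sub": 22.5, "del": 4.7, "ins": 1.7, "err": 28.9},
    "CER": {"corr": 91.4, "sub": 3.1, "del": 5.5, "ins": 1.9, "err": 10.5},
    "TER": {"corr": 86.5, "sub": 6.3, "del": 7.1, "ins": 1.4, "err": 14.9},
}


def check_row(row, tol=0.2):
    """Return True if the row is internally consistent up to rounding."""
    err_sum = row["sub"] + row["del"] + row["ins"]   # should match Err
    ref_sum = row["corr"] + row["sub"] + row["del"]  # should be ~100
    return abs(err_sum - row["err"]) <= tol and abs(ref_sum - 100.0) <= tol


consistency = {name: check_row(row) for name, row in REPORTED.items()}
print(consistency)
```

The TER components sum to 14.8 rather than the reported 14.9, which is within what one-decimal rounding can produce.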
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/config.yaml ADDED
@@ -0,0 +1,348 @@
+ config: conf/tuning/train_asr_conformer_xlsr.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 60
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 5
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 3
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 10000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_bpe150_sp/train/speech_shape
+ - exp/asr_stats_raw_bpe150_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_bpe150_sp/valid/speech_shape
+ - exp/asr_stats_raw_bpe150_sp/valid/text_shape.bpe
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/marathi_train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/marathi_train_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/marathi_dev/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/marathi_dev/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0005
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 20000
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁
+ - ा
+ - ी
+ - े
+ - त
+ - र
+ - ं
+ - न
+ - क
+ - ्
+ - व
+ - ि
+ - ल
+ - ▁म
+ - स
+ - ो
+ - श
+ - द
+ - च
+ - म
+ - ▁अ
+ - ▁आ
+ - ण
+ - ु
+ - ला
+ - ह
+ - ▁आहे
+ - य
+ - ▁स
+ - ग
+ - ▁ह
+ - ्या
+ - चा
+ - ▁प
+ - ड
+ - ▁क
+ - प
+ - ट
+ - ▁ब
+ - ज
+ - र्
+ - ्र
+ - ▁?
+ - ▁ज
+ - ब
+ - ून
+ - वा
+ - ▁एक
+ - ▁या
+ - ळ
+ - ात
+ - ख
+ - ध
+ - ▁ति
+ - ठ
+ - ल्या
+ - ले
+ - ू
+ - ▁तुम्हाला
+ - ां
+ - ार
+ - घ
+ - ची
+ - ▁अस
+ - थ
+ - ▁का
+ - ने
+ - णि
+ - ॅ
+ - ▁त
+ - ▁परवा
+ - ▁ते
+ - ली
+ - ▁गेल
+ - ळा
+ - ष
+ - ▁कर
+ - .
+ - च्या
+ - ▁न
+ - वर
+ - ▁त्या
+ - ▁प्र
+ - ▁करू
+ - ▁ग
+ - ्ट
+ - ई
+ - झ
+ - ▁फ
+ - ाय
+ - क्ष
+ - ▁काय
+ - पूर
+ - ▁होती
+ - मध
+ - ▁तिथ
+ - ▁काही
+ - ए
+ - ▁वि
+ - ▁दोन
+ - ▁महिन्या
+ - व्हा
+ - तील
+ - जार
+ - ▁नाही
+ - ँ
+ - ▁पुत
+ - ॉ
+ - ▁झाला
+ - ▁दिसल
+ - ▁साल
+ - ▁रस्त्यावर
+ - स्त
+ - जवळ
+ - न्म
+ - मध्य
+ - ऊ
+ - ▁इथे
+ - ▁तुमच
+ - ▁शकते
+ - मान
+ - ▁उद्
+ - फ
+ - ै
+ - ढ
+ - ','
+ - इ
+ - ौ
+ - ‍
+ - ृ
+ - ओ
+ - ः
+ - ॲ
+ - आ
+ - '-'
+ - ञ
+ - औ
+ - '!'
+ - ऑ
+ - ऱ
+ - ऐ
+ - छ
+ - उ
+ - '?'
+ - भ
+ - अ
+ - ऋ
+ - <sos/eos>
+ init: xavier_uniform
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wav2vec2_xlsr
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: conformer
+ encoder_conf:
+     output_size: 512
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 3
+     dropout_rate: 0.3
+     positional_dropout_rate: 0.3
+     attention_dropout_rate: 0.3
+     input_layer: conv2d
+     normalize_before: true
+     macaron_style: false
+     pos_enc_layer_type: rel_pos
+     selfattention_layer_type: rel_selfattn
+     activation_type: swish
+     use_cnn_module: true
+     cnn_module_kernel: 17
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 3
+     dropout_rate: 0.3
+     positional_dropout_rate: 0.3
+     self_attention_dropout_rate: 0.3
+     src_attention_dropout_rate: 0.3
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.7a1
+ distributed: false
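The `optim_conf` and `scheduler_conf` values in the config above imply a learning rate that ramps up over 20000 optimizer steps and then decays. Here is a minimal sketch, assuming `warmuplr` follows the Noam-style rule understood to be used by ESPnet2's `WarmupLR` scheduler: lr = base_lr · warmup_steps^0.5 · min(step^-0.5, step · warmup_steps^-1.5). The function name `warmup_lr` and the constants' names are illustrative, not taken from the recipe.

```python
BASE_LR = 0.0005       # optim_conf.lr from the config
WARMUP_STEPS = 20000   # scheduler_conf.warmup_steps from the config


def warmup_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a 1-based optimizer step under the assumed rule."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear ramp while step < warmup_steps, inverse-sqrt decay afterwards.
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 10000, 20000, 80000):
    print(step, warmup_lr(step))
```

Under this assumption the rate peaks at the configured 0.0005 at step 20000 and has halved again by step 80000. With `accum_grad: 3`, one optimizer step should correspond to three forward/backward batches.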
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/acc.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/backward_time.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/cer.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/forward_time.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/iter_time.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/loss.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/loss_att.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/train_time.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/images/wer.png ADDED
exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/valid.acc.ave_5best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5113c30441e6ce8d8d21f76fea64d3381066f8736e94c85a0f6393746c8df99b
+ size 1376313691
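The checkpoint is stored through Git LFS, so the diff shows only a small pointer file with `version`, `oid` and `size` keys. As an illustration, the sketch below parses that pointer text, pasted in as a string, and converts the size to GiB. `parse_lfs_pointer` is a hypothetical helper written for this example, not a Git LFS API.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5113c30441e6ce8d8d21f76fea64d3381066f8736e94c85a0f6393746c8df99b
size 1376313691
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size_bytes"] / 2 ** 30
print(f"{pointer['hash_algo']} {pointer['digest'][:12]}... {size_gib:.2f} GiB")
```

The averaged checkpoint comes to roughly 1.28 GiB, which is worth knowing before running the `--download_model` step.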
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.7a1
+ files:
+   asr_model_file: exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/valid.acc.ave_5best.pth
+ python: "3.9.7 (default, Sep 16 2021, 13:09:58) \n[GCC 7.5.0]"
+ timestamp: 1647879711.223404
+ torch: 1.11.0+cu102
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_conformer_xlsr_raw_bpe150_sp/config.yaml
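The `timestamp` field in `meta.yaml` looks like a Unix epoch in seconds. That is an assumption, since the file itself does not say so. Read that way, it can be converted to UTC with the standard library (the variable names are illustrative):

```python
from datetime import datetime, timezone

PACKED_AT = 1647879711.223404  # `timestamp` value from meta.yaml

# Interpret the value as seconds since 1970-01-01 UTC.
packed_utc = datetime.fromtimestamp(PACKED_AT, tz=timezone.utc)
print(packed_utc.isoformat(timespec="seconds"))
```

Interpreted this way, the model was packed on 21 March 2022 at about 16:21 UTC, roughly a quarter of an hour after the `date` recorded in RESULTS.md.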