dzeinali committed on
Commit
ccf3355
·
1 Parent(s): 32eb4c1

Update model

README.md ADDED
@@ -0,0 +1,490 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: de
+ datasets:
+ - commonvoice
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/german_commonvoice_blstm`
+
+ This model was trained by dzeinali using the commonvoice recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 716eb8f92e19708acfd08ba3bd39d40890d3a84b
+ pip install -e .
+ cd egs2/commonvoice/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/german_commonvoice_blstm
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Apr 4 16:41:54 EDT 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.6a1`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `fa1b865352475b744c37f70440de1cc6b257ba70`
+ - Commit date: `Wed Feb 16 16:42:36 2022 -0500`
+
+ ## asr_de_blstm_specaug_num_time_mask_2_lr_0.1
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.best/test_de|15341|137512|80.0|18.0|2.0|2.5|22.5|69.9|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.best/test_de|15341|959619|94.6|3.0|2.3|1.5|6.8|69.9|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.best/test_de|15341|974965|94.7|3.0|2.3|1.5|6.7|69.9|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_rnn.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: 3
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 30
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_de_bpe204_sp/train/speech_shape
+ - exp/asr_stats_raw_de_bpe204_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_de_bpe204_sp/valid/speech_shape
+ - exp/asr_stats_raw_de_bpe204_sp/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_de_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_de_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_de/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_de/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adadelta
+ optim_conf:
+     lr: 0.1
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁
+ - T
+ - S
+ - E
+ - I
+ - R
+ - M
+ - A
+ - N
+ - L
+ - U
+ - D
+ - .
+ - O
+ - H
+ - B
+ - G
+ - F
+ - Z
+ - K
+ - P
+ - ü
+ - W
+ - ','
+ - ä
+ - V
+ - ö
+ - J
+ - '?'
+ - ß
+ - '-'
+ - Y
+ - C
+ - '!'
+ - '"'
+ - X
+ - Q
+ - “
+ - Ä
+ - Ö
+ - ''''
+ - ':'
+ - ’
+ - –
+ - é
+ - ;
+ - í
+ - á
+ - ó
+ - ō
+ - ã
+ - š
+ - »
+ - «
+ - ú
+ - ‘
+ - ł
+ - ş
+ - ă
+ - ř
+ - ʻ
+ - '&'
+ - à
+ - ø
+ - č
+ - ı
+ - É
+ - ý
+ - â
+ - ô
+ - ū
+ - ñ
+ - ā
+ - ë
+ - ž
+ - '@'
+ - /
+ - ʿ
+ - ě
+ - ī
+ - ”
+ - ə
+ - å
+ - ń
+ - ′
+ - æ
+ - ň
+ - ś
+ - ð
+ - ą
+ - ė
+ - Œ
+ - Ç
+ - (
+ - )
+ - ò
+ - đ
+ - î
+ - '='
+ - −
+ - ů
+ - Ú
+ - и
+ - ġ
+ - а
+ - ę
+ - ›
+ - ṣ
+ - '`'
+ - ì
+ - õ
+ - ď
+ - ť
+ - ả
+ - —
+ - ‹
+ - œ
+ - ő
+ - û
+ - ế
+ - ф
+ - р
+ - о
+ - м
+ - е
+ - в
+ - С
+ - Ḫ
+ - ź
+ - Î
+ - Æ
+ - Ż
+ - Ś
+ - ï
+ - Ó
+ - Ř
+ - ğ
+ - Ł
+ - İ
+ - Đ
+ - Ž
+ - Ş
+ - ț
+ - ê
+ - Á
+ - Ō
+ - ́
+ - Š
+ - Č
+ - ć
+ - ‚
+ - ș
+ - „
+ - +
+ - Ø
+ - μ
+ - ‐
+ - $
+ - '['
+ - ']'
+ - ¡
+ - Â
+ - Í
+ - Ô
+ - ù
+ - ē
+ - Ħ
+ - Ī
+ - ņ
+ - ŏ
+ - ż
+ - ǐ
+ - О
+ - Ш
+ - к
+ - ч
+ - ш
+ - ་
+ - ན
+ - ṟ
+ - ṭ
+ - ạ
+ - ắ
+ - ễ
+ - ộ
+ - ‟
+ - ≡
+ - ⟨
+ - ⟩
+ - カ
+ - 临
+ - 孙
+ - 尣
+ - 支
+ - 無
+ - 臣
+ - →
+ - À
+ - 道
+ - Ü
+ - Þ
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ model_conf:
+     ctc_weight: 0.5
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/de_token_list/bpe_unigram204/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 2
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_de_bpe204_sp/train/feats_stats.npz
+ preencoder: null
+ preencoder_conf: {}
+ encoder: vgg_rnn
+ encoder_conf:
+     rnn_type: lstm
+     bidirectional: true
+     use_projection: true
+     num_layers: 4
+     hidden_size: 1024
+     output_size: 1024
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf:
+     num_layers: 2
+     hidden_size: 1024
+     sampling_probability: 0
+     att_conf:
+         atype: location
+         adim: 1024
+         aconv_chans: 10
+         aconv_filts: 100
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.6a1
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
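
The `specaug_conf` block in the README config gives the time-mask width as a ratio range (`time_mask_width_ratio_range: 0.0` to `0.05`) rather than a fixed frame count, so the widest possible mask scales with utterance length. A minimal sketch of that scaling, assuming the bounds are the floor of the frame count times each ratio; the helper name `time_mask_bounds` and the 1000-frame utterance are hypothetical and not taken from the ESPnet source:

```python
import math


def time_mask_bounds(num_frames, ratio_range=(0.0, 0.05)):
    """Return (min_width, max_width) in frames for a single time mask.

    Assumption: widths are floor(num_frames * ratio), capped at num_frames.
    """
    low = math.floor(num_frames * ratio_range[0])
    high = math.floor(num_frames * ratio_range[1])
    return low, min(num_frames, high)


# Hypothetical utterance of 1000 feature frames.
bounds = time_mask_bounds(1000)
print(bounds)
```

With `num_time_mask: 2`, two such masks would be drawn per utterance, each with a width somewhere inside these bounds.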
data/de_token_list/bpe_unigram204/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2dc656f400be396b05f913b4398048e456d6d81f1313f41c127ac395cbc27ce1
+ size 239929
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/3epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1f8c6f7416de09e27e3959a00d7fbc538ffc8bfbd5b1056dd9a5168541e1f60
+ size 448645298
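
The two `.model` and `.pth` entries above are Git LFS pointer files, not the binaries themselves: each holds a spec version, a content hash, and the real file size in bytes. A small sketch of reading such a pointer, assuming the three-line `key value` layout shown above; the function name `parse_lfs_pointer` is hypothetical:

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the 3epoch.pth entry in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f1f8c6f7416de09e27e3959a00d7fbc538ffc8bfbd5b1056dd9a5168541e1f60
size 448645298
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The `size_bytes` value is what a client would expect to download for the checkpoint, and the digest can be compared against a locally computed SHA-256 of the fetched file.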
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Apr 4 16:41:54 EDT 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.6a1`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `fa1b865352475b744c37f70440de1cc6b257ba70`
+ - Commit date: `Wed Feb 16 16:42:36 2022 -0500`
+
+ ## asr_de_blstm_specaug_num_time_mask_2_lr_0.1
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.best/test_de|15341|137512|80.0|18.0|2.0|2.5|22.5|69.9|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.best/test_de|15341|959619|94.6|3.0|2.3|1.5|6.8|69.9|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.best/test_de|15341|974965|94.7|3.0|2.3|1.5|6.7|69.9|
+
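
In the result tables above, the `Err` column can be read as the sum of the `Sub`, `Del` and `Ins` percentages. The sketch below rechecks that relationship on the three `test_de` rows; it assumes the usual sclite-style definition of `Err`, and the helper name `recompute_err` is hypothetical:

```python
# Percentages copied from the test_de rows of the WER, CER and TER tables.
reported = {
    "WER": {"sub": 18.0, "del": 2.0, "ins": 2.5, "err": 22.5},
    "CER": {"sub": 3.0, "del": 2.3, "ins": 1.5, "err": 6.8},
    "TER": {"sub": 3.0, "del": 2.3, "ins": 1.5, "err": 6.7},
}


def recompute_err(row):
    """Sum substitutions, deletions and insertions (all in percent)."""
    return round(row["sub"] + row["del"] + row["ins"], 1)


# Absolute gap between the recomputed and the reported error rate.
gaps = {
    name: round(abs(recompute_err(row) - row["err"]), 1)
    for name, row in reported.items()
}

for name, gap in gaps.items():
    print(f"{name}: recomputed={recompute_err(reported[name])} gap={gap}")
```

Any small gap is most plausibly rounding, since each column in the table is rounded to one decimal place independently.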
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/config.yaml ADDED
@@ -0,0 +1,393 @@
+ config: conf/tuning/train_asr_rnn.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: 3
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 30
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_de_bpe204_sp/train/speech_shape
+ - exp/asr_stats_raw_de_bpe204_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_de_bpe204_sp/valid/speech_shape
+ - exp/asr_stats_raw_de_bpe204_sp/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_de_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_de_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_de/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_de/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adadelta
+ optim_conf:
+     lr: 0.1
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁
+ - T
+ - S
+ - E
+ - I
+ - R
+ - M
+ - A
+ - N
+ - L
+ - U
+ - D
+ - .
+ - O
+ - H
+ - B
+ - G
+ - F
+ - Z
+ - K
+ - P
+ - ü
+ - W
+ - ','
+ - ä
+ - V
+ - ö
+ - J
+ - '?'
+ - ß
+ - '-'
+ - Y
+ - C
+ - '!'
+ - '"'
+ - X
+ - Q
+ - “
+ - Ä
+ - Ö
+ - ''''
+ - ':'
+ - ’
+ - –
+ - é
+ - ;
+ - í
+ - á
+ - ó
+ - ō
+ - ã
+ - š
+ - »
+ - «
+ - ú
+ - ‘
+ - ł
+ - ş
+ - ă
+ - ř
+ - ʻ
+ - '&'
+ - à
+ - ø
+ - č
+ - ı
+ - É
+ - ý
+ - â
+ - ô
+ - ū
+ - ñ
+ - ā
+ - ë
+ - ž
+ - '@'
+ - /
+ - ʿ
+ - ě
+ - ī
+ - ”
+ - ə
+ - å
+ - ń
+ - ′
+ - æ
+ - ň
+ - ś
+ - ð
+ - ą
+ - ė
+ - Œ
+ - Ç
+ - (
+ - )
+ - ò
+ - đ
+ - î
+ - '='
+ - −
+ - ů
+ - Ú
+ - и
+ - ġ
+ - а
+ - ę
+ - ›
+ - ṣ
+ - '`'
+ - ì
+ - õ
+ - ď
+ - ť
+ - ả
+ - —
+ - ‹
+ - œ
+ - ő
+ - û
+ - ế
+ - ф
+ - р
+ - о
+ - м
+ - е
+ - в
+ - С
+ - Ḫ
+ - ź
+ - Î
+ - Æ
+ - Ż
+ - Ś
+ - ï
+ - Ó
+ - Ř
+ - ğ
+ - Ł
+ - İ
+ - Đ
+ - Ž
+ - Ş
+ - ț
+ - ê
+ - Á
+ - Ō
+ - ́
+ - Š
+ - Č
+ - ć
+ - ‚
+ - ș
+ - „
+ - +
+ - Ø
+ - μ
+ - ‐
+ - $
+ - '['
+ - ']'
+ - ¡
+ - Â
+ - Í
+ - Ô
+ - ù
+ - ē
+ - Ħ
+ - Ī
+ - ņ
+ - ŏ
+ - ż
+ - ǐ
+ - О
+ - Ш
+ - к
+ - ч
+ - ш
+ - ་
+ - ན
+ - ṟ
+ - ṭ
+ - ạ
+ - ắ
+ - ễ
+ - ộ
+ - ‟
+ - ≡
+ - ⟨
+ - ⟩
+ - カ
+ - 临
+ - 孙
+ - 尣
+ - 支
+ - 無
+ - 臣
+ - →
+ - À
+ - 道
+ - Ü
+ - Þ
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ model_conf:
+     ctc_weight: 0.5
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/de_token_list/bpe_unigram204/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 2
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_de_bpe204_sp/train/feats_stats.npz
+ preencoder: null
+ preencoder_conf: {}
+ encoder: vgg_rnn
+ encoder_conf:
+     rnn_type: lstm
+     bidirectional: true
+     use_projection: true
+     num_layers: 4
+     hidden_size: 1024
+     output_size: 1024
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf:
+     num_layers: 2
+     hidden_size: 1024
+     sampling_probability: 0
+     att_conf:
+         atype: location
+         adim: 1024
+         aconv_chans: 10
+         aconv_filts: 100
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.6a1
+ distributed: false
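
The config above sets `ctc_weight: 0.5` under `model_conf`, which in a hybrid CTC/attention model weights the two training losses against each other. A minimal sketch of that interpolation, assuming the common form `ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att`; the function name `joint_loss` and the example loss values are hypothetical:

```python
def joint_loss(loss_ctc, loss_att, ctc_weight=0.5):
    """Interpolate the CTC and attention losses with a weight in [0, 1]."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Hypothetical per-batch losses, only to show the arithmetic.
example = joint_loss(loss_ctc=2.0, loss_att=4.0)
print(example)
```

At `ctc_weight=0.5` both branches contribute equally; `1.0` would reduce to pure CTC training and `0.0` to pure attention training.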
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/acc.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/backward_time.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/cer.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/cer_ctc.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/forward_time.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/iter_time.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/loss.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/loss_att.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/loss_ctc.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/optim0_lr0.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/optim_step_time.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/train_time.png ADDED
exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/images/wer.png ADDED
exp/asr_stats_raw_de_bpe204_sp/train/feats_stats.npz ADDED
Binary file (1.4 kB).
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.6a1
+ files:
+   asr_model_file: exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/3epoch.pth
+ python: "3.9.5 (default, Jun 4 2021, 12:28:51) \n[GCC 7.5.0]"
+ timestamp: 1651188301.020166
+ torch: 1.8.1+cu102
+ yaml_files:
+   asr_train_config: exp/asr_de_blstm_specaug_num_time_mask_2_lr_0.1/config.yaml
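
The `timestamp` field in `meta.yaml` looks like a Unix epoch value in seconds. Assuming that interpretation, it can be turned into a readable UTC date with the standard library; the variable name `packed_at` is hypothetical:

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be seconds since the Unix epoch.
timestamp = 1651188301.020166

packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.isoformat())
```

Under that assumption the value falls on 28 April 2022 (UTC), a few weeks after the 4 April 2022 date recorded in the results environment.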