Automatic Speech Recognition
ESPnet
multilingual
audio
juice500 committed on
Commit 3745d92 · 1 Parent(s): 2e579ff

Upload model

Files changed (38)
  1. README.md +486 -0
  2. data/token_list/bpe_unigram150/bpe.model +3 -0
  3. exp/asr_stats_raw_bpe150/train/feats_stats.npz +3 -0
  4. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/RESULTS.md +50 -0
  5. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/config.yaml +366 -0
  6. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/acc.png +0 -0
  7. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/backward_time.png +0 -0
  8. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/cer.png +0 -0
  9. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/cer_ctc.png +0 -0
  10. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/clip.png +0 -0
  11. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/forward_time.png +0 -0
  12. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/gpu_max_cached_mem_GB.png +0 -0
  13. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/grad_norm.png +0 -0
  14. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/iter_time.png +0 -0
  15. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss.png +0 -0
  16. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss_att.png +0 -0
  17. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss_ctc.png +0 -0
  18. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss_scale.png +0 -0
  19. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/optim0_lr0.png +0 -0
  20. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/optim_step_time.png +0 -0
  21. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/train_time.png +0 -0
  22. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/wer.png +0 -0
  23. exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/valid.acc.ave_10best.pth +3 -0
  24. exp/lm_train_bpe150/1epoch.pth +3 -0
  25. exp/lm_train_bpe150/config.yaml +282 -0
  26. exp/lm_train_bpe150/images/backward_time.png +0 -0
  27. exp/lm_train_bpe150/images/clip.png +0 -0
  28. exp/lm_train_bpe150/images/forward_time.png +0 -0
  29. exp/lm_train_bpe150/images/gpu_max_cached_mem_GB.png +0 -0
  30. exp/lm_train_bpe150/images/grad_norm.png +0 -0
  31. exp/lm_train_bpe150/images/iter_time.png +0 -0
  32. exp/lm_train_bpe150/images/loss.png +0 -0
  33. exp/lm_train_bpe150/images/loss_scale.png +0 -0
  34. exp/lm_train_bpe150/images/optim0_lr0.png +0 -0
  35. exp/lm_train_bpe150/images/optim_step_time.png +0 -0
  36. exp/lm_train_bpe150/images/train_time.png +0 -0
  37. exp/lm_train_bpe150/perplexity_test/ppl +1 -0
  38. meta.yaml +10 -0
README.md CHANGED
@@ -1,3 +1,489 @@
1
  ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - automatic-speech-recognition
6
+ language: multilingual
7
+ datasets:
8
+ - facebook/multilingual_librispeech
9
  license: cc-by-4.0
10
  ---
11
+
12
+ ## ESPnet2 ASR model
13
+
14
+ ### `espnet/juice500ml_mls_10h_asr_ssl`
15
+
16
+ This model was trained by Kwanghee Choi using the `mls` recipe in [ESPnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ ```bash
24
+ cd espnet
25
+ git checkout 29d7cb8453486b9073f729866a8cb3d4a8c203bb
26
+ pip install -e .
27
+ cd egs2/mls/asr1
28
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/juice500ml_mls_10h_asr_ssl
29
+ ```
30
+
31
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
32
+ # RESULTS
33
+ ## Environments
34
+ - date: `Fri Oct 20 23:49:47 EDT 2023`
35
+ - python version: `3.8.6 (default, Dec 17 2020, 16:57:01) [GCC 10.2.0]`
36
+ - espnet version: `espnet 202308`
37
+ - pytorch version: `pytorch 1.13.1+cu117`
38
+ - Git hash: `6d5c4220458adc3283838298b549f07dc6aba2ee`
39
+ - Commit date: `Thu Oct 19 16:01:31 2023 -0400`
40
+
41
+ ## exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150
42
+ ### WER
43
+
44
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
45
+ |---|---|---|---|---|---|---|---|---|
46
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_de_test|3394|121689|65.4|30.0|4.6|3.5|38.1|99.9|
47
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_en_test|3769|146611|61.5|34.4|4.1|1.9|40.5|100.0|
48
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_es_test|2385|88499|75.5|20.5|4.0|2.9|27.4|99.9|
49
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_fr_test|2426|93167|63.1|31.9|5.0|3.0|39.9|100.0|
50
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_it_test|1262|40847|71.9|23.6|4.5|4.2|32.3|99.8|
51
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_nl_test|3075|127722|65.2|30.0|4.8|3.8|38.6|100.0|
52
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pl_test|520|17034|64.9|29.3|5.8|4.1|39.2|99.8|
53
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pt_test|871|31255|62.4|31.1|6.4|3.9|41.5|100.0|
54
+
55
+ ### CER
56
+
57
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
58
+ |---|---|---|---|---|---|---|---|---|
59
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_de_test|3394|742421|91.8|3.5|4.7|2.2|10.4|99.9|
60
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_en_test|3769|785323|87.3|6.5|6.2|2.6|15.3|100.0|
61
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_es_test|2385|474976|94.7|2.6|2.7|1.7|7.0|99.9|
62
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_fr_test|2426|531607|89.5|4.4|6.2|3.0|13.6|100.0|
63
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_it_test|1262|230831|94.9|2.2|2.9|1.8|6.9|99.8|
64
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_nl_test|3075|698026|92.1|3.2|4.6|2.9|10.8|100.0|
65
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pl_test|520|111718|94.4|2.5|3.1|1.6|7.2|99.8|
66
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pt_test|871|178026|90.5|4.7|4.8|2.3|11.8|100.0|
67
+
68
+ ### TER
69
+
70
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
71
+ |---|---|---|---|---|---|---|---|---|
72
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_de_test|3394|470137|85.5|9.3|5.1|1.9|16.4|99.9|
73
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_en_test|3769|492873|79.4|13.8|6.7|2.6|23.2|100.0|
74
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_es_test|2385|297162|89.4|7.3|3.3|1.6|12.2|99.9|
75
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_fr_test|2426|347607|82.4|10.5|7.1|2.9|20.5|100.0|
76
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_it_test|1262|146439|89.2|6.8|4.0|1.8|12.6|99.8|
77
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_nl_test|3075|438029|85.4|9.7|4.8|2.5|17.1|100.0|
78
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pl_test|520|82933|90.6|6.2|3.2|1.1|10.5|99.8|
79
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pt_test|871|116658|83.4|10.6|6.0|2.4|19.0|100.0|
80
+
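In the tables above, `Err` is the sum of the substitution, deletion, and insertion rates, and `Corr + Sub + Del` accounts for every reference unit. The following is a minimal sketch that checks this against the WER rows copied from the table. The helper name `check_row` and the 0.15 tolerance (chosen to absorb one-decimal rounding in the reported figures) are assumptions made for illustration and are not part of ESPnet.

```python
# WER rows copied from the table above: (test set, Corr, Sub, Del, Ins, Err), in percent.
WER_ROWS = [
    ("mls_de_test", 65.4, 30.0, 4.6, 3.5, 38.1),
    ("mls_en_test", 61.5, 34.4, 4.1, 1.9, 40.5),
    ("mls_es_test", 75.5, 20.5, 4.0, 2.9, 27.4),
    ("mls_fr_test", 63.1, 31.9, 5.0, 3.0, 39.9),
    ("mls_it_test", 71.9, 23.6, 4.5, 4.2, 32.3),
    ("mls_nl_test", 65.2, 30.0, 4.8, 3.8, 38.6),
    ("mls_pl_test", 64.9, 29.3, 5.8, 4.1, 39.2),
    ("mls_pt_test", 62.4, 31.1, 6.4, 3.9, 41.5),
]


def check_row(corr, sub, dele, ins, err, tol=0.15):
    """Return True if Err = Sub + Del + Ins and Corr + Sub + Del = 100, within tol."""
    err_ok = abs((sub + dele + ins) - err) <= tol
    ref_ok = abs((corr + sub + dele) - 100.0) <= tol
    return err_ok and ref_ok


# Map each test set to whether its row is internally consistent.
results = {name: check_row(*vals) for name, *vals in WER_ROWS}
```

The same check applies to the CER and TER tables, where the `Wrd` column counts characters and BPE tokens respectively rather than words.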
81
+ ## ASR config
82
+
83
+ <details><summary>expand</summary>
84
+
85
+ ```
86
+ config: conf/train_asr_e_branchformer1_wavlm_lr1e-4.yaml
87
+ print_config: false
88
+ log_level: INFO
89
+ drop_last_iter: false
90
+ dry_run: false
91
+ iterator_type: sequence
92
+ valid_iterator_type: null
93
+ output_dir: exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150
94
+ ngpu: 1
95
+ seed: 2022
96
+ num_workers: 4
97
+ num_att_plot: 3
98
+ dist_backend: nccl
99
+ dist_init_method: env://
100
+ dist_world_size: null
101
+ dist_rank: null
102
+ local_rank: 0
103
+ dist_master_addr: null
104
+ dist_master_port: null
105
+ dist_launcher: null
106
+ multiprocessing_distributed: false
107
+ unused_parameters: true
108
+ sharded_ddp: false
109
+ cudnn_enabled: true
110
+ cudnn_benchmark: false
111
+ cudnn_deterministic: true
112
+ collect_stats: false
113
+ write_collected_feats: false
114
+ max_epoch: 18
115
+ patience: null
116
+ val_scheduler_criterion:
117
+ - valid
118
+ - loss
119
+ early_stopping_criterion:
120
+ - valid
121
+ - loss
122
+ - min
123
+ best_model_criterion:
124
+ - - valid
125
+ - acc
126
+ - max
127
+ keep_nbest_models: 10
128
+ nbest_averaging_interval: 0
129
+ grad_clip: 5.0
130
+ grad_clip_type: 2.0
131
+ grad_noise: false
132
+ accum_grad: 2
133
+ no_forward_run: false
134
+ resume: true
135
+ train_dtype: float32
136
+ use_amp: true
137
+ log_interval: null
138
+ use_matplotlib: true
139
+ use_tensorboard: true
140
+ create_graph_in_tensorboard: false
141
+ use_wandb: false
142
+ wandb_project: null
143
+ wandb_id: null
144
+ wandb_entity: null
145
+ wandb_name: null
146
+ wandb_model_log_interval: -1
147
+ detect_anomaly: false
148
+ pretrain_path: null
149
+ init_param: []
150
+ ignore_init_mismatch: false
151
+ freeze_param:
152
+ - encoder.encoders
153
+ num_iters_per_epoch: null
154
+ batch_size: 20
155
+ valid_batch_size: null
156
+ batch_bins: 1000000
157
+ valid_batch_bins: null
158
+ train_shape_file:
159
+ - exp/asr_stats_raw_bpe150/train/speech_shape
160
+ - exp/asr_stats_raw_bpe150/train/text_shape.bpe
161
+ valid_shape_file:
162
+ - exp/asr_stats_raw_bpe150/valid/speech_shape
163
+ - exp/asr_stats_raw_bpe150/valid/text_shape.bpe
164
+ batch_type: numel
165
+ valid_batch_type: null
166
+ fold_length:
167
+ - 80000
168
+ - 150
169
+ sort_in_batch: descending
170
+ shuffle_within_batch: false
171
+ sort_batch: descending
172
+ multiple_iterator: false
173
+ chunk_length: 500
174
+ chunk_shift_ratio: 0.5
175
+ num_cache_chunks: 1024
176
+ chunk_excluded_key_prefixes: []
177
+ train_data_path_and_name_and_type:
178
+ - - dump/raw/mls_all_train/wav.scp
179
+ - speech
180
+ - sound
181
+ - - dump/raw/mls_all_train/text
182
+ - text
183
+ - text
184
+ valid_data_path_and_name_and_type:
185
+ - - dump/raw/mls_all_dev/wav.scp
186
+ - speech
187
+ - sound
188
+ - - dump/raw/mls_all_dev/text
189
+ - text
190
+ - text
191
+ allow_variable_data_keys: false
192
+ max_cache_size: 0.0
193
+ max_cache_fd: 32
194
+ valid_max_cache_size: null
195
+ exclude_weight_decay: false
196
+ exclude_weight_decay_conf: {}
197
+ optim: adam
198
+ optim_conf:
199
+ lr: 0.0001
200
+ weight_decay: 1.0e-06
201
+ scheduler: warmuplr
202
+ scheduler_conf:
203
+ warmup_steps: 10000
204
+ token_list:
205
+ - <blank>
206
+ - <unk>
207
+ - ▁
208
+ - s
209
+ - a
210
+ - e
211
+ - o
212
+ - i
213
+ - t
214
+ - u
215
+ - n
216
+ - l
217
+ - r
218
+ - m
219
+ - d
220
+ - g
221
+ - en
222
+ - y
223
+ - f
224
+ - ▁a
225
+ - p
226
+ - ▁p
227
+ - er
228
+ - z
229
+ - ch
230
+ - ▁de
231
+ - ▁e
232
+ - h
233
+ - ▁s
234
+ - b
235
+ - ▁w
236
+ - k
237
+ - c
238
+ - j
239
+ - re
240
+ - w
241
+ - ra
242
+ - te
243
+ - ▁o
244
+ - ar
245
+ - ▁t
246
+ - an
247
+ - ▁z
248
+ - ▁i
249
+ - ie
250
+ - ▁b
251
+ - ro
252
+ - st
253
+ - in
254
+ - ł
255
+ - or
256
+ - v
257
+ - ▁g
258
+ - 'on'
259
+ - é
260
+ - ▁di
261
+ - li
262
+ - ▁d
263
+ - ▁la
264
+ - de
265
+ - ve
266
+ - ri
267
+ - ▁que
268
+ - le
269
+ - ▁h
270
+ - ta
271
+ - ▁ma
272
+ - ''''
273
+ - ci
274
+ - ne
275
+ - ▁un
276
+ - ▁the
277
+ - va
278
+ - it
279
+ - ▁c
280
+ - ▁se
281
+ - ▁da
282
+ - nd
283
+ - ▁no
284
+ - la
285
+ - do
286
+ - ▁m
287
+ - ▁k
288
+ - ▁po
289
+ - ▁in
290
+ - ▁le
291
+ - ▁he
292
+ - ▁si
293
+ - to
294
+ - ę
295
+ - ▁do
296
+ - ▁to
297
+ - ▁ha
298
+ - ce
299
+ - ▁en
300
+ - is
301
+ - ó
302
+ - ▁me
303
+ - ur
304
+ - ▁na
305
+ - ▁mi
306
+ - ni
307
+ - ▁l
308
+ - ▁al
309
+ - da
310
+ - ▁be
311
+ - ti
312
+ - ▁ca
313
+ - me
314
+ - ▁vo
315
+ - ▁so
316
+ - ▁mo
317
+ - ą
318
+ - ▁ge
319
+ - ing
320
+ - ▁and
321
+ - ż
322
+ - q
323
+ - ś
324
+ - á
325
+ - í
326
+ - x
327
+ - ã
328
+ - à
329
+ - ü
330
+ - ć
331
+ - '-'
332
+ - ä
333
+ - ç
334
+ - è
335
+ - ß
336
+ - ê
337
+ - ö
338
+ - ñ
339
+ - ò
340
+ - ú
341
+ - ń
342
+ - ù
343
+ - â
344
+ - ô
345
+ - ì
346
+ - ź
347
+ - õ
348
+ - î
349
+ - û
350
+ - ë
351
+ - ï
352
+ - œ
353
+ - æ
354
+ - <sos/eos>
355
+ init: null
356
+ input_size: null
357
+ ctc_conf:
358
+ dropout_rate: 0.0
359
+ ctc_type: builtin
360
+ reduce: true
361
+ ignore_nan_grad: null
362
+ zero_infinity: true
363
+ joint_net_conf: null
364
+ use_preprocessor: true
365
+ token_type: bpe
366
+ bpemodel: data/token_list/bpe_unigram150/bpe.model
367
+ non_linguistic_symbols: null
368
+ cleaner: null
369
+ g2p: null
370
+ speech_volume_normalize: null
371
+ rir_scp: null
372
+ rir_apply_prob: 1.0
373
+ noise_scp: null
374
+ noise_apply_prob: 1.0
375
+ noise_db_range: '13_15'
376
+ short_noise_thres: 0.5
377
+ aux_ctc_tasks: []
378
+ frontend: s3prl
379
+ frontend_conf:
380
+     frontend_conf:
381
+         upstream: wavlm_large
382
+     download_dir: ./hub
383
+     multilayer_feature: false
384
+     layer: 21
385
+     fs: 16k
386
+ specaug: specaug
387
+ specaug_conf:
388
+ apply_time_warp: true
389
+ time_warp_window: 5
390
+ time_warp_mode: bicubic
391
+ apply_freq_mask: true
392
+ freq_mask_width_range:
393
+ - 0
394
+ - 27
395
+ num_freq_mask: 2
396
+ apply_time_mask: true
397
+ time_mask_width_ratio_range:
398
+ - 0.0
399
+ - 0.05
400
+ num_time_mask: 5
401
+ normalize: utterance_mvn
402
+ normalize_conf: {}
403
+ model: espnet
404
+ model_conf:
405
+ ctc_weight: 0.3
406
+ lsm_weight: 0.1
407
+ length_normalized_loss: false
408
+ preencoder: linear
409
+ preencoder_conf:
410
+ input_size: 1024
411
+ output_size: 128
412
+ encoder: e_branchformer
413
+ encoder_conf:
414
+ output_size: 256
415
+ attention_heads: 4
416
+ attention_layer_type: rel_selfattn
417
+ pos_enc_layer_type: rel_pos
418
+ rel_pos_type: latest
419
+ cgmlp_linear_units: 1024
420
+ cgmlp_conv_kernel: 31
421
+ use_linear_after_conv: false
422
+ gate_activation: identity
423
+ num_blocks: 12
424
+ dropout_rate: 0.1
425
+ positional_dropout_rate: 0.1
426
+ attention_dropout_rate: 0.1
427
+ input_layer: conv2d2
428
+ layer_drop_rate: 0.0
429
+ linear_units: 1024
430
+ positionwise_layer_type: linear
431
+ use_ffn: true
432
+ macaron_ffn: true
433
+ merge_conv_kernel: 31
434
+ postencoder: null
435
+ postencoder_conf: {}
436
+ decoder: transformer
437
+ decoder_conf:
438
+ attention_heads: 4
439
+ linear_units: 2048
440
+ num_blocks: 6
441
+ dropout_rate: 0.1
442
+ positional_dropout_rate: 0.1
443
+ self_attention_dropout_rate: 0.1
444
+ src_attention_dropout_rate: 0.1
445
+ preprocessor: default
446
+ preprocessor_conf: {}
447
+ required:
448
+ - output_dir
449
+ - token_list
450
+ version: '202308'
451
+ distributed: false
452
+ ```
453
+
454
+ </details>
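The config above pairs `scheduler: warmuplr` with `warmup_steps: 10000` and a base `lr: 0.0001`. The sketch below shows the Noam-style warmup rule that, as far as I understand it, ESPnet's `WarmupLR` scheduler follows: a linear ramp up to the base rate, then inverse-square-root decay. The function name `warmup_lr` is hypothetical, and the formula should be checked against the ESPnet version you have installed.

```python
def warmup_lr(step, base_lr=1e-4, warmup_steps=10000):
    """Learning rate at 1-based scheduler step `step` under a Noam-style warmup."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Before warmup_steps the second term of min() wins (linear ramp);
    # afterwards the first term wins (inverse-square-root decay).
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


peak = warmup_lr(10000)   # end of warmup: equals base_lr
early = warmup_lr(100)    # ramp: base_lr * step / warmup_steps
late = warmup_lr(40000)   # decay: base_lr * sqrt(warmup_steps / step)
```

With these settings the rate peaks at the configured `lr` after 10000 scheduler steps and halves by step 40000.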
455
+
456
+
457
+
458
+ ### Citing ESPnet
459
+
460
+ ```bibtex
461
+ @inproceedings{watanabe2018espnet,
462
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
463
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
464
+ year={2018},
465
+ booktitle={Proceedings of Interspeech},
466
+ pages={2207--2211},
467
+ doi={10.21437/Interspeech.2018-1456},
468
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
469
+ }
470
+
471
+
472
+
473
+
474
+
475
+
476
+ ```
477
+
478
+ or arXiv:
479
+
480
+ ```bibtex
481
+ @misc{watanabe2018espnet,
482
+ title={ESPnet: End-to-End Speech Processing Toolkit},
483
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
484
+ year={2018},
485
+ eprint={1804.00015},
486
+ archivePrefix={arXiv},
487
+ primaryClass={cs.CL}
488
+ }
489
+ ```
data/token_list/bpe_unigram150/bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:402988487286f09251bc722290dce13ab2dc58313b7731ca0ef43e5e5e818578
3
+ size 239437
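`bpe.model` is the SentencePiece unigram model behind the 150-entry `token_list` in the ASR config. Pieces that start with `▁` (U+2581) open a new word, and all other pieces continue the current one. The sketch below joins pieces back into text in plain Python. It does not call the `sentencepiece` library, the helper name `pieces_to_text` is hypothetical, and the example pieces are simply picked from the token list.

```python
WORD_BOUNDARY = "\u2581"  # the "▁" marker SentencePiece puts at the start of a word


def pieces_to_text(pieces):
    """Join subword pieces and turn word-boundary markers into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Pieces taken from the token list: "▁the", "▁s", "an", "d", "▁and", "▁s", "e", "a".
pieces = [
    WORD_BOUNDARY + "the",
    WORD_BOUNDARY + "s", "an", "d",
    WORD_BOUNDARY + "and",
    WORD_BOUNDARY + "s", "e", "a",
]
text = pieces_to_text(pieces)
```

With a vocabulary this small, most words are spelled out from short pieces and single characters, which is why the TER tables count several times more units than the WER tables.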
exp/asr_stats_raw_bpe150/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6cfd92e2de832f61686290c27df3e35a9c95e1825f108160d1497ec4dc555869
3
+ size 1402
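The three added lines are a Git LFS pointer: the repository stores only the spec version, a SHA-256 object ID, and the byte size, while the real file lives in LFS storage. The sketch below parses such a pointer and checks a downloaded file against it. The function names are hypothetical, and the pointer text is copied from the hunk above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest recorded in the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["algo"])
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["digest"]


POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6cfd92e2de832f61686290c27df3e35a9c95e1825f108160d1497ec4dc555869
size 1402
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

To verify a download, read the file in binary mode and pass its bytes to `matches_pointer` along with the parsed pointer.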
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/RESULTS.md ADDED
@@ -0,0 +1,50 @@
1
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Fri Oct 20 23:49:47 EDT 2023`
5
+ - python version: `3.8.6 (default, Dec 17 2020, 16:57:01) [GCC 10.2.0]`
6
+ - espnet version: `espnet 202308`
7
+ - pytorch version: `pytorch 1.13.1+cu117`
8
+ - Git hash: `6d5c4220458adc3283838298b549f07dc6aba2ee`
9
+ - Commit date: `Thu Oct 19 16:01:31 2023 -0400`
10
+
11
+ ## exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150
12
+ ### WER
13
+
14
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
15
+ |---|---|---|---|---|---|---|---|---|
16
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_de_test|3394|121689|65.4|30.0|4.6|3.5|38.1|99.9|
17
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_en_test|3769|146611|61.5|34.4|4.1|1.9|40.5|100.0|
18
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_es_test|2385|88499|75.5|20.5|4.0|2.9|27.4|99.9|
19
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_fr_test|2426|93167|63.1|31.9|5.0|3.0|39.9|100.0|
20
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_it_test|1262|40847|71.9|23.6|4.5|4.2|32.3|99.8|
21
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_nl_test|3075|127722|65.2|30.0|4.8|3.8|38.6|100.0|
22
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pl_test|520|17034|64.9|29.3|5.8|4.1|39.2|99.8|
23
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pt_test|871|31255|62.4|31.1|6.4|3.9|41.5|100.0|
24
+
25
+ ### CER
26
+
27
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
28
+ |---|---|---|---|---|---|---|---|---|
29
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_de_test|3394|742421|91.8|3.5|4.7|2.2|10.4|99.9|
30
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_en_test|3769|785323|87.3|6.5|6.2|2.6|15.3|100.0|
31
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_es_test|2385|474976|94.7|2.6|2.7|1.7|7.0|99.9|
32
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_fr_test|2426|531607|89.5|4.4|6.2|3.0|13.6|100.0|
33
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_it_test|1262|230831|94.9|2.2|2.9|1.8|6.9|99.8|
34
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_nl_test|3075|698026|92.1|3.2|4.6|2.9|10.8|100.0|
35
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pl_test|520|111718|94.4|2.5|3.1|1.6|7.2|99.8|
36
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pt_test|871|178026|90.5|4.7|4.8|2.3|11.8|100.0|
37
+
38
+ ### TER
39
+
40
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
41
+ |---|---|---|---|---|---|---|---|---|
42
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_de_test|3394|470137|85.5|9.3|5.1|1.9|16.4|99.9|
43
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_en_test|3769|492873|79.4|13.8|6.7|2.6|23.2|100.0|
44
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_es_test|2385|297162|89.4|7.3|3.3|1.6|12.2|99.9|
45
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_fr_test|2426|347607|82.4|10.5|7.1|2.9|20.5|100.0|
46
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_it_test|1262|146439|89.2|6.8|4.0|1.8|12.6|99.8|
47
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_nl_test|3075|438029|85.4|9.7|4.8|2.5|17.1|100.0|
48
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pl_test|520|82933|90.6|6.2|3.2|1.1|10.5|99.8|
49
+ |decode_transformer_nolm_lm_lm_train_bpe150_valid.loss.ave_asr_model_valid.acc.ave/mls_pt_test|871|116658|83.4|10.6|6.0|2.4|19.0|100.0|
50
+
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/config.yaml ADDED
@@ -0,0 +1,366 @@
1
+ config: conf/train_asr_e_branchformer1_wavlm_lr1e-4.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ drop_last_iter: false
5
+ dry_run: false
6
+ iterator_type: sequence
7
+ valid_iterator_type: null
8
+ output_dir: exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150
9
+ ngpu: 1
10
+ seed: 2022
11
+ num_workers: 4
12
+ num_att_plot: 3
13
+ dist_backend: nccl
14
+ dist_init_method: env://
15
+ dist_world_size: null
16
+ dist_rank: null
17
+ local_rank: 0
18
+ dist_master_addr: null
19
+ dist_master_port: null
20
+ dist_launcher: null
21
+ multiprocessing_distributed: false
22
+ unused_parameters: true
23
+ sharded_ddp: false
24
+ cudnn_enabled: true
25
+ cudnn_benchmark: false
26
+ cudnn_deterministic: true
27
+ collect_stats: false
28
+ write_collected_feats: false
29
+ max_epoch: 18
30
+ patience: null
31
+ val_scheduler_criterion:
32
+ - valid
33
+ - loss
34
+ early_stopping_criterion:
35
+ - valid
36
+ - loss
37
+ - min
38
+ best_model_criterion:
39
+ - - valid
40
+ - acc
41
+ - max
42
+ keep_nbest_models: 10
43
+ nbest_averaging_interval: 0
44
+ grad_clip: 5.0
45
+ grad_clip_type: 2.0
46
+ grad_noise: false
47
+ accum_grad: 2
48
+ no_forward_run: false
49
+ resume: true
50
+ train_dtype: float32
51
+ use_amp: true
52
+ log_interval: null
53
+ use_matplotlib: true
54
+ use_tensorboard: true
55
+ create_graph_in_tensorboard: false
56
+ use_wandb: false
57
+ wandb_project: null
58
+ wandb_id: null
59
+ wandb_entity: null
60
+ wandb_name: null
61
+ wandb_model_log_interval: -1
62
+ detect_anomaly: false
63
+ pretrain_path: null
64
+ init_param: []
65
+ ignore_init_mismatch: false
66
+ freeze_param:
67
+ - encoder.encoders
68
+ num_iters_per_epoch: null
69
+ batch_size: 20
70
+ valid_batch_size: null
71
+ batch_bins: 1000000
72
+ valid_batch_bins: null
73
+ train_shape_file:
74
+ - exp/asr_stats_raw_bpe150/train/speech_shape
75
+ - exp/asr_stats_raw_bpe150/train/text_shape.bpe
76
+ valid_shape_file:
77
+ - exp/asr_stats_raw_bpe150/valid/speech_shape
78
+ - exp/asr_stats_raw_bpe150/valid/text_shape.bpe
79
+ batch_type: numel
80
+ valid_batch_type: null
81
+ fold_length:
82
+ - 80000
83
+ - 150
84
+ sort_in_batch: descending
85
+ shuffle_within_batch: false
86
+ sort_batch: descending
87
+ multiple_iterator: false
88
+ chunk_length: 500
89
+ chunk_shift_ratio: 0.5
90
+ num_cache_chunks: 1024
91
+ chunk_excluded_key_prefixes: []
92
+ train_data_path_and_name_and_type:
93
+ - - dump/raw/mls_all_train/wav.scp
94
+ - speech
95
+ - sound
96
+ - - dump/raw/mls_all_train/text
97
+ - text
98
+ - text
99
+ valid_data_path_and_name_and_type:
100
+ - - dump/raw/mls_all_dev/wav.scp
101
+ - speech
102
+ - sound
103
+ - - dump/raw/mls_all_dev/text
104
+ - text
105
+ - text
106
+ allow_variable_data_keys: false
107
+ max_cache_size: 0.0
108
+ max_cache_fd: 32
109
+ valid_max_cache_size: null
110
+ exclude_weight_decay: false
111
+ exclude_weight_decay_conf: {}
112
+ optim: adam
113
+ optim_conf:
114
+ lr: 0.0001
115
+ weight_decay: 1.0e-06
116
+ scheduler: warmuplr
117
+ scheduler_conf:
118
+ warmup_steps: 10000
119
+ token_list:
120
+ - <blank>
121
+ - <unk>
122
+ - ▁
123
+ - s
124
+ - a
125
+ - e
126
+ - o
127
+ - i
128
+ - t
129
+ - u
130
+ - n
131
+ - l
132
+ - r
133
+ - m
134
+ - d
135
+ - g
136
+ - en
137
+ - y
138
+ - f
139
+ - ▁a
140
+ - p
141
+ - ▁p
142
+ - er
143
+ - z
144
+ - ch
145
+ - ▁de
146
+ - ▁e
147
+ - h
148
+ - ▁s
149
+ - b
150
+ - ▁w
151
+ - k
152
+ - c
153
+ - j
154
+ - re
155
+ - w
156
+ - ra
157
+ - te
158
+ - ▁o
159
+ - ar
160
+ - ▁t
161
+ - an
162
+ - ▁z
163
+ - ▁i
164
+ - ie
165
+ - ▁b
166
+ - ro
167
+ - st
168
+ - in
169
+ - ł
170
+ - or
171
+ - v
172
+ - ▁g
173
+ - 'on'
174
+ - é
175
+ - ▁di
176
+ - li
177
+ - ▁d
178
+ - ▁la
179
+ - de
180
+ - ve
181
+ - ri
182
+ - ▁que
183
+ - le
184
+ - ▁h
185
+ - ta
186
+ - ▁ma
187
+ - ''''
188
+ - ci
189
+ - ne
190
+ - ▁un
191
+ - ▁the
192
+ - va
193
+ - it
194
+ - ▁c
195
+ - ▁se
196
+ - ▁da
197
+ - nd
198
+ - ▁no
199
+ - la
200
+ - do
201
+ - ▁m
202
+ - ▁k
203
+ - ▁po
204
+ - ▁in
205
+ - ▁le
206
+ - ▁he
207
+ - ▁si
208
+ - to
209
+ - ę
210
+ - ▁do
211
+ - ▁to
212
+ - ▁ha
213
+ - ce
214
+ - ▁en
215
+ - is
216
+ - ó
217
+ - ▁me
218
+ - ur
219
+ - ▁na
220
+ - ▁mi
221
+ - ni
222
+ - ▁l
223
+ - ▁al
224
+ - da
225
+ - ▁be
226
+ - ti
227
+ - ▁ca
228
+ - me
229
+ - ▁vo
230
+ - ▁so
231
+ - ▁mo
232
+ - ą
233
+ - ▁ge
234
+ - ing
235
+ - ▁and
236
+ - ż
237
+ - q
238
+ - ś
239
+ - á
240
+ - í
241
+ - x
242
+ - ã
243
+ - à
244
+ - ü
245
+ - ć
246
+ - '-'
247
+ - ä
248
+ - ç
249
+ - è
250
+ - ß
251
+ - ê
252
+ - ö
253
+ - ñ
254
+ - ò
255
+ - ú
256
+ - ń
257
+ - ù
258
+ - â
259
+ - ô
260
+ - ì
261
+ - ź
262
+ - õ
263
+ - î
264
+ - û
265
+ - ë
266
+ - ï
267
+ - œ
268
+ - æ
269
+ - <sos/eos>
270
+ init: null
271
+ input_size: null
272
+ ctc_conf:
273
+ dropout_rate: 0.0
274
+ ctc_type: builtin
275
+ reduce: true
276
+ ignore_nan_grad: null
277
+ zero_infinity: true
278
+ joint_net_conf: null
279
+ use_preprocessor: true
280
+ token_type: bpe
281
+ bpemodel: data/token_list/bpe_unigram150/bpe.model
282
+ non_linguistic_symbols: null
283
+ cleaner: null
284
+ g2p: null
285
+ speech_volume_normalize: null
286
+ rir_scp: null
287
+ rir_apply_prob: 1.0
288
+ noise_scp: null
289
+ noise_apply_prob: 1.0
290
+ noise_db_range: '13_15'
291
+ short_noise_thres: 0.5
292
+ aux_ctc_tasks: []
293
+ frontend: s3prl
294
+ frontend_conf:
295
+     frontend_conf:
296
+         upstream: wavlm_large
297
+     download_dir: ./hub
298
+     multilayer_feature: false
299
+     layer: 21
300
+     fs: 16k
301
+ specaug: specaug
302
+ specaug_conf:
303
+ apply_time_warp: true
304
+ time_warp_window: 5
305
+ time_warp_mode: bicubic
306
+ apply_freq_mask: true
307
+ freq_mask_width_range:
308
+ - 0
309
+ - 27
310
+ num_freq_mask: 2
311
+ apply_time_mask: true
312
+ time_mask_width_ratio_range:
313
+ - 0.0
314
+ - 0.05
315
+ num_time_mask: 5
316
+ normalize: utterance_mvn
317
+ normalize_conf: {}
318
+ model: espnet
319
+ model_conf:
320
+ ctc_weight: 0.3
321
+ lsm_weight: 0.1
322
+ length_normalized_loss: false
323
+ preencoder: linear
324
+ preencoder_conf:
325
+ input_size: 1024
326
+ output_size: 128
327
+ encoder: e_branchformer
328
+ encoder_conf:
329
+ output_size: 256
330
+ attention_heads: 4
331
+ attention_layer_type: rel_selfattn
332
+ pos_enc_layer_type: rel_pos
333
+ rel_pos_type: latest
334
+ cgmlp_linear_units: 1024
335
+ cgmlp_conv_kernel: 31
336
+ use_linear_after_conv: false
337
+ gate_activation: identity
338
+ num_blocks: 12
339
+ dropout_rate: 0.1
340
+ positional_dropout_rate: 0.1
341
+ attention_dropout_rate: 0.1
342
+ input_layer: conv2d2
343
+ layer_drop_rate: 0.0
344
+ linear_units: 1024
345
+ positionwise_layer_type: linear
346
+ use_ffn: true
347
+ macaron_ffn: true
348
+ merge_conv_kernel: 31
349
+ postencoder: null
350
+ postencoder_conf: {}
351
+ decoder: transformer
352
+ decoder_conf:
353
+ attention_heads: 4
354
+ linear_units: 2048
355
+ num_blocks: 6
356
+ dropout_rate: 0.1
357
+ positional_dropout_rate: 0.1
358
+ self_attention_dropout_rate: 0.1
359
+ src_attention_dropout_rate: 0.1
360
+ preprocessor: default
361
+ preprocessor_conf: {}
362
+ required:
363
+ - output_dir
364
+ - token_list
365
+ version: '202308'
366
+ distributed: false
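Under `model_conf`, `ctc_weight: 0.3` sets how the CTC loss and the attention-decoder loss are mixed during training. The sketch below shows the usual hybrid CTC/attention interpolation, which I assume is what ESPnet applies here. The function name `joint_loss` and the example loss values are hypothetical.

```python
def joint_loss(loss_ctc, loss_att, ctc_weight=0.3):
    """Interpolate CTC and attention-decoder losses, with ctc_weight on the CTC term."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Hypothetical per-batch losses, only to show the weighting.
example = joint_loss(loss_ctc=2.0, loss_att=1.0)
```

A weight of 0.0 would train the attention decoder alone and 1.0 would train CTC alone. The value 0.3 keeps the attention loss dominant while CTC regularizes the encoder alignment.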
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/acc.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/backward_time.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/cer.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/cer_ctc.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/clip.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/forward_time.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/grad_norm.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/iter_time.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss_att.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss_ctc.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/loss_scale.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/optim0_lr0.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/optim_step_time.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/train_time.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/images/wer.png ADDED
exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2f8f9b1a7371cb3b2335d170e7c5a00b8572e9c17d8366f0e567f2bcf640e891
3
+ size 1412763275
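The three lines above are a Git LFS pointer, not the checkpoint itself: the actual weights are stored out of band and identified by the SHA-256 digest. As a minimal sketch (the helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling), the pointer can be read as space-separated key/value lines:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dict (illustrative helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2f8f9b1a7371cb3b2335d170e7c5a00b8572e9c17d8366f0e567f2bcf640e891
size 1412763275
"""

info = parse_lfs_pointer(pointer)
size_gib = info["size"] / 2**30  # bytes to GiB
print(info["hash_algo"], f"{size_gib:.2f} GiB")
```

For this pointer the averaged ASR checkpoint is roughly 1.3 GiB, which is why it is tracked through LFS rather than committed directly.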
exp/lm_train_bpe150/1epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6a03145153e554d91e040cd4aef56782ebbb2f34ad4f8328534786dfd685064
+ size 27864409
exp/lm_train_bpe150/config.yaml ADDED
@@ -0,0 +1,282 @@
+ config: null
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/lm_train_bpe150
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - train
+   - loss
+   - min
+ - - valid
+   - loss
+   - min
+ - - train
+   - acc
+   - max
+ - - valid
+   - acc
+   - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/lm_stats_bpe150/train/text_shape.bpe
+ valid_shape_file:
+ - exp/lm_stats_bpe150/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ train_data_path_and_name_and_type:
+ - - dump/raw/lm_train.txt
+   - text
+   - text
+ valid_data_path_and_name_and_type:
+ - - dump/raw/org/mls_all_dev/text
+   - text
+   - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adadelta
+ optim_conf: {}
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁
+ - s
+ - a
+ - e
+ - o
+ - i
+ - t
+ - u
+ - n
+ - l
+ - r
+ - m
+ - d
+ - g
+ - en
+ - y
+ - f
+ - ▁a
+ - p
+ - ▁p
+ - er
+ - z
+ - ch
+ - ▁de
+ - ▁e
+ - h
+ - ▁s
+ - b
+ - ▁w
+ - k
+ - c
+ - j
+ - re
+ - w
+ - ra
+ - te
+ - ▁o
+ - ar
+ - ▁t
+ - an
+ - ▁z
+ - ▁i
+ - ie
+ - ▁b
+ - ro
+ - st
+ - in
+ - ł
+ - or
+ - v
+ - ▁g
+ - 'on'
+ - é
+ - ▁di
+ - li
+ - ▁d
+ - ▁la
+ - de
+ - ve
+ - ri
+ - ▁que
+ - le
+ - ▁h
+ - ta
+ - ▁ma
+ - ''''
+ - ci
+ - ne
+ - ▁un
+ - ▁the
+ - va
+ - it
+ - ▁c
+ - ▁se
+ - ▁da
+ - nd
+ - ▁no
+ - la
+ - do
+ - ▁m
+ - ▁k
+ - ▁po
+ - ▁in
+ - ▁le
+ - ▁he
+ - ▁si
+ - to
+ - ę
+ - ▁do
+ - ▁to
+ - ▁ha
+ - ce
+ - ▁en
+ - is
+ - ó
+ - ▁me
+ - ur
+ - ▁na
+ - ▁mi
+ - ni
+ - ▁l
+ - ▁al
+ - da
+ - ▁be
+ - ti
+ - ▁ca
+ - me
+ - ▁vo
+ - ▁so
+ - ▁mo
+ - ą
+ - ▁ge
+ - ing
+ - ▁and
+ - ż
+ - q
+ - ś
+ - á
+ - í
+ - x
+ - ã
+ - à
+ - ü
+ - ć
+ - '-'
+ - ä
+ - ç
+ - è
+ - ß
+ - ê
+ - ö
+ - ñ
+ - ò
+ - ú
+ - ń
+ - ù
+ - â
+ - ô
+ - ì
+ - ź
+ - õ
+ - î
+ - û
+ - ë
+ - ï
+ - œ
+ - æ
+ - <sos/eos>
+ init: null
+ model_conf:
+   ignore_id: 0
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ lm: seq_rnn
+ lm_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202308'
+ distributed: false
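The `token_list` in this config holds 150 entries, matching the `bpe150` suffix, and uses the SentencePiece convention in which `▁` marks the start of a word. As a rough sketch of how such pieces map back to text (the function `pieces_to_text` is a hypothetical helper written for illustration, not the ESPnet or SentencePiece API), joining the pieces and turning each `▁` into a space is enough:

```python
WORD_BOUNDARY = "\u2581"  # the "▁" character used by SentencePiece


def pieces_to_text(pieces):
    """Join subword pieces and restore spaces at word boundaries."""
    text = "".join(pieces)
    return text.replace(WORD_BOUNDARY, " ").strip()


# All three pieces below appear in the token_list of this config.
example = ["\u2581the", "\u2581ma", "n"]
print(pieces_to_text(example))
```

With only 150 units, most words are split into several short pieces, so the language model operates close to the character level.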
exp/lm_train_bpe150/images/backward_time.png ADDED
exp/lm_train_bpe150/images/clip.png ADDED
exp/lm_train_bpe150/images/forward_time.png ADDED
exp/lm_train_bpe150/images/gpu_max_cached_mem_GB.png ADDED
exp/lm_train_bpe150/images/grad_norm.png ADDED
exp/lm_train_bpe150/images/iter_time.png ADDED
exp/lm_train_bpe150/images/loss.png ADDED
exp/lm_train_bpe150/images/loss_scale.png ADDED
exp/lm_train_bpe150/images/optim0_lr0.png ADDED
exp/lm_train_bpe150/images/optim_step_time.png ADDED
exp/lm_train_bpe150/images/train_time.png ADDED
exp/lm_train_bpe150/perplexity_test/ppl ADDED
@@ -0,0 +1 @@
+ 54.90240964666942
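To put the reported perplexity in context, the sketch below converts it to an average per-token loss. It assumes the usual definition, perplexity = exp(mean negative log-likelihood per token); the diff itself does not state how the value was computed, so treat that as an assumption.

```python
import math

ppl = 54.90240964666942  # value stored in perplexity_test/ppl
vocab_size = 150         # number of entries in token_list (bpe150)

# Under the assumed definition ppl = exp(mean NLL), invert to get the loss.
nll_nats = math.log(ppl)   # average negative log-likelihood, in nats
nll_bits = math.log2(ppl)  # the same quantity, in bits per token

# A uniform guess over the vocabulary has perplexity equal to its size.
uniform_nll_nats = math.log(vocab_size)

print(f"{nll_nats:.3f} nats/token, {nll_bits:.3f} bits/token")
print(f"uniform baseline: {uniform_nll_nats:.3f} nats/token")
```

Under that assumption the LM needs about 4.0 nats per token, against about 5.0 nats for a uniform guess over the 150 tokens.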
meta.yaml ADDED
@@ -0,0 +1,10 @@
+ espnet: '202308'
+ files:
+   asr_model_file: exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/valid.acc.ave_10best.pth
+   lm_file: exp/lm_train_bpe150/1epoch.pth
+ python: "3.8.6 (default, Dec 17 2020, 16:57:01) \n[GCC 10.2.0]"
+ timestamp: 1697862352.609481
+ torch: 1.13.1+cu117
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_e_branchformer1_wavlm_lr1e-4_raw_bpe150/config.yaml
+   lm_train_config: exp/lm_train_bpe150/config.yaml
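The `timestamp` field in `meta.yaml` looks like seconds since the Unix epoch. Assuming that interpretation (the file does not say so explicitly), a short sketch converts it to a readable UTC time:

```python
from datetime import datetime, timezone

timestamp = 1697862352.609481  # value from meta.yaml

# Interpret the value as Unix epoch seconds and render it in UTC.
packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Read this way, the model was packed on 2023-10-21 (UTC), consistent with the `espnet: '202308'` version recorded in the same file.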