ESPnet
English
audio
self-supervised-learning
William Chen committed on
Commit 2786c48 · 1 Parent(s): 444e4cd

Update model

README.md ADDED
@@ -0,0 +1,771 @@
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - self-supervised-learning
6
+ language: en
7
+ datasets:
8
+ - librispeech
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 SSL model
13
+
14
+ ### `espnet/hubert_dummy`
15
+
16
+ This model was trained by chen26 using the librispeech recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ ```bash
24
+ cd espnet
25
+
26
+ pip install -e .
27
+ cd egs2/librispeech/ssl1
28
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/hubert_dummy
29
+ ```
30
+
31
+
32
+
33
+ ## SSL config
34
+
35
+ <details><summary>expand</summary>
36
+
37
+ ```
38
+ config: conf/tuning/train_hubert_dummy.yaml
39
+ print_config: false
40
+ log_level: INFO
41
+ drop_last_iter: false
42
+ dry_run: false
43
+ iterator_type: sequence
44
+ valid_iterator_type: null
45
+ output_dir: exp/ssl_train_hubert_dummy_raw
46
+ ngpu: 1
47
+ seed: 0
48
+ num_workers: 4
49
+ num_att_plot: 0
50
+ dist_backend: nccl
51
+ dist_init_method: env://
52
+ dist_world_size: null
53
+ dist_rank: null
54
+ local_rank: 0
55
+ dist_master_addr: null
56
+ dist_master_port: null
57
+ dist_launcher: null
58
+ multiprocessing_distributed: false
59
+ unused_parameters: false
60
+ sharded_ddp: false
61
+ use_deepspeed: true
62
+ deepspeed_config: conf/deepspeed.json
63
+ gradient_as_bucket_view: true
64
+ ddp_comm_hook: null
65
+ cudnn_enabled: true
66
+ cudnn_benchmark: false
67
+ cudnn_deterministic: true
68
+ use_tf32: false
69
+ collect_stats: false
70
+ write_collected_feats: false
71
+ max_epoch: 1
72
+ patience: null
73
+ val_scheduler_criterion:
74
+ - valid
75
+ - loss
76
+ early_stopping_criterion:
77
+ - valid
78
+ - loss
79
+ - min
80
+ best_model_criterion:
81
+ - - valid
82
+ - total_count
83
+ - max
84
+ keep_nbest_models: 5
85
+ nbest_averaging_interval: 0
86
+ grad_clip: 5.0
87
+ grad_clip_type: 2.0
88
+ grad_noise: false
89
+ accum_grad: 1
90
+ no_forward_run: false
91
+ resume: true
92
+ train_dtype: float32
93
+ use_amp: false
94
+ log_interval: null
95
+ use_matplotlib: true
96
+ use_tensorboard: true
97
+ create_graph_in_tensorboard: false
98
+ use_wandb: false
99
+ wandb_project: null
100
+ wandb_id: null
101
+ wandb_entity: null
102
+ wandb_name: null
103
+ wandb_model_log_interval: -1
104
+ detect_anomaly: false
105
+ use_adapter: false
106
+ adapter: lora
107
+ save_strategy: all
108
+ adapter_conf: {}
109
+ pretrain_path: null
110
+ init_param: []
111
+ ignore_init_mismatch: false
112
+ freeze_param: []
113
+ num_iters_per_epoch: 10
114
+ batch_size: 20
115
+ valid_batch_size: null
116
+ batch_bins: 16000
117
+ valid_batch_bins: null
118
+ category_sample_size: 10
119
+ train_shape_file:
120
+ - exp/ssl_stats_raw/train/speech_shape
121
+ valid_shape_file:
122
+ - exp/ssl_stats_raw/valid/speech_shape
123
+ batch_type: numel
124
+ valid_batch_type: null
125
+ fold_length:
126
+ - 80000
127
+ - 400
128
+ sort_in_batch: descending
129
+ shuffle_within_batch: false
130
+ sort_batch: descending
131
+ multiple_iterator: false
132
+ chunk_length: 500
133
+ chunk_shift_ratio: 0.5
134
+ num_cache_chunks: 1024
135
+ chunk_excluded_key_prefixes: []
136
+ chunk_default_fs: null
137
+ chunk_max_abs_length: null
138
+ chunk_discard_short_samples: true
139
+ train_data_path_and_name_and_type:
140
+ - - dump/raw/train_960/wav.scp
141
+ - speech
142
+ - sound
143
+ - - dump/raw/train_960/text
144
+ - text
145
+ - text
146
+ valid_data_path_and_name_and_type:
147
+ - - dump/raw/dev/wav.scp
148
+ - speech
149
+ - sound
150
+ - - dump/raw/dev/text
151
+ - text
152
+ - text
153
+ multi_task_dataset: false
154
+ allow_variable_data_keys: false
155
+ max_cache_size: 0.0
156
+ max_cache_fd: 32
157
+ allow_multi_rates: false
158
+ valid_max_cache_size: null
159
+ exclude_weight_decay: false
160
+ exclude_weight_decay_conf: {}
161
+ optim: adadelta
162
+ optim_conf: {}
163
+ scheduler: null
164
+ scheduler_conf: {}
165
+ token_list:
166
+ - '30'
167
+ - '4'
168
+ - '72'
169
+ - '305'
170
+ - '275'
171
+ - '24'
172
+ - '369'
173
+ - '125'
174
+ - '202'
175
+ - '368'
176
+ - '270'
177
+ - '296'
178
+ - '68'
179
+ - '188'
180
+ - '418'
181
+ - '223'
182
+ - '8'
183
+ - '338'
184
+ - '437'
185
+ - '14'
186
+ - '299'
187
+ - '469'
188
+ - '415'
189
+ - '11'
190
+ - '41'
191
+ - '227'
192
+ - '44'
193
+ - '35'
194
+ - '179'
195
+ - '449'
196
+ - '23'
197
+ - '10'
198
+ - '416'
199
+ - '291'
200
+ - '100'
201
+ - '74'
202
+ - '327'
203
+ - '107'
204
+ - '321'
205
+ - '208'
206
+ - '76'
207
+ - '267'
208
+ - '130'
209
+ - '173'
210
+ - '96'
211
+ - '162'
212
+ - '456'
213
+ - '84'
214
+ - '98'
215
+ - '217'
216
+ - '48'
217
+ - '482'
218
+ - '127'
219
+ - '110'
220
+ - '366'
221
+ - '336'
222
+ - '387'
223
+ - '105'
224
+ - '373'
225
+ - '139'
226
+ - '61'
227
+ - '370'
228
+ - '464'
229
+ - '397'
230
+ - '281'
231
+ - '151'
232
+ - '154'
233
+ - '155'
234
+ - '203'
235
+ - '440'
236
+ - '119'
237
+ - '71'
238
+ - '320'
239
+ - '93'
240
+ - '20'
241
+ - '138'
242
+ - '78'
243
+ - '216'
244
+ - '104'
245
+ - '205'
246
+ - '38'
247
+ - '382'
248
+ - '238'
249
+ - '474'
250
+ - '225'
251
+ - '465'
252
+ - '309'
253
+ - '17'
254
+ - '285'
255
+ - '90'
256
+ - '375'
257
+ - '356'
258
+ - '256'
259
+ - '392'
260
+ - '311'
261
+ - '398'
262
+ - '9'
263
+ - '264'
264
+ - '341'
265
+ - '168'
266
+ - '339'
267
+ - '40'
268
+ - '344'
269
+ - '422'
270
+ - '63'
271
+ - '396'
272
+ - '51'
273
+ - '184'
274
+ - '441'
275
+ - '346'
276
+ - '252'
277
+ - '206'
278
+ - '322'
279
+ - '444'
280
+ - '198'
281
+ - '66'
282
+ - '269'
283
+ - '145'
284
+ - '69'
285
+ - '244'
286
+ - '463'
287
+ - '37'
288
+ - '172'
289
+ - '271'
290
+ - '313'
291
+ - '279'
292
+ - '106'
293
+ - '377'
294
+ - '158'
295
+ - '5'
296
+ - '445'
297
+ - '455'
298
+ - '134'
299
+ - '287'
300
+ - '7'
301
+ - '297'
302
+ - '420'
303
+ - '13'
304
+ - '31'
305
+ - '484'
306
+ - '91'
307
+ - '34'
308
+ - '488'
309
+ - '468'
310
+ - '21'
311
+ - '193'
312
+ - '288'
313
+ - '159'
314
+ - '247'
315
+ - '476'
316
+ - '25'
317
+ - '265'
318
+ - '115'
319
+ - '50'
320
+ - '394'
321
+ - '197'
322
+ - '116'
323
+ - '57'
324
+ - '182'
325
+ - '378'
326
+ - '135'
327
+ - '89'
328
+ - '167'
329
+ - '19'
330
+ - '148'
331
+ - '425'
332
+ - '103'
333
+ - '95'
334
+ - '454'
335
+ - '376'
336
+ - '178'
337
+ - '79'
338
+ - '424'
339
+ - '261'
340
+ - '36'
341
+ - '426'
342
+ - '152'
343
+ - '102'
344
+ - '292'
345
+ - '258'
346
+ - '60'
347
+ - '328'
348
+ - '280'
349
+ - '273'
350
+ - '111'
351
+ - '240'
352
+ - '213'
353
+ - '483'
354
+ - '300'
355
+ - '363'
356
+ - '174'
357
+ - '317'
358
+ - '419'
359
+ - '439'
360
+ - '42'
361
+ - '118'
362
+ - '222'
363
+ - '15'
364
+ - '276'
365
+ - '277'
366
+ - '166'
367
+ - '304'
368
+ - '114'
369
+ - '329'
370
+ - '395'
371
+ - '413'
372
+ - '435'
373
+ - '33'
374
+ - '266'
375
+ - '133'
376
+ - '210'
377
+ - '408'
378
+ - '330'
379
+ - '315'
380
+ - '251'
381
+ - '6'
382
+ - '357'
383
+ - '171'
384
+ - '56'
385
+ - '1'
386
+ - '59'
387
+ - '359'
388
+ - '28'
389
+ - '215'
390
+ - '97'
391
+ - '274'
392
+ - '170'
393
+ - '49'
394
+ - '81'
395
+ - '108'
396
+ - '282'
397
+ - '85'
398
+ - '200'
399
+ - '80'
400
+ - '243'
401
+ - '364'
402
+ - '113'
403
+ - '176'
404
+ - '433'
405
+ - '77'
406
+ - '335'
407
+ - '231'
408
+ - '462'
409
+ - '62'
410
+ - '286'
411
+ - '67'
412
+ - '191'
413
+ - '228'
414
+ - '16'
415
+ - '22'
416
+ - '122'
417
+ - '235'
418
+ - '331'
419
+ - '137'
420
+ - '289'
421
+ - '92'
422
+ - '157'
423
+ - '417'
424
+ - '319'
425
+ - '2'
426
+ - '101'
427
+ - '129'
428
+ - '169'
429
+ - '26'
430
+ - '165'
431
+ - '143'
432
+ - '229'
433
+ - '220'
434
+ - '324'
435
+ - '393'
436
+ - '272'
437
+ - '43'
438
+ - '367'
439
+ - '204'
440
+ - '410'
441
+ - '278'
442
+ - '73'
443
+ - '65'
444
+ - '428'
445
+ - '411'
446
+ - '380'
447
+ - '99'
448
+ - '83'
449
+ - '412'
450
+ - '307'
451
+ - '306'
452
+ - '201'
453
+ - '361'
454
+ - '232'
455
+ - '290'
456
+ - '109'
457
+ - '140'
458
+ - '438'
459
+ - '64'
460
+ - '447'
461
+ - '374'
462
+ - '301'
463
+ - '249'
464
+ - '186'
465
+ - '234'
466
+ - '121'
467
+ - '239'
468
+ - '255'
469
+ - '82'
470
+ - '384'
471
+ - '160'
472
+ - '494'
473
+ - '351'
474
+ - '283'
475
+ - '32'
476
+ - '54'
477
+ - '52'
478
+ - '187'
479
+ - '337'
480
+ - '112'
481
+ - '260'
482
+ - '132'
483
+ - '47'
484
+ - '457'
485
+ - '211'
486
+ - '490'
487
+ - '430'
488
+ - '423'
489
+ - '175'
490
+ - '142'
491
+ - '499'
492
+ - '407'
493
+ - '303'
494
+ - '12'
495
+ - '403'
496
+ - '209'
497
+ - '233'
498
+ - '262'
499
+ - '146'
500
+ - '436'
501
+ - '219'
502
+ - '316'
503
+ - '123'
504
+ - '460'
505
+ - '39'
506
+ - '58'
507
+ - '333'
508
+ - '475'
509
+ - '70'
510
+ - '218'
511
+ - '199'
512
+ - '295'
513
+ - '389'
514
+ - '345'
515
+ - '156'
516
+ - '383'
517
+ - '390'
518
+ - '192'
519
+ - '343'
520
+ - '150'
521
+ - '318'
522
+ - '196'
523
+ - '94'
524
+ - '194'
525
+ - '27'
526
+ - '459'
527
+ - '257'
528
+ - '371'
529
+ - '498'
530
+ - '485'
531
+ - '190'
532
+ - '402'
533
+ - '163'
534
+ - '491'
535
+ - '0'
536
+ - '241'
537
+ - '467'
538
+ - '149'
539
+ - '18'
540
+ - '429'
541
+ - '421'
542
+ - '189'
543
+ - '365'
544
+ - '3'
545
+ - '75'
546
+ - '141'
547
+ - '259'
548
+ - '120'
549
+ - '372'
550
+ - '405'
551
+ - '354'
552
+ - '446'
553
+ - '340'
554
+ - '406'
555
+ - '353'
556
+ - '53'
557
+ - '334'
558
+ - '427'
559
+ - '432'
560
+ - '442'
561
+ - '131'
562
+ - '88'
563
+ - '470'
564
+ - '473'
565
+ - '254'
566
+ - '349'
567
+ - '214'
568
+ - '153'
569
+ - '342'
570
+ - '212'
571
+ - '434'
572
+ - '46'
573
+ - '86'
574
+ - '350'
575
+ - '284'
576
+ - '308'
577
+ - '323'
578
+ - '381'
579
+ - '161'
580
+ - '391'
581
+ - '248'
582
+ - '180'
583
+ - '230'
584
+ - '452'
585
+ - '325'
586
+ - '246'
587
+ - '224'
588
+ - '347'
589
+ - '195'
590
+ - '128'
591
+ - '55'
592
+ - '314'
593
+ - '126'
594
+ - '147'
595
+ - '481'
596
+ - '185'
597
+ - '358'
598
+ - '478'
599
+ - '400'
600
+ - '495'
601
+ - '388'
602
+ - '177'
603
+ - '181'
604
+ - '466'
605
+ - '362'
606
+ - '268'
607
+ - '326'
608
+ - '144'
609
+ - '493'
610
+ - '489'
611
+ - '450'
612
+ - '399'
613
+ - '443'
614
+ - '253'
615
+ - '236'
616
+ - '117'
617
+ - '448'
618
+ - '312'
619
+ - '379'
620
+ - '492'
621
+ - '496'
622
+ - '87'
623
+ - '332'
624
+ - '298'
625
+ - '497'
626
+ - '221'
627
+ - '480'
628
+ - '226'
629
+ - '302'
630
+ - '348'
631
+ - '136'
632
+ - '451'
633
+ - '479'
634
+ - '183'
635
+ - '45'
636
+ - '404'
637
+ - '263'
638
+ - '477'
639
+ - '355'
640
+ - '29'
641
+ - '414'
642
+ - '237'
643
+ - '409'
644
+ - '385'
645
+ - '461'
646
+ - '386'
647
+ - '124'
648
+ - '401'
649
+ - '352'
650
+ - '293'
651
+ - '471'
652
+ - '458'
653
+ - '472'
654
+ - '486'
655
+ - '164'
656
+ - '453'
657
+ - '310'
658
+ - '207'
659
+ - '487'
660
+ - '294'
661
+ - '360'
662
+ - '245'
663
+ - '242'
664
+ - '431'
665
+ - '250'
666
+ - <unk>
667
+ - <sos/eos>
668
+ init: null
669
+ collate_fn_conf:
670
+ label_downsampling: 1
671
+ pad: false
672
+ rand_crop: true
673
+ input_size: null
674
+ num_classes: null
675
+ use_preprocessor: true
676
+ token_type: word
677
+ bpemodel: null
678
+ non_linguistic_symbols: null
679
+ cleaner: null
680
+ g2p: null
681
+ speech_volume_normalize: null
682
+ rir_scp: null
683
+ rir_apply_prob: 1.0
684
+ noise_scp: null
685
+ noise_apply_prob: 1.0
686
+ noise_db_range: '13_15'
687
+ window_size: null
688
+ window_shift: null
689
+ loss:
690
+ - name: hubert
691
+ conf:
692
+ num_classes: 500
693
+ final_dim: 2
694
+ util:
695
+ - name: mask
696
+ conf: {}
697
+ frontend: wav2vec_cnn
698
+ frontend_conf:
699
+ norm_mode: group_norm
700
+ conv_mode: standard
701
+ bias: false
702
+ normalize_audio: false
703
+ shapes:
704
+ - - 2
705
+ - 1
706
+ - 10
707
+ fs: 16k
708
+ specaug: null
709
+ specaug_conf: {}
710
+ normalize: null
711
+ normalize_conf: {}
712
+ preencoder: linear
713
+ preencoder_conf:
714
+ output_size: 16
715
+ encoder: transformer
716
+ encoder_conf:
717
+ output_size: 16
718
+ attention_heads: 1
719
+ linear_units: 4
720
+ num_blocks: 2
721
+ dropout_rate: 0.1
722
+ positional_dropout_rate: 0.0
723
+ attention_dropout_rate: 0.1
724
+ input_layer: wav2vec
725
+ normalize_before: false
726
+ pos_enc_layer_type: conv
727
+ model: espnet
728
+ model_conf: {}
729
+ required:
730
+ - output_dir
731
+ - token_list
732
+ version: '202412'
733
+ distributed: false
734
+ ```
735
+
736
+ </details>
737
+
738
+
739
+
740
+ ### Citing ESPnet
741
+
742
+ ```bibtex
743
+ @inproceedings{watanabe2018espnet,
744
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
745
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
746
+ year={2018},
747
+ booktitle={Proceedings of Interspeech},
748
+ pages={2207--2211},
749
+ doi={10.21437/Interspeech.2018-1456},
750
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
751
+ }
752
+
753
+
754
+
755
+
756
+
757
+
758
+ ```
759
+
760
+ or arXiv:
761
+
762
+ ```bibtex
763
+ @misc{watanabe2018espnet,
764
+ title={ESPnet: End-to-End Speech Processing Toolkit},
765
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
766
+ year={2018},
767
+ eprint={1804.00015},
768
+ archivePrefix={arXiv},
769
+ primaryClass={cs.CL}
770
+ }
771
+ ```
exp/ssl_train_hubert_dummy_raw/1epoch.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f609dc030527301b7f25e9aa8b70358a891f25051bb86da0581879e50278f6d
3
+ size 40286
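The three added lines above are a Git LFS pointer rather than the checkpoint itself: `oid` carries the SHA-256 digest of the real file and `size` its length in bytes. The following is a minimal sketch, assuming a downloaded copy of `1epoch.pth` is available as bytes, of how the pointer could be used to verify that copy. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or ESPnet.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8f609dc030527301b7f25e9aa8b70358a891f25051bb86da0581879e50278f6d
size 40286
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest named by the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

With the real file on disk, something like `matches_pointer(open(path, "rb").read(), pointer)` would confirm that the download is the 40286-byte object this commit refers to.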
exp/ssl_train_hubert_dummy_raw/config.yaml ADDED
@@ -0,0 +1,696 @@
1
+ config: conf/tuning/train_hubert_dummy.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ drop_last_iter: false
5
+ dry_run: false
6
+ iterator_type: sequence
7
+ valid_iterator_type: null
8
+ output_dir: exp/ssl_train_hubert_dummy_raw
9
+ ngpu: 1
10
+ seed: 0
11
+ num_workers: 4
12
+ num_att_plot: 0
13
+ dist_backend: nccl
14
+ dist_init_method: env://
15
+ dist_world_size: null
16
+ dist_rank: null
17
+ local_rank: 0
18
+ dist_master_addr: null
19
+ dist_master_port: null
20
+ dist_launcher: null
21
+ multiprocessing_distributed: false
22
+ unused_parameters: false
23
+ sharded_ddp: false
24
+ use_deepspeed: true
25
+ deepspeed_config: conf/deepspeed.json
26
+ gradient_as_bucket_view: true
27
+ ddp_comm_hook: null
28
+ cudnn_enabled: true
29
+ cudnn_benchmark: false
30
+ cudnn_deterministic: true
31
+ use_tf32: false
32
+ collect_stats: false
33
+ write_collected_feats: false
34
+ max_epoch: 1
35
+ patience: null
36
+ val_scheduler_criterion:
37
+ - valid
38
+ - loss
39
+ early_stopping_criterion:
40
+ - valid
41
+ - loss
42
+ - min
43
+ best_model_criterion:
44
+ - - valid
45
+ - total_count
46
+ - max
47
+ keep_nbest_models: 5
48
+ nbest_averaging_interval: 0
49
+ grad_clip: 5.0
50
+ grad_clip_type: 2.0
51
+ grad_noise: false
52
+ accum_grad: 1
53
+ no_forward_run: false
54
+ resume: true
55
+ train_dtype: float32
56
+ use_amp: false
57
+ log_interval: null
58
+ use_matplotlib: true
59
+ use_tensorboard: true
60
+ create_graph_in_tensorboard: false
61
+ use_wandb: false
62
+ wandb_project: null
63
+ wandb_id: null
64
+ wandb_entity: null
65
+ wandb_name: null
66
+ wandb_model_log_interval: -1
67
+ detect_anomaly: false
68
+ use_adapter: false
69
+ adapter: lora
70
+ save_strategy: all
71
+ adapter_conf: {}
72
+ pretrain_path: null
73
+ init_param: []
74
+ ignore_init_mismatch: false
75
+ freeze_param: []
76
+ num_iters_per_epoch: 10
77
+ batch_size: 20
78
+ valid_batch_size: null
79
+ batch_bins: 16000
80
+ valid_batch_bins: null
81
+ category_sample_size: 10
82
+ train_shape_file:
83
+ - exp/ssl_stats_raw/train/speech_shape
84
+ valid_shape_file:
85
+ - exp/ssl_stats_raw/valid/speech_shape
86
+ batch_type: numel
87
+ valid_batch_type: null
88
+ fold_length:
89
+ - 80000
90
+ - 400
91
+ sort_in_batch: descending
92
+ shuffle_within_batch: false
93
+ sort_batch: descending
94
+ multiple_iterator: false
95
+ chunk_length: 500
96
+ chunk_shift_ratio: 0.5
97
+ num_cache_chunks: 1024
98
+ chunk_excluded_key_prefixes: []
99
+ chunk_default_fs: null
100
+ chunk_max_abs_length: null
101
+ chunk_discard_short_samples: true
102
+ train_data_path_and_name_and_type:
103
+ - - dump/raw/train_960/wav.scp
104
+ - speech
105
+ - sound
106
+ - - dump/raw/train_960/text
107
+ - text
108
+ - text
109
+ valid_data_path_and_name_and_type:
110
+ - - dump/raw/dev/wav.scp
111
+ - speech
112
+ - sound
113
+ - - dump/raw/dev/text
114
+ - text
115
+ - text
116
+ multi_task_dataset: false
117
+ allow_variable_data_keys: false
118
+ max_cache_size: 0.0
119
+ max_cache_fd: 32
120
+ allow_multi_rates: false
121
+ valid_max_cache_size: null
122
+ exclude_weight_decay: false
123
+ exclude_weight_decay_conf: {}
124
+ optim: adadelta
125
+ optim_conf: {}
126
+ scheduler: null
127
+ scheduler_conf: {}
128
+ token_list:
129
+ - '30'
130
+ - '4'
131
+ - '72'
132
+ - '305'
133
+ - '275'
134
+ - '24'
135
+ - '369'
136
+ - '125'
137
+ - '202'
138
+ - '368'
139
+ - '270'
140
+ - '296'
141
+ - '68'
142
+ - '188'
143
+ - '418'
144
+ - '223'
145
+ - '8'
146
+ - '338'
147
+ - '437'
148
+ - '14'
149
+ - '299'
150
+ - '469'
151
+ - '415'
152
+ - '11'
153
+ - '41'
154
+ - '227'
155
+ - '44'
156
+ - '35'
157
+ - '179'
158
+ - '449'
159
+ - '23'
160
+ - '10'
161
+ - '416'
162
+ - '291'
163
+ - '100'
164
+ - '74'
165
+ - '327'
166
+ - '107'
167
+ - '321'
168
+ - '208'
169
+ - '76'
170
+ - '267'
171
+ - '130'
172
+ - '173'
173
+ - '96'
174
+ - '162'
175
+ - '456'
176
+ - '84'
177
+ - '98'
178
+ - '217'
179
+ - '48'
180
+ - '482'
181
+ - '127'
182
+ - '110'
183
+ - '366'
184
+ - '336'
185
+ - '387'
186
+ - '105'
187
+ - '373'
188
+ - '139'
189
+ - '61'
190
+ - '370'
191
+ - '464'
192
+ - '397'
193
+ - '281'
194
+ - '151'
195
+ - '154'
196
+ - '155'
197
+ - '203'
198
+ - '440'
199
+ - '119'
200
+ - '71'
201
+ - '320'
202
+ - '93'
203
+ - '20'
204
+ - '138'
205
+ - '78'
206
+ - '216'
207
+ - '104'
208
+ - '205'
209
+ - '38'
210
+ - '382'
211
+ - '238'
212
+ - '474'
213
+ - '225'
214
+ - '465'
215
+ - '309'
216
+ - '17'
217
+ - '285'
218
+ - '90'
219
+ - '375'
220
+ - '356'
221
+ - '256'
222
+ - '392'
223
+ - '311'
224
+ - '398'
225
+ - '9'
226
+ - '264'
227
+ - '341'
228
+ - '168'
229
+ - '339'
230
+ - '40'
231
+ - '344'
232
+ - '422'
233
+ - '63'
234
+ - '396'
235
+ - '51'
236
+ - '184'
237
+ - '441'
238
+ - '346'
239
+ - '252'
240
+ - '206'
241
+ - '322'
242
+ - '444'
243
+ - '198'
244
+ - '66'
245
+ - '269'
246
+ - '145'
247
+ - '69'
248
+ - '244'
249
+ - '463'
250
+ - '37'
251
+ - '172'
252
+ - '271'
253
+ - '313'
254
+ - '279'
255
+ - '106'
256
+ - '377'
257
+ - '158'
258
+ - '5'
259
+ - '445'
260
+ - '455'
261
+ - '134'
262
+ - '287'
263
+ - '7'
264
+ - '297'
265
+ - '420'
266
+ - '13'
267
+ - '31'
268
+ - '484'
269
+ - '91'
270
+ - '34'
271
+ - '488'
272
+ - '468'
273
+ - '21'
274
+ - '193'
275
+ - '288'
276
+ - '159'
277
+ - '247'
278
+ - '476'
279
+ - '25'
280
+ - '265'
281
+ - '115'
282
+ - '50'
283
+ - '394'
284
+ - '197'
285
+ - '116'
286
+ - '57'
287
+ - '182'
288
+ - '378'
289
+ - '135'
290
+ - '89'
291
+ - '167'
292
+ - '19'
293
+ - '148'
294
+ - '425'
295
+ - '103'
296
+ - '95'
297
+ - '454'
298
+ - '376'
299
+ - '178'
300
+ - '79'
301
+ - '424'
302
+ - '261'
303
+ - '36'
304
+ - '426'
305
+ - '152'
306
+ - '102'
307
+ - '292'
308
+ - '258'
309
+ - '60'
310
+ - '328'
311
+ - '280'
312
+ - '273'
313
+ - '111'
314
+ - '240'
315
+ - '213'
316
+ - '483'
317
+ - '300'
318
+ - '363'
319
+ - '174'
320
+ - '317'
321
+ - '419'
322
+ - '439'
323
+ - '42'
324
+ - '118'
325
+ - '222'
326
+ - '15'
327
+ - '276'
328
+ - '277'
329
+ - '166'
330
+ - '304'
331
+ - '114'
332
+ - '329'
333
+ - '395'
334
+ - '413'
335
+ - '435'
336
+ - '33'
337
+ - '266'
338
+ - '133'
339
+ - '210'
340
+ - '408'
341
+ - '330'
342
+ - '315'
343
+ - '251'
344
+ - '6'
345
+ - '357'
346
+ - '171'
347
+ - '56'
348
+ - '1'
349
+ - '59'
350
+ - '359'
351
+ - '28'
352
+ - '215'
353
+ - '97'
354
+ - '274'
355
+ - '170'
356
+ - '49'
357
+ - '81'
358
+ - '108'
359
+ - '282'
360
+ - '85'
361
+ - '200'
362
+ - '80'
363
+ - '243'
364
+ - '364'
365
+ - '113'
366
+ - '176'
367
+ - '433'
368
+ - '77'
369
+ - '335'
370
+ - '231'
371
+ - '462'
372
+ - '62'
373
+ - '286'
374
+ - '67'
375
+ - '191'
376
+ - '228'
377
+ - '16'
378
+ - '22'
379
+ - '122'
380
+ - '235'
381
+ - '331'
382
+ - '137'
383
+ - '289'
384
+ - '92'
385
+ - '157'
386
+ - '417'
387
+ - '319'
388
+ - '2'
389
+ - '101'
390
+ - '129'
391
+ - '169'
392
+ - '26'
393
+ - '165'
394
+ - '143'
395
+ - '229'
396
+ - '220'
397
+ - '324'
398
+ - '393'
399
+ - '272'
400
+ - '43'
401
+ - '367'
402
+ - '204'
403
+ - '410'
404
+ - '278'
405
+ - '73'
406
+ - '65'
407
+ - '428'
408
+ - '411'
409
+ - '380'
410
+ - '99'
411
+ - '83'
412
+ - '412'
413
+ - '307'
414
+ - '306'
415
+ - '201'
416
+ - '361'
417
+ - '232'
418
+ - '290'
419
+ - '109'
420
+ - '140'
421
+ - '438'
422
+ - '64'
423
+ - '447'
424
+ - '374'
425
+ - '301'
426
+ - '249'
427
+ - '186'
428
+ - '234'
429
+ - '121'
430
+ - '239'
431
+ - '255'
432
+ - '82'
433
+ - '384'
434
+ - '160'
435
+ - '494'
436
+ - '351'
437
+ - '283'
438
+ - '32'
439
+ - '54'
440
+ - '52'
441
+ - '187'
442
+ - '337'
443
+ - '112'
444
+ - '260'
445
+ - '132'
446
+ - '47'
447
+ - '457'
448
+ - '211'
449
+ - '490'
450
+ - '430'
451
+ - '423'
452
+ - '175'
453
+ - '142'
454
+ - '499'
455
+ - '407'
456
+ - '303'
457
+ - '12'
458
+ - '403'
459
+ - '209'
460
+ - '233'
461
+ - '262'
462
+ - '146'
463
+ - '436'
464
+ - '219'
465
+ - '316'
466
+ - '123'
467
+ - '460'
468
+ - '39'
469
+ - '58'
470
+ - '333'
471
+ - '475'
472
+ - '70'
473
+ - '218'
474
+ - '199'
475
+ - '295'
476
+ - '389'
477
+ - '345'
478
+ - '156'
479
+ - '383'
480
+ - '390'
481
+ - '192'
482
+ - '343'
483
+ - '150'
484
+ - '318'
485
+ - '196'
486
+ - '94'
487
+ - '194'
488
+ - '27'
489
+ - '459'
490
+ - '257'
491
+ - '371'
492
+ - '498'
493
+ - '485'
494
+ - '190'
495
+ - '402'
496
+ - '163'
497
+ - '491'
498
+ - '0'
499
+ - '241'
500
+ - '467'
501
+ - '149'
502
+ - '18'
503
+ - '429'
504
+ - '421'
505
+ - '189'
506
+ - '365'
507
+ - '3'
508
+ - '75'
509
+ - '141'
510
+ - '259'
511
+ - '120'
512
+ - '372'
513
+ - '405'
514
+ - '354'
515
+ - '446'
516
+ - '340'
517
+ - '406'
518
+ - '353'
519
+ - '53'
520
+ - '334'
521
+ - '427'
522
+ - '432'
523
+ - '442'
524
+ - '131'
525
+ - '88'
526
+ - '470'
527
+ - '473'
528
+ - '254'
529
+ - '349'
530
+ - '214'
531
+ - '153'
532
+ - '342'
533
+ - '212'
534
+ - '434'
535
+ - '46'
536
+ - '86'
537
+ - '350'
538
+ - '284'
539
+ - '308'
540
+ - '323'
541
+ - '381'
542
+ - '161'
543
+ - '391'
544
+ - '248'
545
+ - '180'
546
+ - '230'
547
+ - '452'
548
+ - '325'
549
+ - '246'
550
+ - '224'
551
+ - '347'
552
+ - '195'
553
+ - '128'
554
+ - '55'
555
+ - '314'
556
+ - '126'
557
+ - '147'
558
+ - '481'
559
+ - '185'
560
+ - '358'
561
+ - '478'
562
+ - '400'
563
+ - '495'
564
+ - '388'
565
+ - '177'
566
+ - '181'
567
+ - '466'
568
+ - '362'
569
+ - '268'
570
+ - '326'
571
+ - '144'
572
+ - '493'
573
+ - '489'
574
+ - '450'
575
+ - '399'
576
+ - '443'
577
+ - '253'
578
+ - '236'
579
+ - '117'
580
+ - '448'
581
+ - '312'
582
+ - '379'
583
+ - '492'
584
+ - '496'
585
+ - '87'
586
+ - '332'
587
+ - '298'
588
+ - '497'
589
+ - '221'
590
+ - '480'
591
+ - '226'
592
+ - '302'
593
+ - '348'
594
+ - '136'
595
+ - '451'
596
+ - '479'
597
+ - '183'
598
+ - '45'
599
+ - '404'
600
+ - '263'
601
+ - '477'
602
+ - '355'
603
+ - '29'
604
+ - '414'
605
+ - '237'
606
+ - '409'
607
+ - '385'
608
+ - '461'
609
+ - '386'
610
+ - '124'
611
+ - '401'
612
+ - '352'
613
+ - '293'
614
+ - '471'
615
+ - '458'
616
+ - '472'
617
+ - '486'
618
+ - '164'
619
+ - '453'
620
+ - '310'
621
+ - '207'
622
+ - '487'
623
+ - '294'
624
+ - '360'
625
+ - '245'
626
+ - '242'
627
+ - '431'
628
+ - '250'
629
+ - <unk>
630
+ - <sos/eos>
631
+ init: null
632
+ collate_fn_conf:
633
+ label_downsampling: 1
634
+ pad: false
635
+ rand_crop: true
636
+ input_size: null
637
+ num_classes: null
638
+ use_preprocessor: true
639
+ token_type: word
640
+ bpemodel: null
641
+ non_linguistic_symbols: null
642
+ cleaner: null
643
+ g2p: null
644
+ speech_volume_normalize: null
645
+ rir_scp: null
646
+ rir_apply_prob: 1.0
647
+ noise_scp: null
648
+ noise_apply_prob: 1.0
649
+ noise_db_range: '13_15'
650
+ window_size: null
651
+ window_shift: null
652
+ loss:
653
+ - name: hubert
654
+ conf:
655
+ num_classes: 500
656
+ final_dim: 2
657
+ util:
658
+ - name: mask
659
+ conf: {}
660
+ frontend: wav2vec_cnn
661
+ frontend_conf:
662
+ norm_mode: group_norm
663
+ conv_mode: standard
664
+ bias: false
665
+ normalize_audio: false
666
+ shapes:
667
+ - - 2
668
+ - 1
669
+ - 10
670
+ fs: 16k
671
+ specaug: null
672
+ specaug_conf: {}
673
+ normalize: null
674
+ normalize_conf: {}
675
+ preencoder: linear
676
+ preencoder_conf:
677
+ output_size: 16
678
+ encoder: transformer
679
+ encoder_conf:
680
+ output_size: 16
681
+ attention_heads: 1
682
+ linear_units: 4
683
+ num_blocks: 2
684
+ dropout_rate: 0.1
685
+ positional_dropout_rate: 0.0
686
+ attention_dropout_rate: 0.1
687
+ input_layer: wav2vec
688
+ normalize_before: false
689
+ pos_enc_layer_type: conv
690
+ model: espnet
691
+ model_conf: {}
692
+ required:
693
+ - output_dir
694
+ - token_list
695
+ version: '202412'
696
+ distributed: false
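The `token_list` in this config holds numeric strings followed by `<unk>` and `<sos/eos>`, and the HuBERT loss sets `num_classes: 500`. Reading the numeric tokens as pseudo-label IDs (for example k-means cluster indices) is an assumption; the config does not say so. Under that assumption, a small consistency check could look like the sketch below. `check_token_list` is a hypothetical helper, and the toy list stands in for the real 502-entry list.

```python
def check_token_list(token_list, num_classes):
    """Compare the numeric tokens of a token list against an expected class count."""
    # Anything that is not a plain non-negative integer is treated as a special symbol.
    specials = [t for t in token_list if not t.isdigit()]
    ids = sorted(int(t) for t in token_list if t.isdigit())
    return {
        "num_ids": len(ids),
        "specials": specials,
        # True only if the IDs are exactly 0 .. num_classes - 1, each once.
        "ids_cover_range": ids == list(range(num_classes)),
    }


# Toy stand-in for the real list: three IDs in arbitrary order plus the specials.
toy_tokens = ["2", "0", "1", "<unk>", "<sos/eos>"]
report = check_token_list(toy_tokens, num_classes=3)
```

For the config above, the same call would be made with the parsed `token_list` and `num_classes=500`.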
exp/ssl_train_hubert_dummy_raw/images/backward_time.png ADDED
exp/ssl_train_hubert_dummy_raw/images/clip.png ADDED
exp/ssl_train_hubert_dummy_raw/images/forward_time.png ADDED
exp/ssl_train_hubert_dummy_raw/images/gpu_max_cached_mem_GB.png ADDED
exp/ssl_train_hubert_dummy_raw/images/grad_norm.png ADDED
exp/ssl_train_hubert_dummy_raw/images/iter_time.png ADDED
exp/ssl_train_hubert_dummy_raw/images/loss.png ADDED
exp/ssl_train_hubert_dummy_raw/images/loss_scale.png ADDED
exp/ssl_train_hubert_dummy_raw/images/optim0_lr0.png ADDED
exp/ssl_train_hubert_dummy_raw/images/optim_step_time.png ADDED
exp/ssl_train_hubert_dummy_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202412'
2
+ files:
3
+ ssl_model_file: exp/ssl_train_hubert_dummy_raw/1epoch.pth
4
+ python: "3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 17:20:25) \n[GCC 12.3.0]"
5
+ timestamp: 1743127152.752043
6
+ torch: 2.6.0+cu126
7
+ yaml_files:
8
+ ssl_train_config: exp/ssl_train_hubert_dummy_raw/config.yaml
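The `timestamp` field in meta.yaml appears to be seconds since the Unix epoch; the file does not state the unit, so this is an assumption. Under it, the value can be turned into a readable UTC date with the standard library, as in this short sketch (the variable names are illustrative only):

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be Unix epoch seconds.
META_TIMESTAMP = 1743127152.752043

# Interpret the value in UTC so the result does not depend on the local time zone.
packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

Read this way, the model was packaged on 2025-03-28 (UTC), which is consistent with the `espnet: '202412'` version recorded alongside it.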