ESPnet
audio
classification
Shikhar Bharadwaj committed on
Commit
97f0cd0
·
1 Parent(s): e12a80b

Update model

Files changed (19)
  1. README.md +353 -0
  2. meta.yaml +8 -0
  3. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/data/nsynth_pitch/token_list +114 -0
  4. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/4epoch.pth +3 -0
  5. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/RESULTS.md +16 -0
  6. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/config.yaml +301 -0
  7. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/acc.png +0 -0
  8. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/backward_time.png +0 -0
  9. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/clip.png +0 -0
  10. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/forward_time.png +0 -0
  11. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/gpu_max_cached_mem_GB.png +0 -0
  12. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/grad_norm.png +0 -0
  13. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/iter_time.png +0 -0
  14. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/loss.png +0 -0
  15. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/loss_scale.png +0 -0
  16. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/macro_precision.png +0 -0
  17. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/optim0_lr0.png +0 -0
  18. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/optim_step_time.png +0 -0
  19. work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/train_time.png +0 -0
README.md ADDED
@@ -0,0 +1,353 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - classification
+ datasets:
+ - nsynth
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 CLS model
+
+ ### `espnet/OpenBEATS-Large-NsynthPitch`
+
+ This model was trained by Shikhar Bharadwaj using the nsynth recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ## CLS config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/earlarge2/conf/ear_large/nsynth_pitch.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2
+ ngpu: 1
+ seed: 0
+ num_workers: 2
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ gradient_as_bucket_view: true
+ ddp_comm_hook: null
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 30
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - acc
+   - max
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 1
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: true
+ wandb_project: audioverse
+ wandb_id: null
+ wandb_entity: shikhar
+ wandb_name: nsynth_pitch.earlarge2
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 32
+ valid_batch_size: 16
+ batch_bins: 1000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/train/speech_shape
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/train/label_shape
+ valid_shape_file:
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/valid/speech_shape
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/valid/label_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 480000
+ - 600
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ utt2weight_file: null
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/train/wav.scp
+   - speech
+   - sound
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/train/text
+   - label
+   - text
+ valid_data_path_and_name_and_type:
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/valid/wav.scp
+   - speech
+   - sound
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/valid/text
+   - label
+   - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adamw
+ optim_conf:
+   lr: 3.0e-05
+   weight_decay: 0.01
+   betas:
+   - 0.9
+   - 0.98
+ scheduler: cosineannealingwarmuprestarts
+ scheduler_conf:
+   first_cycle_steps: 50000
+   warmup_steps: 5000
+   max_lr: 3.0e-05
+   min_lr: 5.0e-06
+ lightning_conf: {}
+ token_list:
+ - '60'
+ - '55'
+ - '62'
+ - '63'
+ - '61'
+ - '57'
+ - '56'
+ - '59'
+ - '58'
+ - '64'
+ - '53'
+ - '65'
+ - '67'
+ - '52'
+ - '54'
+ - '69'
+ - '70'
+ - '72'
+ - '66'
+ - '68'
+ - '71'
+ - '50'
+ - '74'
+ - '51'
+ - '48'
+ - '73'
+ - '76'
+ - '49'
+ - '75'
+ - '77'
+ - '79'
+ - '78'
+ - '81'
+ - '46'
+ - '80'
+ - '82'
+ - '47'
+ - '44'
+ - '45'
+ - '84'
+ - '42'
+ - '83'
+ - '43'
+ - '41'
+ - '40'
+ - '39'
+ - '38'
+ - '37'
+ - '36'
+ - '85'
+ - '86'
+ - '35'
+ - '34'
+ - '33'
+ - '32'
+ - '31'
+ - '88'
+ - '87'
+ - '30'
+ - '89'
+ - '29'
+ - '28'
+ - '27'
+ - '25'
+ - '26'
+ - '24'
+ - '91'
+ - '90'
+ - '92'
+ - '93'
+ - '94'
+ - '95'
+ - '96'
+ - '23'
+ - '22'
+ - '97'
+ - '98'
+ - '99'
+ - '100'
+ - '21'
+ - '101'
+ - '103'
+ - '102'
+ - '104'
+ - '105'
+ - '106'
+ - '108'
+ - '107'
+ - '17'
+ - '19'
+ - '16'
+ - '12'
+ - '10'
+ - '18'
+ - '14'
+ - '20'
+ - '11'
+ - '13'
+ - '9'
+ - '15'
+ - '109'
+ - '110'
+ - '115'
+ - '114'
+ - '117'
+ - '111'
+ - '112'
+ - '118'
+ - '116'
+ - '113'
+ - '120'
+ - '119'
+ - <blank>
+ - <unk>
+ text_token_list: null
+ text_bpemodel: null
+ init: xavier_normal
+ input_size: 1
+ use_preprocessor: true
+ frontend: null
+ frontend_conf: {}
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ preencoder: null
+ preencoder_conf: {}
+ encoder: beats
+ encoder_conf:
+   beats_ckpt_path: /work/nvme/bbjs/sbharadwaj/7Msounds/exp/beats_iter1_large1.tune_lr1.0e-4_warmup40000_bins1600000_totalsteps400000/epoch_latest.pt
+   beats_config:
+     layer_wise_gradient_decay_ratio: 0.3
+     encoder_layerdrop: 0.1
+     dropout: 0.0
+   use_weighted_representation: false
+   specaug_config:
+     apply_time_warp: true
+     apply_freq_mask: false
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0
+     - 0.06
+     num_time_mask: 1
+   roll_augment: true
+   roll_interval: 1
+ text_encoder: null
+ text_encoder_conf: {}
+ embedding_fusion: null
+ embedding_fusion_conf: {}
+ decoder: linear
+ decoder_conf: {}
+ model: espnet
+ model_conf:
+   classification_type: multi-class
+   lsm_weight: 0.1
+ required:
+ - output_dir
+ - token_list
+ version: '202412'
+ distributed: false
+ ```
+
+ </details>
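
The config above sets `classification_type: multi-class` with `lsm_weight: 0.1`. As a rough illustration of what a label-smoothing weight of 0.1 does to a cross-entropy loss, here is a minimal pure-Python sketch of one common formulation. The function name `label_smoothed_nll` and the choice to spread the smoothing mass uniformly over the non-target classes are assumptions for illustration; ESPnet's own loss may differ in detail.

```python
import math


def label_smoothed_nll(logits, target, smoothing=0.1):
    """Cross-entropy against a smoothed one-hot target (hypothetical helper).

    The target class receives probability 1 - smoothing; the remaining
    mass is shared equally by the other classes (an assumed convention).
    """
    n = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    off_target = smoothing / (n - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        weight = (1.0 - smoothing) if i == target else off_target
        loss -= weight * lp
    return loss


# A confident, correct prediction is still penalised slightly when smoothing > 0.
print(label_smoothed_nll([5.0, 0.0, 0.0], target=0, smoothing=0.0))
print(label_smoothed_nll([5.0, 0.0, 0.0], target=0, smoothing=0.1))
```

With `smoothing=0.0` this reduces to ordinary cross-entropy, which makes it easy to compare the two settings on the same logits.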
+
+ ### Citations
+
+ ```BibTex
+
+ @article{bharadwaj2025openbeats,
+   title={OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder},
+   author={Bharadwaj, Shikhar and Cornell, Samuele and Choi, Kwanghee and Fukayama, Satoru and Shim, Hye-jin and Deshmukh, Soham and Watanabe, Shinji},
+   journal={arXiv preprint arXiv:2507.14129},
+   year={2025}
+ }
+
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+ ```
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202503'
+ files:
+   classification_model_file: /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/4epoch.pth
+ python: "3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 17:20:25) \n[GCC 12.3.0]"
+ timestamp: 1763334440.419186
+ torch: 2.1.2
+ yaml_files:
+   classification_train_config: /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/config.yaml
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/data/nsynth_pitch/token_list ADDED
@@ -0,0 +1,114 @@
+ 60
+ 55
+ 62
+ 63
+ 61
+ 57
+ 56
+ 59
+ 58
+ 64
+ 53
+ 65
+ 67
+ 52
+ 54
+ 69
+ 70
+ 72
+ 66
+ 68
+ 71
+ 50
+ 74
+ 51
+ 48
+ 73
+ 76
+ 49
+ 75
+ 77
+ 79
+ 78
+ 81
+ 46
+ 80
+ 82
+ 47
+ 44
+ 45
+ 84
+ 42
+ 83
+ 43
+ 41
+ 40
+ 39
+ 38
+ 37
+ 36
+ 85
+ 86
+ 35
+ 34
+ 33
+ 32
+ 31
+ 88
+ 87
+ 30
+ 89
+ 29
+ 28
+ 27
+ 25
+ 26
+ 24
+ 91
+ 90
+ 92
+ 93
+ 94
+ 95
+ 96
+ 23
+ 22
+ 97
+ 98
+ 99
+ 100
+ 21
+ 101
+ 103
+ 102
+ 104
+ 105
+ 106
+ 108
+ 107
+ 17
+ 19
+ 16
+ 12
+ 10
+ 18
+ 14
+ 20
+ 11
+ 13
+ 9
+ 15
+ 109
+ 110
+ 115
+ 114
+ 117
+ 111
+ 112
+ 118
+ 116
+ 113
+ 120
+ 119
+ <blank>
+ <unk>
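
The numeric tokens in this list run from 9 to 120. Assuming they are MIDI note numbers (NSynth annotates pitch that way, but the commit itself does not say so), a predicted class index can be turned into a note name and a frequency. The sketch below is illustrative only: `midi_to_hz`, `midi_to_name`, and `decode` are hypothetical helpers, and the A4 = 440 Hz equal-temperament tuning is an assumption.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(midi, a4=440.0):
    """Equal-temperament frequency of a MIDI note (A4 = MIDI 69)."""
    return a4 * 2.0 ** ((midi - 69) / 12.0)


def midi_to_name(midi):
    """Scientific pitch name, with MIDI 60 written as C4."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def decode(class_index, token_list):
    """Map a class index to (token, note name, frequency in Hz).

    Non-numeric tokens such as <blank> and <unk> carry no pitch.
    """
    token = token_list[class_index]
    if not token.isdigit():
        return token, None, None
    midi = int(token)
    return token, midi_to_name(midi), midi_to_hz(midi)


# First three entries of the token list, in file order.
token_list_head = ["60", "55", "62"]
for idx in range(len(token_list_head)):
    print(idx, decode(idx, token_list_head))
```

The class index is the position in the file, not the pitch value, so the lookup through `token_list` is what recovers the pitch.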
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/4epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fdc0af0cfb47d06b800b2707512d4f4301ef1a0063ce49ffacb61694f386b3b
+ size 1246148137
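
The three lines above are a Git LFS pointer: the checkpoint itself is stored out of band and identified by its SHA-256 digest and byte size. A downloaded `4epoch.pth` can be checked against those two fields with the standard library alone. This is a minimal sketch; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever the reader's download location happens to be.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Stream in chunks so a ~1.2 GB checkpoint is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:0fdc0af0cfb47d06b800b2707512d4f4301ef1a0063ce49ffacb61694f386b3b
size 1246148137
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Checking the size first is a cheap way to reject truncated downloads before hashing the whole file.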
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/RESULTS.md ADDED
@@ -0,0 +1,16 @@
+ <!-- Generated by scripts/utils/show_cls_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Apr 22 03:10:36 CDT 2025`
+ - python version: `3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 17:20:25) [GCC 12.3.0]`
+ - espnet version: `espnet 202412`
+ - pytorch version: `pytorch 2.6.0.dev20241210+cu124`
+ - Git hash: `c96433a43c5c3984889b81804becac6ebf10f7a7`
+ - Commit date: `Mon Mar 31 20:24:06 2025 -0500`
+
+ ## cls_earlarge2
+ |Split|mean_acc|mAP|mean_auc|n_labels|n_instances|
+ |---|---|---|---|---|---|
+ |cls_test|92.65|88.43|94.46|112.00|4096.00|
+ |cls_valid|92.73|89.00|99.86|112.00|12678.00|
+
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/config.yaml ADDED
@@ -0,0 +1,301 @@
+ config: /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/earlarge2/conf/ear_large/nsynth_pitch.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2
+ ngpu: 1
+ seed: 0
+ num_workers: 2
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ gradient_as_bucket_view: true
+ ddp_comm_hook: null
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 30
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - acc
+   - max
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 1
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: true
+ wandb_project: audioverse
+ wandb_id: null
+ wandb_entity: shikhar
+ wandb_name: nsynth_pitch.earlarge2
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 32
+ valid_batch_size: 16
+ batch_bins: 1000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/train/speech_shape
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/train/label_shape
+ valid_shape_file:
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/valid/speech_shape
+ - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_stats_16k/valid/label_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 480000
+ - 600
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ utt2weight_file: null
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/train/wav.scp
+   - speech
+   - sound
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/train/text
+   - label
+   - text
+ valid_data_path_and_name_and_type:
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/valid/wav.scp
+   - speech
+   - sound
+ - - /work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/dump/nsynth_pitch/valid/text
+   - label
+   - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adamw
+ optim_conf:
+   lr: 3.0e-05
+   weight_decay: 0.01
+   betas:
+   - 0.9
+   - 0.98
+ scheduler: cosineannealingwarmuprestarts
+ scheduler_conf:
+   first_cycle_steps: 50000
+   warmup_steps: 5000
+   max_lr: 3.0e-05
+   min_lr: 5.0e-06
+ lightning_conf: {}
+ token_list:
+ - '60'
+ - '55'
+ - '62'
+ - '63'
+ - '61'
+ - '57'
+ - '56'
+ - '59'
+ - '58'
+ - '64'
+ - '53'
+ - '65'
+ - '67'
+ - '52'
+ - '54'
+ - '69'
+ - '70'
+ - '72'
+ - '66'
+ - '68'
+ - '71'
+ - '50'
+ - '74'
+ - '51'
+ - '48'
+ - '73'
+ - '76'
+ - '49'
+ - '75'
+ - '77'
+ - '79'
+ - '78'
+ - '81'
+ - '46'
+ - '80'
+ - '82'
+ - '47'
+ - '44'
+ - '45'
+ - '84'
+ - '42'
+ - '83'
+ - '43'
+ - '41'
+ - '40'
+ - '39'
+ - '38'
+ - '37'
+ - '36'
+ - '85'
+ - '86'
+ - '35'
+ - '34'
+ - '33'
+ - '32'
+ - '31'
+ - '88'
+ - '87'
+ - '30'
+ - '89'
+ - '29'
+ - '28'
+ - '27'
+ - '25'
+ - '26'
+ - '24'
+ - '91'
+ - '90'
+ - '92'
+ - '93'
+ - '94'
+ - '95'
+ - '96'
+ - '23'
+ - '22'
+ - '97'
+ - '98'
+ - '99'
+ - '100'
+ - '21'
+ - '101'
+ - '103'
+ - '102'
+ - '104'
+ - '105'
+ - '106'
+ - '108'
+ - '107'
+ - '17'
+ - '19'
+ - '16'
+ - '12'
+ - '10'
+ - '18'
+ - '14'
+ - '20'
+ - '11'
+ - '13'
+ - '9'
+ - '15'
+ - '109'
+ - '110'
+ - '115'
+ - '114'
+ - '117'
+ - '111'
+ - '112'
+ - '118'
+ - '116'
+ - '113'
+ - '120'
+ - '119'
+ - <blank>
+ - <unk>
+ text_token_list: null
+ text_bpemodel: null
+ init: xavier_normal
+ input_size: 1
+ use_preprocessor: true
+ frontend: null
+ frontend_conf: {}
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ preencoder: null
+ preencoder_conf: {}
+ encoder: beats
+ encoder_conf:
+   beats_ckpt_path: /work/nvme/bbjs/sbharadwaj/7Msounds/exp/beats_iter1_large1.tune_lr1.0e-4_warmup40000_bins1600000_totalsteps400000/epoch_latest.pt
+   beats_config:
+     layer_wise_gradient_decay_ratio: 0.3
+     encoder_layerdrop: 0.1
+     dropout: 0.0
+   use_weighted_representation: false
+   specaug_config:
+     apply_time_warp: true
+     apply_freq_mask: false
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0
+     - 0.06
+     num_time_mask: 1
+   roll_augment: true
+   roll_interval: 1
+ text_encoder: null
+ text_encoder_conf: {}
+ embedding_fusion: null
+ embedding_fusion_conf: {}
+ decoder: linear
+ decoder_conf: {}
+ model: espnet
+ model_conf:
+   classification_type: multi-class
+   lsm_weight: 0.1
+ required:
+ - output_dir
+ - token_list
+ version: '202412'
+ distributed: false
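
The `scheduler_conf` in this file pairs a 5000-step warmup with a 50000-step first cycle, between `min_lr: 5.0e-06` and `max_lr: 3.0e-05`. The sketch below shows the learning-rate curve those four values would give under the usual "linear warmup, then cosine decay" formulation. It is an assumption that ESPnet's `cosineannealingwarmuprestarts` follows exactly this shape; the sketch also assumes every cycle has the same length and peak, which the real scheduler may not. `lr_at` is a hypothetical helper name.

```python
import math


def lr_at(step, first_cycle_steps=50000, warmup_steps=5000,
          max_lr=3.0e-05, min_lr=5.0e-06):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    # Assumption: cycles repeat unchanged (no cycle growth, no peak decay).
    s = step % first_cycle_steps
    if s < warmup_steps:
        # Linear ramp from min_lr up to max_lr.
        return min_lr + (max_lr - min_lr) * s / warmup_steps
    # Cosine decay from max_lr back down to min_lr over the rest of the cycle.
    progress = (s - warmup_steps) / (first_cycle_steps - warmup_steps)
    return min_lr + (max_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 2500, 5000, 27500, 49999):
    print(step, lr_at(step))
```

Under these assumptions the rate peaks at step 5000 and is halfway between the two bounds at step 27500, the midpoint of the decay phase.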
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/acc.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/backward_time.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/clip.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/forward_time.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/gpu_max_cached_mem_GB.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/grad_norm.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/iter_time.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/loss.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/loss_scale.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/macro_precision.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/optim0_lr0.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/optim_step_time.png ADDED
work/nvme/bbjs/sbharadwaj/espnet/egs2/audioverse/v1/exp/nsynth_pitch/cls_earlarge2/images/train_time.png ADDED