Bartelds committed on
Commit
5a26043
·
1 Parent(s): 1cd99e7

Upload checkpoint, sanitized config, and transcripts for ctc-dro_xlsr_set_4

Files changed (5)
  1. README.md +41 -0
  2. config.yaml +356 -0
  3. hyp.trn +0 -0
  4. ref.trn +0 -0
  5. valid.loss.best.pth +3 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ title: "CTC-DRO XLSR-based ASR model - set 4"
+ language: multilingual
+ tags:
+ - asr
+ - ctc-dro
+ - XLSR
+ license: cc-by-nc-4.0
+ ---
+
+ # CTC-DRO XLSR-based ASR model - set 4
+
+ This repository contains a CTC-DRO XLSR-based automatic speech recognition (ASR) model trained with ESPnet.
+ The model was trained on balanced training data from set 4.
+
+ ## Intended Use
+
+ This model is intended for ASR. Users can run inference using the provided checkpoint (`valid.loss.best.pth`) and configuration file (`config.yaml`):
+ ```bash
20
+ import soundfile as sf
21
+ from espnet2.bin.asr_inference import Speech2Text
22
+
23
+ asr_train_config = "ctc-dro_xlsr_set_4/config.yaml"
24
+ asr_model_file = "ctc-dro_xlsr_set_4/valid.loss.best.pth"
25
+
26
+ model = Speech2Text.from_pretrained(
27
+ asr_train_config=asr_train_config,
28
+ asr_model_file=asr_model_file
29
+ )
30
+
31
+ speech, _ = sf.read("input.wav")
32
+ text, *_ = model(speech)[0]
33
+
34
+ print("Recognized text:", text)
35
+ ```
36
+
37
+ ## How to Use
38
+
39
+ 1. Clone this repository.
40
+ 2. Use ESPnet’s inference scripts with the provided `config.yaml` and checkpoint file.
41
+ 3. Ensure any external resources referenced in `config.yaml` are available at the indicated relative paths.
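Step 3 of the README can be checked before inference is attempted. The following is a minimal, standard-library-only sketch rather than anything shipped with ESPnet: `find_missing` is a hypothetical helper, and `REFERENCED_PATHS` is hand-copied from the `config.yaml` in this commit (`non_linguistic_symbols` and the `download_dir` entry under `frontend_conf`). Which base directory the relative paths resolve against depends on where inference is launched, so the directory passed in is an assumption as well; the download directory may also simply be created on first use.

```python
from pathlib import Path

# Relative paths copied by hand from config.yaml in this commit.
REFERENCED_PATHS = {
    "non_linguistic_symbols": "./nlsyms.txt",
    "frontend_conf.download_dir": "./hub",
}


def find_missing(base_dir, referenced=REFERENCED_PATHS):
    """Return (sorted) the config keys whose relative path is absent under base_dir."""
    base = Path(base_dir)
    return sorted(key for key, rel in referenced.items() if not (base / rel).exists())


if __name__ == "__main__":
    # Assumption: the repository was cloned into ./ctc-dro_xlsr_set_4,
    # the directory name used in the README snippet.
    for key in find_missing("ctc-dro_xlsr_set_4"):
        print(f"missing resource for {key}: {REFERENCED_PATHS[key]}")
```

An empty result only says the paths exist; it does not validate their contents.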
config.yaml ADDED
@@ -0,0 +1,356 @@
+ accum_grad: 16
+ adapter: lora
+ adapter_conf: {}
+ allow_multi_rates: false
+ allow_variable_data_keys: false
+ aux_ctc_tasks: []
+ batch_bins: 1000000
+ batch_size: 4
+ batch_type: duration_language
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ bpemodel: null
+ chunk_default_fs: null
+ chunk_excluded_key_prefixes: []
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ cleaner: null
+ collect_stats: false
+ create_graph_in_tensorboard: false
+ ctc_conf:
+     accumulation: true
+     ctc_type: droctc
+     dro_group_count: 6
+     dro_q_epsilon: 1.0e-10
+     dro_step_size: 0.001
+     final_step_size: 0.01
+     init_strategy: uniform
+     initial_step_size: 0.001
+     laplace_smoothing: 0.1
+     max_epoch: 40
+     num_iters_per_epoch: 1200
+     running_mean_window: -1
+     scheduling: false
+     use_running_mean: false
+     warmup_steps: 0
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ cudnn_enabled: true
+ decoder: null
+ decoder_conf: {}
+ detect_anomaly: false
+ distributed: false
+ drop_last_iter: false
+ dry_run: false
+ duration_batch_length: -1
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ encoder: transformer
+ encoder_conf:
+     attention_dropout_rate: 0.1
+     attention_heads: 8
+     dropout_rate: 0.1
+     input_layer: conv2d2
+     linear_units: 1024
+     normalize_before: true
+     num_blocks: 2
+     output_size: 256
+     positional_dropout_rate: 0.1
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ fold_length:
+ - 80000
+ - 150
+ freeze_param: []
+ frontend: s3prl
+ frontend_conf:
+     download_dir: ./hub
+     frontend_conf:
+         upstream: xls_r_300m
+     fs: 16k
+     multilayer_feature: true
+ g2p: null
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ ignore_init_mismatch: false
+ init: xavier_uniform
+ init_param: []
+ input_size: null
+ iterator_type: sequence
+ joint_net_conf: null
+ keep_nbest_models: 3
+ log_interval: null
+ log_level: INFO
+ max_cache_fd: 32
+ max_cache_size: 0.0
+ max_epoch: 40
+ model: espnet
+ model_conf:
+     ctc_weight: 1.0
+ multiple_iterator: false
+ multiprocessing_distributed: false
+ nbest_averaging_interval: 0
+ ngpu: 1
+ no_forward_run: false
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ noise_scp: null
+ non_linguistic_symbols: ./nlsyms.txt
+ normalize: utterance_mvn
+ normalize_conf: {}
+ num_att_plot: 3
+ num_cache_chunks: 1024
+ num_iters_per_epoch: 1200
+ num_workers: 4
+ optim: adam
+ optim_conf:
+     lr: 0.0001
+     weight_decay: 1.0e-06
+ output_dir: ./inference_results
+ patience: null
+ postencoder: null
+ postencoder_conf: {}
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ preprocessor: default
+ preprocessor_conf: {}
+ pretrain_path: null
+ print_config: false
+ required:
+ - output_dir
+ - token_list
+ resume: true
+ rir_apply_prob: 1.0
+ rir_scp: null
+ save_strategy: all
+ scheduler: null
+ scheduler_conf: {}
+ seed: 0
+ sharded_ddp: false
+ short_noise_thres: 0.5
+ shuffle_within_batch: false
+ sort_batch: descending
+ sort_in_batch: descending
+ specaug: specaug
+ specaug_conf:
+     apply_freq_mask: true
+     apply_time_mask: true
+     apply_time_warp: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     num_time_mask: 10
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     time_warp_mode: bicubic
+     time_warp_window: 5
+ speech_volume_normalize: null
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - A
+ - O
+ - N
+ - S
+ - I
+ - ا
+ - L
+ - T
+ - R
+ - و
+ - D
+ - ن
+ - ر
+ - ی
+ - ي
+ - M
+ - U
+ - H
+ - P
+ - ک
+ - م
+ - C
+ - А
+ - Ӹ
+ - Н
+ - B
+ - ت
+ - س
+ - ل
+ - J
+ - K
+ - ہ
+ - Т
+ - ے
+ - G
+ - Ш
+ - К
+ - Е
+ - Л
+ - Ы
+ - V
+ - М
+ - ج
+ - Ӓ
+ - ه
+ - ب
+ - د
+ - О
+ - Y
+ - '[slv]'
+ - Р
+ - ڪ
+ - پ
+ - Z
+ - '[mrj]'
+ - F
+ - گ
+ - И
+ - В
+ - ئ
+ - Д
+ - '[sot]'
+ - ں
+ - '[spa]'
+ - W
+ - Q
+ - П
+ - Г
+ - ف
+ - ق
+ - С
+ - ع
+ - ش
+ - Ж
+ - ز
+ - ھ
+ - آ
+ - Č
+ - Í
+ - У
+ - ح
+ - '[urd]'
+ - Š
+ - ٹ
+ - چ
+ - Ь
+ - ٽ
+ - '[snd]'
+ - ڻ
+ - Й
+ - ط
+ - ص
+ - ٿ
+ - Ц
+ - خ
+ - Ó
+ - Я
+ - Á
+ - É
+ - Ч
+ - ۾
+ - '0'
+ - Ž
+ - З
+ - '1'
+ - ۽
+ - –
+ - ڏ
+ - Э
+ - ڊ
+ - —
+ - ڈ
+ - ء
+ - Ñ
+ - ڙ
+ - ِ
+ - '2'
+ - ٻ
+ - Х
+ - Ӱ
+ - ظ
+ - ض
+ - ث
+ - ڳ
+ - ،
+ - X
+ - ¡
+ - غ
+ - ڑ
+ - Ӧ
+ - ذ
+ - ¿
+ - '5'
+ - ڌ
+ - '3'
+ - ڀ
+ - ُ
+ - '9'
+ - Ú
+ - '4'
+ - '8'
+ - ۔
+ - '6'
+ - ٺ
+ - Ю
+ - »
+ - Б
+ - «
+ - ڇ
+ - ً
+ - ڃ
+ - '7'
+ - ڄ
+ - ؤ
+ - ڍ
+ - Ф
+ - َ
+ - ٰ
+ - ّ
+ - ڱ
+ - ”
+ - ژ
+ - ڦ
+ - Ё
+ - ؛
+ - ٍ
+ - Щ
+ - ؟
+ - ’
+ - ‘
+ - °
+ - ۃ
+ - إ
+ - Ć
+ - <sos/eos>
+ token_type: char
+ train_dtype: float32
+ unused_parameters: true
+ use_adapter: false
+ use_amp: false
+ use_lang_prompt: false
+ use_matplotlib: true
+ use_nlp_prompt: false
+ use_preprocessor: true
+ use_tensorboard: true
+ val_scheduler_criterion:
+ - valid
+ - loss
+ valid_batch_bins: null
+ valid_batch_size: null
+ valid_batch_type: null
+ valid_iterator_type: null
+ valid_max_cache_size: null
+ version: '202402'
+ write_collected_feats: false
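The `token_list` in this config mixes single characters with bracketed tokens such as `[slv]` and `[urd]`, which look like language tags. As a rough sketch, the snippet below pulls such tags out of a token list. The regular expression and the helper name `language_tags` are assumptions made for illustration, and `TOKEN_EXCERPT` is a hand-copied subset of the list, not the full vocabulary. The full list contains six such tokens (`[slv]`, `[mrj]`, `[sot]`, `[spa]`, `[urd]`, `[snd]`), which appears to line up with `dro_group_count: 6`, although the config itself does not state that link.

```python
import re

# Hand-copied subset of token_list from config.yaml (not the full list).
TOKEN_EXCERPT = [
    "<blank>", "<unk>", "<space>", "E", "A",
    "[slv]", "[mrj]", "[sot]", "[spa]", "[urd]", "[snd]",
    "0", "<sos/eos>",
]

# Assumption: a language tag is three lowercase ASCII letters in square brackets.
TAG_PATTERN = re.compile(r"\[[a-z]{3}\]")


def language_tags(tokens):
    """Return the bracketed tags in the order they appear in the token list."""
    return [tok for tok in tokens if TAG_PATTERN.fullmatch(tok)]


tags = language_tags(TOKEN_EXCERPT)
print(tags)
```

To run this against the real vocabulary, load `config.yaml` with a YAML parser and pass its `token_list` entry instead of the excerpt.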
hyp.trn ADDED
The diff for this file is too large to render.
 
ref.trn ADDED
The diff for this file is too large to render.
 
valid.loss.best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93e34f590cf9674caeb114dac68266eb180a31b3ac56c84fc00af0b31d294244
+ size 1288666400
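The three lines added for `valid.loss.best.pth` are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. One way to confirm that a downloaded checkpoint matches the pointer is sketched below, using only the standard library. `parse_pointer` and `matches_pointer` are hypothetical helper names, and the local checkpoint path in the usage note is an assumption.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:93e34f590cf9674caeb114dac68266eb180a31b3ac56c84fc00af0b31d294244
size 1288666400
"""


def parse_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Stream the file and compare its size and digest with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("ctc-dro_xlsr_set_4/valid.loss.best.pth", parse_pointer(POINTER_TEXT))` should return `True` for an intact download; if the file on disk is only about a hundred bytes long, it is most likely the pointer and the LFS object was never fetched.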