Flo976 committed on
Commit 52fac0e · verified · 1 parent: 85bf11d
Upload folder using huggingface_hub
README.md ADDED
---
language:
- mg
license: agpl-3.0
library_name: transformers
tags:
- whisper
- automatic-speech-recognition
- speech
- malagasy
- low-resource
- fine-tuned
datasets:
- badrex/malagasy-speech-full
metrics:
- wer
pipeline_tag: automatic-speech-recognition
base_model: openai/whisper-medium
model-index:
- name: whisper-malagasy-medium
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Malagasy Speech Full
      type: badrex/malagasy-speech-full
      split: validation
    metrics:
    - type: wer
      value: 20.78
      name: WER
---
<p align="center">
  <img src="https://img.shields.io/badge/lang-malagasy-green?style=for-the-badge" alt="Malagasy"/>
  <img src="https://img.shields.io/badge/WER-20.8%25-blue?style=for-the-badge" alt="WER"/>
  <img src="https://img.shields.io/badge/base-whisper--medium-orange?style=for-the-badge" alt="Base model"/>
</p>

# Whisper Medium — Malagasy (mg)

**The first Whisper model fine-tuned for Malagasy transcription.**

A fine-tune of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) (769M params) on 149 hours of Malagasy audio. Developed as part of the [Milo Voice](https://github.com/Flo976/milo) project, the first Malagasy-language AI voice assistant.

## Results

| Metric | Validation | Baseline (no fine-tuning) |
|--------|-----------|---------------------------|
| **WER** | **20.78%** | >80% |

> For a low-resource language like Malagasy, a WER of ~21% is a strong result.
> For comparison, Whisper medium reaches ~15% WER on English.
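For reference, WER (word error rate) is the word-level edit distance between reference and hypothesis transcripts, divided by the number of reference words. A minimal self-contained implementation (the evaluation here was presumably done with a standard library such as `evaluate` or `jiwer`; this sketch only makes the metric concrete, and the Malagasy sentences are made-up examples):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between ref[:i] and hyp[:j]
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1] / len(ref)

# One substitution out of four reference words -> 25% WER
print(wer("tsara ny andro androany", "tsara ny andro anio"))  # 0.25
```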

### Training curve

| Step | WER | Loss |
|------|-----|------|
| 1000 | 25.13% | 0.303 |
| 2000 | 22.21% | 0.261 |
| 3000 | 21.13% | 0.247 |
| 4000 | 20.97% | 0.252 |
| **5000** | **20.78%** | **0.247** |
| 6000 | 21.21% | 0.266 |
| 7000 | 21.10% | 0.270 |

The best checkpoint is at step 5000 (epoch ~2.8).

## Usage

### With Transformers

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# The processor comes from the base model (the checkpoint does not save the vocab)
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained(
    "Flo976/whisper-malagasy-medium",
    torch_dtype=torch.float16,
).to("cuda")

# Transcribe (audio_array: a 16 kHz mono waveform as a float array)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to("cuda", dtype=torch.float16)

with torch.no_grad():
    predicted_ids = model.generate(
        input_features,
        language="mg",
        task="transcribe",
        max_new_tokens=128,
    )

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
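The snippet above assumes `audio_array` is already a float waveform sampled at 16 kHz. For a 16-bit mono WAV file it can be loaded with just the standard library and NumPy; for other formats or sample rates, `librosa.load(path, sr=16000)` is the usual shortcut (`audio.wav` below is a hypothetical path):

```python
import wave

import numpy as np

def load_wav_16k(path: str) -> np.ndarray:
    """Read a 16-bit PCM mono 16 kHz WAV file into a float32 array in [-1, 1]."""
    with wave.open(path, "rb") as f:
        assert f.getsampwidth() == 2, "expected 16-bit PCM"
        assert f.getnchannels() == 1, "expected mono audio"
        assert f.getframerate() == 16000, "expected 16 kHz (resample first otherwise)"
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
    # Whisper's feature extractor expects float samples normalized to [-1, 1]
    return pcm.astype(np.float32) / 32768.0

# audio_array = load_wav_16k("audio.wav")  # then feed it to the processor
```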

### With pipeline

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="Flo976/whisper-malagasy-medium",
    tokenizer="openai/whisper-medium",
    feature_extractor="openai/whisper-medium",
    device="cuda:0",
    torch_dtype="float16",
)

result = pipe(
    "audio.wav",
    generate_kwargs={"language": "mg", "task": "transcribe"},
)
print(result["text"])
```

### Optimized inference (FP16)

The model runs in FP16 on GPU, cutting VRAM from ~3 GB to ~1.5 GB with no measurable quality loss.

## Training

### Dataset

| | Samples | Duration | Source |
|--|---------|----------|--------|
| **Train** | 28,371 | 149h | [badrex/malagasy-speech-full](https://huggingface.co/datasets/badrex/malagasy-speech-full) |
| **Validation** | 3,099 | - | same |
| **Test** | 3,101 | - | same |

- 56 unique speakers
- 16 kHz mono audio
- Transcriptions in Malagasy (plateau / official dialect)
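A quick sanity check on the figures above: 149 hours spread over 28,371 training clips gives an average clip length of roughly 19 seconds, comfortably within Whisper's 30-second input window:

```python
# Average training clip duration, from the dataset table above
hours, clips = 149, 28_371
avg_seconds = hours * 3600 / clips
print(f"{avg_seconds:.1f} s per clip on average")  # 18.9 s per clip on average
```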

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base model | `openai/whisper-medium` (769M params) |
| Epochs | 4 (~3.95) |
| Batch size | 8 x 2 (gradient accumulation) = 16 effective |
| Learning rate | 1e-5 |
| Warmup steps | 500 |
| Optimizer | AdamW |
| Precision | FP16 |
| Gradient checkpointing | Yes |
| Best checkpoint | Step 5000 (WER 20.78%) |

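Batch size 8 with gradient accumulation 2 means gradients from two micro-batches of 8 are summed before each optimizer step, giving an effective batch of 16. The mechanic, reduced to a framework-agnostic toy (scalar parameter, squared-error loss; illustrative only, not the actual training loop):

```python
def sgd_with_accumulation(param, micro_batches, lr, accum_steps=2):
    """SGD where accum_steps micro-batch gradients are averaged per update."""
    grad = 0.0
    for step, batch in enumerate(micro_batches, start=1):
        # gradient of the loss 0.5 * (param - x)^2 w.r.t. param is (param - x)
        grad += sum(param - x for x in batch) / len(batch)
        if step % accum_steps == 0:
            param -= lr * grad / accum_steps  # one update per accum_steps micro-batches
            grad = 0.0
    return param

# Two micro-batches of two samples each -> a single effective update
print(sgd_with_accumulation(0.0, [[1.0, 1.0], [1.0, 1.0]], lr=1.0))  # 1.0
```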
### Hardware

| | |
|-|-|
| GPU | NVIDIA RTX 5070 Ti (16 GB VRAM) |
| Training time | ~12h |
| Framework | HuggingFace Transformers 4.47 + PyTorch 2.5 |

### Training script

```bash
python scripts/03_train.py \
    --model openai/whisper-medium \
    --dataset badrex/malagasy-speech-full \
    --output-dir models/whisper-mg-v1 \
    --epochs 10 \
    --batch-size 8 \
    --grad-accum 2 \
    --lr 1e-5 \
    --warmup-steps 500 \
    --eval-steps 500
```

## Limitations

- **Dialects**: trained mainly on official Malagasy (plateau / Merina dialect). Performance on coastal dialects (Betsimisaraka, Sakalava, etc.) has not been evaluated.
- **Noise**: performance degrades with heavy background noise; the training data is mostly clean audio.
- **Long utterances**: the model is optimized for short-to-medium utterances (<30 s). Long transcriptions may contain hallucinations.
- **Processor**: the checkpoint does not include the tokenizer/processor; use `openai/whisper-medium` for the processor.
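One mitigation for the long-utterance limitation is to split recordings into ~30-second windows before transcription (the `transformers` pipeline does this automatically with `chunk_length_s=30`). A minimal manual chunker with a small overlap (parameter values here are illustrative):

```python
import numpy as np

def chunk_audio(waveform: np.ndarray, sr: int = 16000,
                chunk_s: float = 30.0, overlap_s: float = 1.0) -> list:
    """Split a waveform into windows of at most chunk_s seconds, overlapping
    by overlap_s seconds so words at the boundaries are not cut in half."""
    size = int(chunk_s * sr)
    step = size - int(overlap_s * sr)
    chunks = [waveform[i:i + size] for i in range(0, len(waveform), step)]
    # Drop a trailing window that is entirely contained in the previous one
    if len(chunks) > 1 and len(chunks[-1]) <= int(overlap_s * sr):
        chunks.pop()
    return chunks

# A 70 s recording becomes three overlapping windows of <= 30 s
print([len(c) // 16000 for c in chunk_audio(np.zeros(16000 * 70))])  # [30, 30, 12]
```

Each window can then be transcribed independently and the texts concatenated, at the cost of possible duplication around the overlaps.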
183
+
184
+ ## Cas d'usage
185
+
186
+ - Assistant vocal malagasy ([Milo Voice](https://github.com/Flo976/milo))
187
+ - Transcription de reunions / interviews en malagasy
188
+ - Sous-titrage automatique de videos malagasy
189
+ - Accessibilite pour les locuteurs malagasy
190
+
191
+ ## Citation
192
+
193
+ ```bibtex
194
+ @misc{whisper-malagasy-medium-2026,
195
+ author = {Florent Didelot},
196
+ title = {Whisper Medium fine-tuned for Malagasy Speech Recognition},
197
+ year = {2026},
198
+ publisher = {HuggingFace},
199
+ url = {https://huggingface.co/Flo976/whisper-malagasy-medium}
200
+ }
201
+ ```
202
+
203
+ ## Licence
204
+
205
+ AGPL-3.0 — voir [LICENSE](https://github.com/Flo976/milo/blob/main/LICENSE)
206
+
207
+ ---
208
+
209
+ Developpe dans le cadre du projet **[Milo Voice](https://github.com/Flo976/milo)** par [Sooatek](https://sooatek.com).
config.json ADDED
{
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "apply_spec_augment": false,
  "architectures": [
    "WhisperForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 50257,
  "classifier_proj_size": 256,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 24,
  "decoder_start_token_id": 50258,
  "dropout": 0.0,
  "dtype": "float32",
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 24,
  "eos_token_id": 50257,
  "forced_decoder_ids": null,
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "max_source_positions": 1500,
  "max_target_positions": 448,
  "median_filter_width": 7,
  "model_type": "whisper",
  "num_hidden_layers": 24,
  "num_mel_bins": 80,
  "pad_token_id": 50257,
  "scale_embedding": false,
  "suppress_tokens": null,
  "tie_word_embeddings": true,
  "transformers_version": "5.1.0",
  "use_cache": false,
  "use_weighted_layer_sum": false,
  "vocab_size": 51865
}
generation_config.json ADDED
{
  "alignment_heads": [[13, 15], [15, 4], [15, 15], [16, 1], [20, 0], [23, 4]],
  "assistant_confidence_threshold": 0.4,
  "assistant_lookbehind": 10,
  "begin_suppress_tokens": [220, 50257],
  "bos_token_id": 50257,
  "decoder_start_token_id": 50258,
  "diversity_penalty": 0.0,
  "do_sample": false,
  "early_stopping": false,
  "encoder_no_repeat_ngram_size": 0,
  "encoder_repetition_penalty": 1.0,
  "eos_token_id": 50257,
  "epsilon_cutoff": 0.0,
  "eta_cutoff": 0.0,
  "forced_decoder_ids": [[1, 50349], [2, 50359], [3, 50363]],
  "is_multilingual": true,
  "lang_to_id": {
    "<|af|>": 50327, "<|am|>": 50334, "<|ar|>": 50272, "<|as|>": 50350,
    "<|az|>": 50304, "<|ba|>": 50355, "<|be|>": 50330, "<|bg|>": 50292,
    "<|bn|>": 50302, "<|bo|>": 50347, "<|br|>": 50309, "<|bs|>": 50315,
    "<|ca|>": 50270, "<|cs|>": 50283, "<|cy|>": 50297, "<|da|>": 50285,
    "<|de|>": 50261, "<|el|>": 50281, "<|en|>": 50259, "<|es|>": 50262,
    "<|et|>": 50307, "<|eu|>": 50310, "<|fa|>": 50300, "<|fi|>": 50277,
    "<|fo|>": 50338, "<|fr|>": 50265, "<|gl|>": 50319, "<|gu|>": 50333,
    "<|haw|>": 50352, "<|ha|>": 50354, "<|he|>": 50279, "<|hi|>": 50276,
    "<|hr|>": 50291, "<|ht|>": 50339, "<|hu|>": 50286, "<|hy|>": 50312,
    "<|id|>": 50275, "<|is|>": 50311, "<|it|>": 50274, "<|ja|>": 50266,
    "<|jw|>": 50356, "<|ka|>": 50329, "<|kk|>": 50316, "<|km|>": 50323,
    "<|kn|>": 50306, "<|ko|>": 50264, "<|la|>": 50294, "<|lb|>": 50345,
    "<|ln|>": 50353, "<|lo|>": 50336, "<|lt|>": 50293, "<|lv|>": 50301,
    "<|mg|>": 50349, "<|mi|>": 50295, "<|mk|>": 50308, "<|ml|>": 50296,
    "<|mn|>": 50314, "<|mr|>": 50320, "<|ms|>": 50282, "<|mt|>": 50343,
    "<|my|>": 50346, "<|ne|>": 50313, "<|nl|>": 50271, "<|nn|>": 50342,
    "<|no|>": 50288, "<|oc|>": 50328, "<|pa|>": 50321, "<|pl|>": 50269,
    "<|ps|>": 50340, "<|pt|>": 50267, "<|ro|>": 50284, "<|ru|>": 50263,
    "<|sa|>": 50344, "<|sd|>": 50332, "<|si|>": 50322, "<|sk|>": 50298,
    "<|sl|>": 50305, "<|sn|>": 50324, "<|so|>": 50326, "<|sq|>": 50317,
    "<|sr|>": 50303, "<|su|>": 50357, "<|sv|>": 50273, "<|sw|>": 50318,
    "<|ta|>": 50287, "<|te|>": 50299, "<|tg|>": 50331, "<|th|>": 50289,
    "<|tk|>": 50341, "<|tl|>": 50348, "<|tr|>": 50268, "<|tt|>": 50351,
    "<|uk|>": 50280, "<|ur|>": 50290, "<|uz|>": 50337, "<|vi|>": 50278,
    "<|yi|>": 50335, "<|yo|>": 50325, "<|zh|>": 50260
  },
  "language": "mg",
  "length_penalty": 1.0,
  "max_initial_timestamp_index": 50,
  "max_length": 448,
  "min_length": 0,
  "no_repeat_ngram_size": 0,
  "no_timestamps_token_id": 50363,
  "num_assistant_tokens": 20,
  "num_assistant_tokens_schedule": "constant",
  "num_beam_groups": 1,
  "num_beams": 1,
  "num_return_sequences": 1,
  "output_scores": false,
  "pad_token_id": 50257,
  "prev_sot_token_id": 50361,
  "remove_invalid_values": false,
  "repetition_penalty": 1.0,
  "return_dict_in_generate": false,
  "return_timestamps": false,
  "suppress_tokens": [],
  "target_lookbehind": 10,
  "task": "transcribe",
  "task_to_id": {"transcribe": 50359, "translate": 50358},
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "transformers_version": "5.1.0",
  "typical_p": 1.0,
  "use_cache": true
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:7c496a22794dcb0b1b9cac0ef2ed704a53edff531e76a197111878c0306a5fa2
size 3055544304
optimizer.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9d7e195385c0a2abcfa85b2acbd5be48bfc2bd018cc1ec0893c3c1e888af63ce
size 6111697804
preprocessor_config.json ADDED
{
  "chunk_length": 30,
  "dither": 0.0,
  "feature_extractor_type": "WhisperFeatureExtractor",
  "feature_size": 80,
  "hop_length": 160,
  "n_fft": 400,
  "n_samples": 480000,
  "nb_max_frames": 3000,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": false,
  "sampling_rate": 16000
}
rng_state.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:a8a21d2146258768676a0824c3cf74e6720e1d3471c72d2c20112ccf6b0152d4
size 14645
scaler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:75a1feffb2d659ca07ed7b8c54fb94c44340b4eee70e677aa53de072e57b87a7
size 1383
scheduler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d14016e0fdde7e7cab5172ed034987eb7b0109d723c15cb1fd369ee8558dcc3a
size 1465
trainer_state.json ADDED
{
  "best_global_step": 5000,
  "best_metric": 0.20783165635834974,
  "best_model_checkpoint": "/home/florent/milo/models/whisper-mg-v1/checkpoint-5000",
  "epoch": 3.946285069787114,
  "eval_steps": 1000,
  "global_step": 7000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.014098406880022557, "grad_norm": 84.86420440673828, "learning_rate": 4.800000000000001e-07, "loss": 17.227703857421876, "step": 25},
    {"epoch": 0.028196813760045115, "grad_norm": 66.53484344482422, "learning_rate": 9.800000000000001e-07, "loss": 13.832662353515625, "step": 50},
    {"epoch": 0.042295220640067674, "grad_norm": 43.68435287475586, "learning_rate": 1.48e-06, "loss": 10.33947021484375, "step": 75},
    {"epoch": 0.05639362752009023, "grad_norm": 34.214447021484375, "learning_rate": 1.98e-06, "loss": 7.374788818359375, "step": 100},
    {"epoch": 0.07049203440011279, "grad_norm": 27.010826110839844, "learning_rate": 2.4800000000000004e-06, "loss": 5.627406005859375, "step": 125},
    {"epoch": 0.08459044128013535, "grad_norm": 24.344482421875, "learning_rate": 2.9800000000000003e-06, "loss": 4.657888793945313, "step": 150},
    {"epoch": 0.0986888481601579, "grad_norm": 24.166223526000977, "learning_rate": 3.48e-06, "loss": 3.687208251953125, "step": 175},
    {"epoch": 0.11278725504018046, "grad_norm": 23.202571868896484, "learning_rate": 3.980000000000001e-06, "loss": 2.8217697143554688, "step": 200},
    {"epoch": 0.126885661920203, "grad_norm": 21.142698287963867, "learning_rate": 4.48e-06, "loss": 2.384236602783203, "step": 225},
    {"epoch": 0.14098406880022557, "grad_norm": 21.08614730834961, "learning_rate": 4.980000000000001e-06, "loss": 2.1562757873535157, "step": 250},
    {"epoch": 0.15508247568024813, "grad_norm": 20.73036003112793, "learning_rate": 5.480000000000001e-06, "loss": 2.0236280822753905, "step": 275},
    {"epoch": 0.1691808825602707, "grad_norm": 21.132606506347656, "learning_rate": 5.98e-06, "loss": 1.865457763671875, "step": 300},
    {"epoch": 0.18327928944029326, "grad_norm": 20.846128463745117, "learning_rate": 6.480000000000001e-06, "loss": 1.7576261901855468, "step": 325},
    {"epoch": 0.1973776963203158, "grad_norm": 18.715116500854492, "learning_rate": 6.98e-06, "loss": 1.6428683471679688, "step": 350},
    {"epoch": 0.21147610320033836, "grad_norm": 18.342042922973633, "learning_rate": 7.48e-06, "loss": 1.6423768615722656, "step": 375},
    {"epoch": 0.22557451008036092, "grad_norm": 19.184682846069336, "learning_rate": 7.980000000000002e-06, "loss": 1.6796397399902343, "step": 400},
    {"epoch": 0.23967291696038348, "grad_norm": 15.670419692993164, "learning_rate": 8.48e-06, "loss": 1.5983003234863282, "step": 425},
    {"epoch": 0.253771323840406, "grad_norm": 17.033227920532227, "learning_rate": 8.98e-06, "loss": 1.5592161560058593, "step": 450},
    {"epoch": 0.2678697307204286, "grad_norm": 17.4881649017334, "learning_rate": 9.48e-06, "loss": 1.5372105407714844, "step": 475},
    {"epoch": 0.28196813760045114, "grad_norm": 14.954177856445312, "learning_rate": 9.980000000000001e-06, "loss": 1.3610572814941406, "step": 500},
    {"epoch": 0.2960665444804737, "grad_norm": 14.768590927124023, "learning_rate": 9.986078886310906e-06, "loss": 1.4396812438964843, "step": 525},
    {"epoch": 0.31016495136049627, "grad_norm": 14.442098617553711, "learning_rate": 9.971577726218099e-06, "loss": 1.3178257751464844, "step": 550},
    {"epoch": 0.32426335824051883, "grad_norm": 15.411622047424316, "learning_rate": 9.957076566125291e-06, "loss": 1.4055836486816407, "step": 575},
    {"epoch": 0.3383617651205414, "grad_norm": 15.339015007019043, "learning_rate": 9.942575406032482e-06, "loss": 1.263025665283203, "step": 600},
    {"epoch": 0.35246017200056395, "grad_norm": 12.730027198791504, "learning_rate": 9.928074245939677e-06, "loss": 1.3695268249511718, "step": 625},
    {"epoch": 0.3665585788805865, "grad_norm": 15.123625755310059, "learning_rate": 9.913573085846868e-06, "loss": 1.2718246459960938, "step": 650},
    {"epoch": 0.380656985760609, "grad_norm": 15.623430252075195, "learning_rate": 9.899071925754062e-06, "loss": 1.2624960327148438, "step": 675},
    {"epoch": 0.3947553926406316, "grad_norm": 14.083202362060547, "learning_rate": 9.884570765661253e-06, "loss": 1.312706756591797, "step": 700},
    {"epoch": 0.40885379952065415, "grad_norm": 14.791698455810547, "learning_rate": 9.870069605568446e-06, "loss": 1.1517729187011718, "step": 725},
    {"epoch": 0.4229522064006767, "grad_norm": 16.083740234375, "learning_rate": 9.855568445475639e-06, "loss": 1.3111384582519532, "step": 750},
    {"epoch": 0.4370506132806993, "grad_norm": 12.33295726776123, "learning_rate": 9.841067285382831e-06, "loss": 1.1583899688720702, "step": 775},
    {"epoch": 0.45114902016072184, "grad_norm": 15.303627014160156, "learning_rate": 9.826566125290024e-06, "loss": 1.1577481079101561, "step": 800},
    {"epoch": 0.4652474270407444, "grad_norm": 14.345124244689941, "learning_rate": 9.812064965197217e-06, "loss": 1.1901895904541016, "step": 825},
    {"epoch": 0.47934583392076696, "grad_norm": 14.913029670715332, "learning_rate": 9.79756380510441e-06, "loss": 1.1890618896484375, "step": 850},
    {"epoch": 0.4934442408007895, "grad_norm": 13.710859298706055, "learning_rate": 9.783062645011602e-06, "loss": 1.1438979339599609, "step": 875},
    {"epoch": 0.507542647680812, "grad_norm": 13.514805793762207, "learning_rate": 9.768561484918795e-06, "loss": 1.1654217529296875, "step": 900},
    {"epoch": 0.5216410545608346, "grad_norm": 12.146404266357422, "learning_rate": 9.754060324825988e-06, "loss": 1.1279725646972656, "step": 925},
    {"epoch": 0.5357394614408572, "grad_norm": 14.747977256774902, "learning_rate": 9.73955916473318e-06, "loss": 1.0921778106689453, "step": 950},
    {"epoch": 0.5498378683208798, "grad_norm": 12.012781143188477, "learning_rate": 9.725058004640371e-06, "loss": 1.089789581298828, "step": 975},
    {"epoch": 0.5639362752009023, "grad_norm": 14.096403121948242, "learning_rate": 9.710556844547566e-06, "loss": 1.09536865234375, "step": 1000},
    {"epoch": 0.5639362752009023, "eval_loss": 0.3027215600013733, "eval_runtime": 1414.4639, "eval_samples_per_second": 2.191, "eval_steps_per_second": 0.274, "eval_wer": 0.25130644178215644, "step": 1000},
    {"epoch": 0.5780346820809249, "grad_norm": 12.812657356262207, "learning_rate": 9.696055684454757e-06, "loss": 1.073106155395508, "step": 1025},
    {"epoch": 0.5921330889609474, "grad_norm": 12.071333885192871, "learning_rate": 9.68155452436195e-06, "loss": 1.0373800659179688, "step": 1050},
    {"epoch": 0.60623149584097, "grad_norm": 14.140727043151855, "learning_rate": 9.667053364269142e-06, "loss": 1.0855345916748047, "step": 1075},
    {"epoch": 0.6203299027209925, "grad_norm": 12.884740829467773, "learning_rate": 9.652552204176335e-06, "loss": 1.0462813568115235, "step": 1100},
    {"epoch": 0.634428309601015, "grad_norm": 12.38447093963623, "learning_rate": 9.638051044083528e-06, "loss": 1.0593311309814453, "step": 1125},
    {"epoch": 0.6485267164810377, "grad_norm": 13.830496788024902, "learning_rate": 9.62354988399072e-06, "loss": 1.0979310607910155, "step": 1150},
    {"epoch": 0.6626251233610602, "grad_norm": 14.614953994750977, "learning_rate": 9.609048723897913e-06, "loss": 1.1127804565429686, "step": 1175},
    {"epoch": 0.6767235302410828, "grad_norm": 13.682169914245605, "learning_rate": 9.594547563805106e-06, "loss": 1.0035169982910157, "step": 1200},
    {"epoch": 0.6908219371211053, "grad_norm": 13.420117378234863, "learning_rate": 9.580046403712297e-06, "loss": 1.0495610046386719, "step": 1225},
    {"epoch": 0.7049203440011279, "grad_norm": 12.9420166015625, "learning_rate": 9.565545243619491e-06, "loss": 1.060257339477539, "step": 1250},
    {"epoch": 0.7190187508811504, "grad_norm": 12.879499435424805, "learning_rate": 9.551044083526682e-06, "loss": 1.0332550811767578, "step": 1275},
    {"epoch": 0.733117157761173, "grad_norm": 13.018166542053223, "learning_rate": 9.536542923433877e-06, "loss": 1.0661671447753907, "step": 1300},
    {"epoch": 0.7472155646411955, "grad_norm": 13.178157806396484, "learning_rate": 9.522041763341068e-06, "loss": 0.9954322052001953, "step": 1325},
    {"epoch": 0.761313971521218, "grad_norm": 15.271596908569336, "learning_rate": 9.50754060324826e-06, "loss": 1.0291824340820312, "step": 1350},
    {"epoch": 0.7754123784012407, "grad_norm": 17.327726364135742, "learning_rate": 9.493039443155453e-06, "loss": 1.0398453521728515, "step": 1375},
    {"epoch": 0.7895107852812632, "grad_norm": 13.380585670471191, "learning_rate": 9.478538283062646e-06, "loss": 0.9851990509033203, "step": 1400},
    {"epoch": 0.8036091921612858, "grad_norm": 11.947903633117676, "learning_rate": 9.464037122969838e-06, "loss": 0.9933275604248046, "step": 1425},
    {"epoch": 0.8177075990413083, "grad_norm": 11.313209533691406, "learning_rate": 9.449535962877031e-06, "loss": 0.95019775390625, "step": 1450},
    {"epoch": 0.8318060059213309, "grad_norm": 9.134963035583496, "learning_rate": 9.435034802784224e-06, "loss": 0.9632527160644532, "step": 1475},
    {"epoch": 0.8459044128013534, "grad_norm": 11.307052612304688, "learning_rate": 9.420533642691417e-06, "loss": 1.0088771820068358, "step": 1500},
    {"epoch": 0.860002819681376, "grad_norm": 11.490585327148438, "learning_rate": 9.406032482598608e-06, "loss": 0.975683822631836, "step": 1525},
    {"epoch": 0.8741012265613985, "grad_norm": 11.09744930267334, "learning_rate": 9.391531322505802e-06, "loss": 0.9516730499267578, "step": 1550},
    {"epoch": 0.8881996334414212, "grad_norm": 12.852828979492188, "learning_rate": 9.377030162412993e-06, "loss": 0.9338680267333984, "step": 1575},
    {"epoch": 0.9022980403214437, "grad_norm": 13.335673332214355, "learning_rate": 9.362529002320186e-06, "loss": 0.9313024139404297, "step": 1600},
    {"epoch": 0.9163964472014662, "grad_norm": 11.356801986694336, "learning_rate": 9.348027842227378e-06, "loss": 1.0073654174804687, "step": 1625},
    {"epoch": 0.9304948540814888, "grad_norm": 13.629708290100098, "learning_rate": 9.333526682134571e-06, "loss": 1.007170639038086, "step": 1650},
    {"epoch": 0.9445932609615113, "grad_norm": 9.606148719787598, "learning_rate": 9.319025522041764e-06, "loss": 0.9286507415771484, "step": 1675},
    {"epoch": 0.9586916678415339, "grad_norm": 9.82172679901123, "learning_rate": 9.304524361948957e-06, "loss": 0.9336747741699218, "step": 1700},
    {"epoch": 0.9727900747215564, "grad_norm": 11.669585227966309, "learning_rate": 9.29002320185615e-06, "loss": 0.9786383819580078, "step": 1725},
    {"epoch": 0.986888481601579, "grad_norm": 11.385014533996582, "learning_rate": 9.275522041763342e-06, "loss": 0.9959779357910157, "step": 1750},
    {"epoch": 1.0005639362752008, "grad_norm": 11.361459732055664, "learning_rate": 9.261020881670535e-06, "loss": 0.9319418334960937, "step": 1775},
    {"epoch": 1.0146623431552235, "grad_norm": 10.459379196166992, "learning_rate": 9.246519721577727e-06, "loss": 0.7039449310302734, "step": 1800},
    {"epoch": 1.028760750035246, "grad_norm": 8.165692329406738, "learning_rate": 9.23201856148492e-06, "loss": 0.7685092926025391, "step": 1825},
    {"epoch": 1.0428591569152685, "grad_norm": 9.223814010620117, "learning_rate": 9.217517401392111e-06, "loss": 0.7643940734863282, "step": 1850},
    {"epoch": 1.056957563795291, "grad_norm": 12.819178581237793, "learning_rate": 9.203016241299306e-06, "loss": 0.7673802185058594, "step": 1875},
    {"epoch": 1.0710559706753138, "grad_norm": 8.873549461364746, "learning_rate": 9.188515081206497e-06, "loss": 0.7162540435791016, "step": 1900},
    {"epoch": 1.0851543775553363, "grad_norm": 11.80384349822998, "learning_rate": 9.174013921113691e-06, "loss": 0.7582861328125, "step": 1925},
    {"epoch": 1.0992527844353588, "grad_norm": 10.77530288696289, "learning_rate": 9.159512761020882e-06,
564
+ "loss": 0.7159561920166015,
565
+ "step": 1950
566
+ },
567
+ {
568
+ "epoch": 1.1133511913153813,
569
+ "grad_norm": 11.646292686462402,
570
+ "learning_rate": 9.145011600928075e-06,
571
+ "loss": 0.7990260314941406,
572
+ "step": 1975
573
+ },
574
+ {
575
+ "epoch": 1.1274495981954038,
576
+ "grad_norm": 11.254467010498047,
577
+ "learning_rate": 9.130510440835267e-06,
578
+ "loss": 0.7786456298828125,
579
+ "step": 2000
580
+ },
581
+ {
582
+ "epoch": 1.1274495981954038,
583
+ "eval_loss": 0.2605888545513153,
584
+ "eval_runtime": 1451.9513,
585
+ "eval_samples_per_second": 2.134,
586
+ "eval_steps_per_second": 0.267,
587
+ "eval_wer": 0.22206013129365215,
588
+ "step": 2000
589
+ },
590
+ {
591
+ "epoch": 1.1415480050754265,
592
+ "grad_norm": 8.868310928344727,
593
+ "learning_rate": 9.11600928074246e-06,
594
+ "loss": 0.7087932586669922,
595
+ "step": 2025
596
+ },
597
+ {
598
+ "epoch": 1.155646411955449,
599
+ "grad_norm": 11.000504493713379,
600
+ "learning_rate": 9.101508120649653e-06,
601
+ "loss": 0.7248979949951172,
602
+ "step": 2050
603
+ },
604
+ {
605
+ "epoch": 1.1697448188354715,
606
+ "grad_norm": 12.025858879089355,
607
+ "learning_rate": 9.087006960556846e-06,
608
+ "loss": 0.7824922943115235,
609
+ "step": 2075
610
+ },
611
+ {
612
+ "epoch": 1.1838432257154943,
613
+ "grad_norm": 12.042730331420898,
614
+ "learning_rate": 9.072505800464038e-06,
615
+ "loss": 0.8172999572753906,
616
+ "step": 2100
617
+ },
618
+ {
619
+ "epoch": 1.1979416325955168,
620
+ "grad_norm": 9.696090698242188,
621
+ "learning_rate": 9.058004640371231e-06,
622
+ "loss": 0.8125002288818359,
623
+ "step": 2125
624
+ },
625
+ {
626
+ "epoch": 1.2120400394755393,
627
+ "grad_norm": 10.16943359375,
628
+ "learning_rate": 9.043503480278422e-06,
629
+ "loss": 0.76618408203125,
630
+ "step": 2150
631
+ },
632
+ {
633
+ "epoch": 1.2261384463555618,
634
+ "grad_norm": 9.093573570251465,
635
+ "learning_rate": 9.029002320185616e-06,
636
+ "loss": 0.7104488372802734,
637
+ "step": 2175
638
+ },
639
+ {
640
+ "epoch": 1.2402368532355843,
641
+ "grad_norm": 10.546662330627441,
642
+ "learning_rate": 9.014501160092808e-06,
643
+ "loss": 0.7818730926513672,
644
+ "step": 2200
645
+ },
646
+ {
647
+ "epoch": 1.254335260115607,
648
+ "grad_norm": 12.367358207702637,
649
+ "learning_rate": 9e-06,
650
+ "loss": 0.8542975616455079,
651
+ "step": 2225
652
+ },
653
+ {
654
+ "epoch": 1.2684336669956295,
655
+ "grad_norm": 10.952892303466797,
656
+ "learning_rate": 8.985498839907193e-06,
657
+ "loss": 0.7449074554443359,
658
+ "step": 2250
659
+ },
660
+ {
661
+ "epoch": 1.282532073875652,
662
+ "grad_norm": 8.846220970153809,
663
+ "learning_rate": 8.970997679814386e-06,
664
+ "loss": 0.7692269897460937,
665
+ "step": 2275
666
+ },
667
+ {
668
+ "epoch": 1.2966304807556746,
669
+ "grad_norm": 11.092299461364746,
670
+ "learning_rate": 8.956496519721578e-06,
671
+ "loss": 0.770924072265625,
672
+ "step": 2300
673
+ },
674
+ {
675
+ "epoch": 1.310728887635697,
676
+ "grad_norm": 11.600993156433105,
677
+ "learning_rate": 8.941995359628771e-06,
678
+ "loss": 0.8001760864257812,
679
+ "step": 2325
680
+ },
681
+ {
682
+ "epoch": 1.3248272945157198,
683
+ "grad_norm": 9.988462448120117,
684
+ "learning_rate": 8.927494199535964e-06,
685
+ "loss": 0.736719970703125,
686
+ "step": 2350
687
+ },
688
+ {
689
+ "epoch": 1.3389257013957423,
690
+ "grad_norm": 11.111364364624023,
691
+ "learning_rate": 8.912993039443157e-06,
692
+ "loss": 0.7216539001464843,
693
+ "step": 2375
694
+ },
695
+ {
696
+ "epoch": 1.3530241082757648,
697
+ "grad_norm": 12.4348726272583,
698
+ "learning_rate": 8.898491879350348e-06,
699
+ "loss": 0.8057781219482422,
700
+ "step": 2400
701
+ },
702
+ {
703
+ "epoch": 1.3671225151557875,
704
+ "grad_norm": 11.37057876586914,
705
+ "learning_rate": 8.883990719257542e-06,
706
+ "loss": 0.7377596282958985,
707
+ "step": 2425
708
+ },
709
+ {
710
+ "epoch": 1.3812209220358098,
711
+ "grad_norm": 8.443863868713379,
712
+ "learning_rate": 8.869489559164733e-06,
713
+ "loss": 0.764500732421875,
714
+ "step": 2450
715
+ },
716
+ {
717
+ "epoch": 1.3953193289158325,
718
+ "grad_norm": 11.640331268310547,
719
+ "learning_rate": 8.854988399071927e-06,
720
+ "loss": 0.7854479217529297,
721
+ "step": 2475
722
+ },
723
+ {
724
+ "epoch": 1.409417735795855,
725
+ "grad_norm": 9.915708541870117,
726
+ "learning_rate": 8.840487238979118e-06,
727
+ "loss": 0.7995922088623046,
728
+ "step": 2500
729
+ },
730
+ {
731
+ "epoch": 1.4235161426758776,
732
+ "grad_norm": 12.385272979736328,
733
+ "learning_rate": 8.825986078886311e-06,
734
+ "loss": 0.7540443420410157,
735
+ "step": 2525
736
+ },
737
+ {
738
+ "epoch": 1.4376145495559003,
739
+ "grad_norm": 10.713929176330566,
740
+ "learning_rate": 8.811484918793504e-06,
741
+ "loss": 0.7403028869628906,
742
+ "step": 2550
743
+ },
744
+ {
745
+ "epoch": 1.4517129564359228,
746
+ "grad_norm": 10.878042221069336,
747
+ "learning_rate": 8.796983758700697e-06,
748
+ "loss": 0.7737533569335937,
749
+ "step": 2575
750
+ },
751
+ {
752
+ "epoch": 1.4658113633159453,
753
+ "grad_norm": 9.902074813842773,
754
+ "learning_rate": 8.78248259860789e-06,
755
+ "loss": 0.752545166015625,
756
+ "step": 2600
757
+ },
758
+ {
759
+ "epoch": 1.4799097701959678,
760
+ "grad_norm": 9.259173393249512,
761
+ "learning_rate": 8.767981438515082e-06,
762
+ "loss": 0.7878742980957031,
763
+ "step": 2625
764
+ },
765
+ {
766
+ "epoch": 1.4940081770759903,
767
+ "grad_norm": 11.374395370483398,
768
+ "learning_rate": 8.753480278422275e-06,
769
+ "loss": 0.7678947448730469,
770
+ "step": 2650
771
+ },
772
+ {
773
+ "epoch": 1.508106583956013,
774
+ "grad_norm": 10.379016876220703,
775
+ "learning_rate": 8.738979118329467e-06,
776
+ "loss": 0.7748676300048828,
777
+ "step": 2675
778
+ },
779
+ {
780
+ "epoch": 1.5222049908360356,
781
+ "grad_norm": 11.049686431884766,
782
+ "learning_rate": 8.72447795823666e-06,
783
+ "loss": 0.7766862487792969,
784
+ "step": 2700
785
+ },
786
+ {
787
+ "epoch": 1.536303397716058,
788
+ "grad_norm": 10.903966903686523,
789
+ "learning_rate": 8.709976798143853e-06,
790
+ "loss": 0.7612065124511719,
791
+ "step": 2725
792
+ },
793
+ {
794
+ "epoch": 1.5504018045960808,
795
+ "grad_norm": 11.366527557373047,
796
+ "learning_rate": 8.695475638051046e-06,
797
+ "loss": 0.7193534088134765,
798
+ "step": 2750
799
+ },
800
+ {
801
+ "epoch": 1.564500211476103,
802
+ "grad_norm": 9.587455749511719,
803
+ "learning_rate": 8.680974477958237e-06,
804
+ "loss": 0.6888518524169922,
805
+ "step": 2775
806
+ },
807
+ {
808
+ "epoch": 1.5785986183561258,
809
+ "grad_norm": 10.296839714050293,
810
+ "learning_rate": 8.666473317865431e-06,
811
+ "loss": 0.7584939575195313,
812
+ "step": 2800
813
+ },
814
+ {
815
+ "epoch": 1.5926970252361483,
816
+ "grad_norm": 13.031759262084961,
817
+ "learning_rate": 8.651972157772622e-06,
818
+ "loss": 0.8102165985107422,
819
+ "step": 2825
820
+ },
821
+ {
822
+ "epoch": 1.6067954321161708,
823
+ "grad_norm": 10.904520988464355,
824
+ "learning_rate": 8.637470997679815e-06,
825
+ "loss": 0.7805465698242188,
826
+ "step": 2850
827
+ },
828
+ {
829
+ "epoch": 1.6208938389961935,
830
+ "grad_norm": 9.31521224975586,
831
+ "learning_rate": 8.622969837587007e-06,
832
+ "loss": 0.7300022125244141,
833
+ "step": 2875
834
+ },
835
+ {
836
+ "epoch": 1.6349922458762158,
837
+ "grad_norm": 11.240145683288574,
838
+ "learning_rate": 8.6084686774942e-06,
839
+ "loss": 0.783562240600586,
840
+ "step": 2900
841
+ },
842
+ {
843
+ "epoch": 1.6490906527562386,
844
+ "grad_norm": 10.595625877380371,
845
+ "learning_rate": 8.593967517401393e-06,
846
+ "loss": 0.7286166381835938,
847
+ "step": 2925
848
+ },
849
+ {
850
+ "epoch": 1.663189059636261,
851
+ "grad_norm": 9.539634704589844,
852
+ "learning_rate": 8.579466357308586e-06,
853
+ "loss": 0.7374664306640625,
854
+ "step": 2950
855
+ },
856
+ {
857
+ "epoch": 1.6772874665162836,
858
+ "grad_norm": 11.489903450012207,
859
+ "learning_rate": 8.564965197215778e-06,
860
+ "loss": 0.7891261291503906,
861
+ "step": 2975
862
+ },
863
+ {
864
+ "epoch": 1.6913858733963063,
865
+ "grad_norm": 11.152668952941895,
866
+ "learning_rate": 8.550464037122971e-06,
867
+ "loss": 0.750862808227539,
868
+ "step": 3000
869
+ },
870
+ {
871
+ "epoch": 1.6913858733963063,
872
+ "eval_loss": 0.2470168024301529,
873
+ "eval_runtime": 1517.4032,
874
+ "eval_samples_per_second": 2.042,
875
+ "eval_steps_per_second": 0.256,
876
+ "eval_wer": 0.21129884793317413,
877
+ "step": 3000
878
+ },
879
+ {
880
+ "epoch": 1.7054842802763288,
881
+ "grad_norm": 10.490551948547363,
882
+ "learning_rate": 8.535962877030162e-06,
883
+ "loss": 0.7084814453125,
884
+ "step": 3025
885
+ },
886
+ {
887
+ "epoch": 1.7195826871563513,
888
+ "grad_norm": 11.868510246276855,
889
+ "learning_rate": 8.521461716937356e-06,
890
+ "loss": 0.7457318115234375,
891
+ "step": 3050
892
+ },
893
+ {
894
+ "epoch": 1.733681094036374,
895
+ "grad_norm": 9.93341064453125,
896
+ "learning_rate": 8.506960556844547e-06,
897
+ "loss": 0.7481878662109375,
898
+ "step": 3075
899
+ },
900
+ {
901
+ "epoch": 1.7477795009163963,
902
+ "grad_norm": 9.801637649536133,
903
+ "learning_rate": 8.492459396751742e-06,
904
+ "loss": 0.7577263641357422,
905
+ "step": 3100
906
+ },
907
+ {
908
+ "epoch": 1.761877907796419,
909
+ "grad_norm": 9.962018013000488,
910
+ "learning_rate": 8.477958236658933e-06,
911
+ "loss": 0.7176610565185547,
912
+ "step": 3125
913
+ },
914
+ {
915
+ "epoch": 1.7759763146764416,
916
+ "grad_norm": 10.016621589660645,
917
+ "learning_rate": 8.463457076566126e-06,
918
+ "loss": 0.8129973602294922,
919
+ "step": 3150
920
+ },
921
+ {
922
+ "epoch": 1.790074721556464,
923
+ "grad_norm": 10.95247745513916,
924
+ "learning_rate": 8.448955916473318e-06,
925
+ "loss": 0.76224609375,
926
+ "step": 3175
927
+ },
928
+ {
929
+ "epoch": 1.8041731284364868,
930
+ "grad_norm": 10.412908554077148,
931
+ "learning_rate": 8.434454756380511e-06,
932
+ "loss": 0.72124267578125,
933
+ "step": 3200
934
+ },
935
+ {
936
+ "epoch": 1.818271535316509,
937
+ "grad_norm": 10.183904647827148,
938
+ "learning_rate": 8.419953596287704e-06,
939
+ "loss": 0.7625586700439453,
940
+ "step": 3225
941
+ },
942
+ {
943
+ "epoch": 1.8323699421965318,
944
+ "grad_norm": 10.891714096069336,
945
+ "learning_rate": 8.405452436194896e-06,
946
+ "loss": 0.719871826171875,
947
+ "step": 3250
948
+ },
949
+ {
950
+ "epoch": 1.8464683490765543,
951
+ "grad_norm": 10.172575950622559,
952
+ "learning_rate": 8.390951276102089e-06,
953
+ "loss": 0.753576889038086,
954
+ "step": 3275
955
+ },
956
+ {
957
+ "epoch": 1.8605667559565768,
958
+ "grad_norm": 9.48585033416748,
959
+ "learning_rate": 8.376450116009282e-06,
960
+ "loss": 0.7430252838134765,
961
+ "step": 3300
962
+ },
963
+ {
964
+ "epoch": 1.8746651628365996,
965
+ "grad_norm": 11.499151229858398,
966
+ "learning_rate": 8.361948955916473e-06,
967
+ "loss": 0.7010813903808594,
968
+ "step": 3325
969
+ },
970
+ {
971
+ "epoch": 1.888763569716622,
972
+ "grad_norm": 9.432136535644531,
973
+ "learning_rate": 8.347447795823667e-06,
974
+ "loss": 0.71728759765625,
975
+ "step": 3350
976
+ },
977
+ {
978
+ "epoch": 1.9028619765966446,
979
+ "grad_norm": 8.735350608825684,
980
+ "learning_rate": 8.332946635730858e-06,
981
+ "loss": 0.8144132232666016,
982
+ "step": 3375
983
+ },
984
+ {
985
+ "epoch": 1.916960383476667,
986
+ "grad_norm": 9.190115928649902,
987
+ "learning_rate": 8.318445475638051e-06,
988
+ "loss": 0.7769715881347656,
989
+ "step": 3400
990
+ },
991
+ {
992
+ "epoch": 1.9310587903566896,
993
+ "grad_norm": 8.803107261657715,
994
+ "learning_rate": 8.303944315545245e-06,
995
+ "loss": 0.7007711791992187,
996
+ "step": 3425
997
+ },
998
+ {
999
+ "epoch": 1.9451571972367123,
1000
+ "grad_norm": 10.293452262878418,
1001
+ "learning_rate": 8.289443155452436e-06,
1002
+ "loss": 0.7087128448486328,
1003
+ "step": 3450
1004
+ },
1005
+ {
1006
+ "epoch": 1.9592556041167348,
1007
+ "grad_norm": 11.334561347961426,
1008
+ "learning_rate": 8.27494199535963e-06,
1009
+ "loss": 0.7811639404296875,
1010
+ "step": 3475
1011
+ },
1012
+ {
1013
+ "epoch": 1.9733540109967573,
1014
+ "grad_norm": 9.957331657409668,
1015
+ "learning_rate": 8.260440835266822e-06,
1016
+ "loss": 0.7865208435058594,
1017
+ "step": 3500
1018
+ },
1019
+ {
1020
+ "epoch": 1.98745241787678,
1021
+ "grad_norm": 12.995558738708496,
1022
+ "learning_rate": 8.245939675174015e-06,
1023
+ "loss": 0.7895949554443359,
1024
+ "step": 3525
1025
+ },
1026
+ {
1027
+ "epoch": 2.0011278725504016,
1028
+ "grad_norm": 9.675638198852539,
1029
+ "learning_rate": 8.231438515081207e-06,
1030
+ "loss": 0.721746826171875,
1031
+ "step": 3550
1032
+ },
1033
+ {
1034
+ "epoch": 2.0152262794304243,
1035
+ "grad_norm": 18.80123519897461,
1036
+ "learning_rate": 8.2169373549884e-06,
1037
+ "loss": 0.557889175415039,
1038
+ "step": 3575
1039
+ },
1040
+ {
1041
+ "epoch": 2.029324686310447,
1042
+ "grad_norm": 9.540290832519531,
1043
+ "learning_rate": 8.202436194895593e-06,
1044
+ "loss": 0.5269033432006835,
1045
+ "step": 3600
1046
+ },
1047
+ {
1048
+ "epoch": 2.0434230931904693,
1049
+ "grad_norm": 8.743149757385254,
1050
+ "learning_rate": 8.187935034802785e-06,
1051
+ "loss": 0.5314161300659179,
1052
+ "step": 3625
1053
+ },
1054
+ {
1055
+ "epoch": 2.057521500070492,
1056
+ "grad_norm": 8.326909065246582,
1057
+ "learning_rate": 8.173433874709976e-06,
1058
+ "loss": 0.5158148193359375,
1059
+ "step": 3650
1060
+ },
1061
+ {
1062
+ "epoch": 2.071619906950515,
1063
+ "grad_norm": 8.55147933959961,
1064
+ "learning_rate": 8.158932714617171e-06,
1065
+ "loss": 0.543358497619629,
1066
+ "step": 3675
1067
+ },
1068
+ {
1069
+ "epoch": 2.085718313830537,
1070
+ "grad_norm": 9.927057266235352,
1071
+ "learning_rate": 8.144431554524362e-06,
1072
+ "loss": 0.5452922058105468,
1073
+ "step": 3700
1074
+ },
1075
+ {
1076
+ "epoch": 2.09981672071056,
1077
+ "grad_norm": 7.827384948730469,
1078
+ "learning_rate": 8.129930394431556e-06,
1079
+ "loss": 0.5253535079956054,
1080
+ "step": 3725
1081
+ },
1082
+ {
1083
+ "epoch": 2.113915127590582,
1084
+ "grad_norm": 9.569947242736816,
1085
+ "learning_rate": 8.115429234338747e-06,
1086
+ "loss": 0.5380427169799805,
1087
+ "step": 3750
1088
+ },
1089
+ {
1090
+ "epoch": 2.128013534470605,
1091
+ "grad_norm": 8.02529525756836,
1092
+ "learning_rate": 8.10092807424594e-06,
1093
+ "loss": 0.50870849609375,
1094
+ "step": 3775
1095
+ },
1096
+ {
1097
+ "epoch": 2.1421119413506275,
1098
+ "grad_norm": 10.682024002075195,
1099
+ "learning_rate": 8.086426914153133e-06,
1100
+ "loss": 0.5357696914672851,
1101
+ "step": 3800
1102
+ },
1103
+ {
1104
+ "epoch": 2.15621034823065,
1105
+ "grad_norm": 7.825737476348877,
1106
+ "learning_rate": 8.071925754060325e-06,
1107
+ "loss": 0.5354624176025391,
1108
+ "step": 3825
1109
+ },
1110
+ {
1111
+ "epoch": 2.1703087551106726,
1112
+ "grad_norm": 8.716205596923828,
1113
+ "learning_rate": 8.057424593967518e-06,
1114
+ "loss": 0.5709107208251953,
1115
+ "step": 3850
1116
+ },
1117
+ {
1118
+ "epoch": 2.184407161990695,
1119
+ "grad_norm": 8.497817993164062,
1120
+ "learning_rate": 8.042923433874711e-06,
1121
+ "loss": 0.559893798828125,
1122
+ "step": 3875
1123
+ },
1124
+ {
1125
+ "epoch": 2.1985055688707176,
1126
+ "grad_norm": 10.781414985656738,
1127
+ "learning_rate": 8.028422273781904e-06,
1128
+ "loss": 0.5544267272949219,
1129
+ "step": 3900
1130
+ },
1131
+ {
1132
+ "epoch": 2.2126039757507403,
1133
+ "grad_norm": 10.681407928466797,
1134
+ "learning_rate": 8.013921113689096e-06,
1135
+ "loss": 0.533585090637207,
1136
+ "step": 3925
1137
+ },
1138
+ {
1139
+ "epoch": 2.2267023826307626,
1140
+ "grad_norm": 9.05926513671875,
1141
+ "learning_rate": 7.999419953596287e-06,
1142
+ "loss": 0.5441395950317383,
1143
+ "step": 3950
1144
+ },
1145
+ {
1146
+ "epoch": 2.2408007895107853,
1147
+ "grad_norm": 10.925969123840332,
1148
+ "learning_rate": 7.984918793503482e-06,
1149
+ "loss": 0.5743826675415039,
1150
+ "step": 3975
1151
+ },
1152
+ {
1153
+ "epoch": 2.2548991963908076,
1154
+ "grad_norm": 9.930057525634766,
1155
+ "learning_rate": 7.970417633410673e-06,
1156
+ "loss": 0.5230157089233398,
1157
+ "step": 4000
1158
+ },
1159
+ {
1160
+ "epoch": 2.2548991963908076,
1161
+ "eval_loss": 0.25168365240097046,
1162
+ "eval_runtime": 1521.1061,
1163
+ "eval_samples_per_second": 2.037,
1164
+ "eval_steps_per_second": 0.255,
1165
+ "eval_wer": 0.20968015907115237,
1166
+ "step": 4000
1167
+ },
1168
+ {
1169
+ "epoch": 2.2689976032708303,
1170
+ "grad_norm": 11.271224021911621,
1171
+ "learning_rate": 7.955916473317865e-06,
1172
+ "loss": 0.5519426727294922,
1173
+ "step": 4025
1174
+ },
1175
+ {
1176
+ "epoch": 2.283096010150853,
1177
+ "grad_norm": 9.23180103302002,
1178
+ "learning_rate": 7.941415313225058e-06,
1179
+ "loss": 0.5422761917114258,
1180
+ "step": 4050
1181
+ },
1182
+ {
1183
+ "epoch": 2.2971944170308753,
1184
+ "grad_norm": 8.428093910217285,
1185
+ "learning_rate": 7.926914153132251e-06,
1186
+ "loss": 0.5573156356811524,
1187
+ "step": 4075
1188
+ },
1189
+ {
1190
+ "epoch": 2.311292823910898,
1191
+ "grad_norm": 9.771265029907227,
1192
+ "learning_rate": 7.912412993039444e-06,
1193
+ "loss": 0.5444869995117188,
1194
+ "step": 4100
1195
+ },
1196
+ {
1197
+ "epoch": 2.325391230790921,
1198
+ "grad_norm": 8.109742164611816,
1199
+ "learning_rate": 7.897911832946636e-06,
1200
+ "loss": 0.565984001159668,
1201
+ "step": 4125
1202
+ },
1203
+ {
1204
+ "epoch": 2.339489637670943,
1205
+ "grad_norm": 10.035289764404297,
1206
+ "learning_rate": 7.883410672853829e-06,
1207
+ "loss": 0.5919873809814453,
1208
+ "step": 4150
1209
+ },
1210
+ {
1211
+ "epoch": 2.353588044550966,
1212
+ "grad_norm": 7.981551647186279,
1213
+ "learning_rate": 7.868909512761022e-06,
1214
+ "loss": 0.5728730392456055,
1215
+ "step": 4175
1216
+ },
1217
+ {
1218
+ "epoch": 2.3676864514309885,
1219
+ "grad_norm": 9.20626163482666,
1220
+ "learning_rate": 7.854408352668213e-06,
1221
+ "loss": 0.5345810317993164,
1222
+ "step": 4200
1223
+ },
1224
+ {
1225
+ "epoch": 2.381784858311011,
1226
+ "grad_norm": 10.04966926574707,
1227
+ "learning_rate": 7.839907192575407e-06,
1228
+ "loss": 0.5514390563964844,
1229
+ "step": 4225
1230
+ },
1231
+ {
1232
+ "epoch": 2.3958832651910336,
1233
+ "grad_norm": 9.066740989685059,
1234
+ "learning_rate": 7.8254060324826e-06,
1235
+ "loss": 0.5442893600463867,
1236
+ "step": 4250
1237
+ },
1238
+ {
1239
+ "epoch": 2.409981672071056,
1240
+ "grad_norm": 8.016524314880371,
1241
+ "learning_rate": 7.810904872389791e-06,
1242
+ "loss": 0.5711633682250976,
1243
+ "step": 4275
1244
+ },
1245
+ {
1246
+ "epoch": 2.4240800789510786,
1247
+ "grad_norm": 9.421856880187988,
1248
+ "learning_rate": 7.796403712296985e-06,
1249
+ "loss": 0.5826901626586914,
1250
+ "step": 4300
1251
+ },
1252
+ {
1253
+ "epoch": 2.4381784858311013,
1254
+ "grad_norm": 9.801589965820312,
1255
+ "learning_rate": 7.781902552204176e-06,
1256
+ "loss": 0.5632424545288086,
1257
+ "step": 4325
1258
+ },
1259
+ {
1260
+ "epoch": 2.4522768927111236,
1261
+ "grad_norm": 7.847035884857178,
1262
+ "learning_rate": 7.76740139211137e-06,
1263
+ "loss": 0.5748075485229492,
1264
+ "step": 4350
1265
+ },
1266
+ {
1267
+ "epoch": 2.4663752995911463,
1268
+ "grad_norm": 8.85059928894043,
1269
+ "learning_rate": 7.752900232018562e-06,
1270
+ "loss": 0.5258598327636719,
1271
+ "step": 4375
1272
+ },
1273
+ {
1274
+ "epoch": 2.4804737064711686,
1275
+ "grad_norm": 9.43190860748291,
1276
+ "learning_rate": 7.738399071925755e-06,
1277
+ "loss": 0.5532307815551758,
1278
+ "step": 4400
1279
+ },
1280
+ {
1281
+ "epoch": 2.4945721133511913,
1282
+ "grad_norm": 8.644095420837402,
1283
+ "learning_rate": 7.723897911832947e-06,
1284
+ "loss": 0.5246799850463867,
1285
+ "step": 4425
1286
+ },
1287
+ {
1288
+ "epoch": 2.508670520231214,
1289
+ "grad_norm": 10.166682243347168,
1290
+ "learning_rate": 7.70939675174014e-06,
1291
+ "loss": 0.5615069198608399,
1292
+ "step": 4450
1293
+ },
1294
+ {
1295
+ "epoch": 2.5227689271112363,
1296
+ "grad_norm": 8.29073429107666,
1297
+ "learning_rate": 7.694895591647333e-06,
1298
+ "loss": 0.5665618133544922,
1299
+ "step": 4475
1300
+ },
1301
+ {
1302
+ "epoch": 2.536867333991259,
1303
+ "grad_norm": 11.046154975891113,
1304
+ "learning_rate": 7.680394431554525e-06,
1305
+ "loss": 0.5597576904296875,
1306
+ "step": 4500
1307
+ },
1308
+ {
1309
+ "epoch": 2.5509657408712814,
1310
+ "grad_norm": 8.011266708374023,
1311
+ "learning_rate": 7.665893271461718e-06,
1312
+ "loss": 0.5483290863037109,
1313
+ "step": 4525
1314
+ },
1315
+ {
1316
+ "epoch": 2.565064147751304,
1317
+ "grad_norm": 8.506767272949219,
1318
+ "learning_rate": 7.65139211136891e-06,
1319
+ "loss": 0.5561467361450195,
1320
+ "step": 4550
1321
+ },
1322
+ {
1323
+ "epoch": 2.579162554631327,
1324
+ "grad_norm": 8.871068954467773,
1325
+ "learning_rate": 7.636890951276102e-06,
1326
+ "loss": 0.581519546508789,
1327
+ "step": 4575
1328
+ },
1329
+ {
1330
+ "epoch": 2.593260961511349,
1331
+ "grad_norm": 7.879359722137451,
1332
+ "learning_rate": 7.622389791183295e-06,
1333
+ "loss": 0.5407870483398437,
1334
+ "step": 4600
1335
+ },
1336
+ {
1337
+ "epoch": 2.607359368391372,
1338
+ "grad_norm": 6.408371448516846,
1339
+ "learning_rate": 7.607888631090487e-06,
1340
+ "loss": 0.5560993576049804,
1341
+ "step": 4625
1342
+ },
1343
+ {
1344
+ "epoch": 2.621457775271394,
1345
+ "grad_norm": 8.178828239440918,
1346
+ "learning_rate": 7.593387470997681e-06,
1347
+ "loss": 0.5660905838012695,
1348
+ "step": 4650
1349
+ },
1350
+ {
1351
+ "epoch": 2.635556182151417,
1352
+ "grad_norm": 8.56795597076416,
1353
+ "learning_rate": 7.578886310904873e-06,
1354
+ "loss": 0.5393736648559571,
1355
+ "step": 4675
1356
+ },
1357
+ {
1358
+ "epoch": 2.6496545890314396,
1359
+ "grad_norm": 8.56303596496582,
1360
+ "learning_rate": 7.564385150812066e-06,
1361
+ "loss": 0.5549613952636718,
1362
+ "step": 4700
1363
+ },
1364
+ {
1365
+ "epoch": 2.663752995911462,
1366
+ "grad_norm": 8.995079040527344,
1367
+ "learning_rate": 7.549883990719258e-06,
1368
+ "loss": 0.5543619537353516,
1369
+ "step": 4725
1370
+ },
1371
+ {
1372
+ "epoch": 2.6778514027914846,
1373
+ "grad_norm": 9.455941200256348,
1374
+ "learning_rate": 7.535382830626451e-06,
1375
+ "loss": 0.5422988510131836,
1376
+ "step": 4750
1377
+ },
1378
+ {
1379
+ "epoch": 2.691949809671507,
1380
+ "grad_norm": 9.779569625854492,
1381
+ "learning_rate": 7.520881670533643e-06,
1382
+ "loss": 0.5424752044677734,
1383
+ "step": 4775
1384
+ },
1385
+ {
1386
+ "epoch": 2.7060482165515296,
1387
+ "grad_norm": 9.737711906433105,
1388
+ "learning_rate": 7.506380510440836e-06,
1389
+ "loss": 0.5491796112060547,
1390
+ "step": 4800
1391
+ },
1392
+ {
1393
+ "epoch": 2.7201466234315523,
1394
+ "grad_norm": 10.25398063659668,
1395
+ "learning_rate": 7.491879350348028e-06,
1396
+ "loss": 0.5705435180664062,
1397
+ "step": 4825
1398
+ },
1399
+ {
1400
+ "epoch": 2.734245030311575,
1401
+ "grad_norm": 9.573110580444336,
1402
+ "learning_rate": 7.477378190255221e-06,
1403
+ "loss": 0.5880419540405274,
1404
+ "step": 4850
1405
+ },
1406
+ {
1407
+ "epoch": 2.7483434371915973,
1408
+ "grad_norm": 11.062814712524414,
1409
+ "learning_rate": 7.4628770301624135e-06,
1410
+ "loss": 0.556083984375,
1411
+ "step": 4875
1412
+ },
1413
+ {
1414
+ "epoch": 2.7624418440716196,
1415
+ "grad_norm": 9.247238159179688,
1416
+ "learning_rate": 7.448375870069606e-06,
1417
+ "loss": 0.5655975341796875,
1418
+ "step": 4900
1419
+ },
1420
+ {
1421
+ "epoch": 2.7765402509516424,
1422
+ "grad_norm": 9.007148742675781,
1423
+ "learning_rate": 7.433874709976798e-06,
1424
+ "loss": 0.5616880035400391,
1425
+ "step": 4925
1426
+ },
1427
+ {
1428
+ "epoch": 2.790638657831665,
1429
+ "grad_norm": 8.514342308044434,
1430
+ "learning_rate": 7.419373549883992e-06,
1431
+ "loss": 0.5538288116455078,
1432
+ "step": 4950
1433
+ },
1434
+ {
1435
+ "epoch": 2.804737064711688,
1436
+ "grad_norm": 9.440685272216797,
1437
+ "learning_rate": 7.4048723897911835e-06,
1438
+ "loss": 0.5948199081420898,
1439
+ "step": 4975
1440
+ },
1441
+ {
1442
+ "epoch": 2.81883547159171,
1443
+ "grad_norm": 6.956933975219727,
1444
+ "learning_rate": 7.390371229698376e-06,
1445
+ "loss": 0.5486139678955078,
1446
+ "step": 5000
1447
+ },
1448
+ {
1449
+ "epoch": 2.81883547159171,
1450
+ "eval_loss": 0.24652095139026642,
1451
+ "eval_runtime": 1532.9367,
1452
+ "eval_samples_per_second": 2.022,
1453
+ "eval_steps_per_second": 0.253,
1454
+ "eval_wer": 0.20783165635834974,
1455
+ "step": 5000
1456
+ },
1457
+ {
1458
+ "epoch": 2.832933878471733,
1459
+ "grad_norm": 9.058638572692871,
1460
+ "learning_rate": 7.375870069605568e-06,
1461
+ "loss": 0.5798805999755859,
1462
+ "step": 5025
1463
+ },
1464
+ {
1465
+ "epoch": 2.847032285351755,
1466
+ "grad_norm": 7.628641605377197,
1467
+ "learning_rate": 7.361368909512762e-06,
1468
+ "loss": 0.5301958847045899,
1469
+ "step": 5050
1470
+ },
1471
+ {
1472
+ "epoch": 2.861130692231778,
1473
+ "grad_norm": 8.487007141113281,
1474
+ "learning_rate": 7.346867749419954e-06,
1475
+ "loss": 0.566702880859375,
1476
+ "step": 5075
1477
+ },
1478
+ {
1479
+ "epoch": 2.8752290991118006,
1480
+ "grad_norm": 9.008893013000488,
1481
+ "learning_rate": 7.332366589327147e-06,
1482
+ "loss": 0.564128532409668,
1483
+ "step": 5100
1484
+ },
1485
+ {
1486
+ "epoch": 2.889327505991823,
1487
+ "grad_norm": 11.27266788482666,
1488
+ "learning_rate": 7.31786542923434e-06,
1489
+ "loss": 0.5648199462890625,
1490
+ "step": 5125
1491
+ },
1492
+ {
1493
+ "epoch": 2.9034259128718456,
1494
+ "grad_norm": 9.269525527954102,
1495
+ "learning_rate": 7.303364269141532e-06,
1496
+ "loss": 0.5541753005981446,
1497
+ "step": 5150
1498
+ },
1499
+ {
1500
+ "epoch": 2.917524319751868,
1501
+ "grad_norm": 9.549739837646484,
1502
+ "learning_rate": 7.288863109048725e-06,
1503
+ "loss": 0.5463447952270508,
1504
+ "step": 5175
1505
+ },
1506
+ {
1507
+ "epoch": 2.9316227266318906,
1508
+ "grad_norm": 9.327741622924805,
1509
+ "learning_rate": 7.274361948955917e-06,
1510
+ "loss": 0.5498114395141601,
1511
+ "step": 5200
1512
+ },
1513
+ {
1514
+ "epoch": 2.9457211335119133,
1515
+ "grad_norm": 10.359440803527832,
1516
+ "learning_rate": 7.25986078886311e-06,
1517
+ "loss": 0.5578446578979492,
1518
+ "step": 5225
1519
+ },
1520
+ {
1521
+ "epoch": 2.9598195403919356,
1522
+ "grad_norm": 7.895180702209473,
1523
+ "learning_rate": 7.245359628770302e-06,
1524
+ "loss": 0.5610840606689453,
1525
+ "step": 5250
1526
+ },
1527
+ {
1528
+ "epoch": 2.9739179472719584,
1529
+ "grad_norm": 11.096906661987305,
1530
+ "learning_rate": 7.230858468677495e-06,
1531
+ "loss": 0.5936727523803711,
1532
+ "step": 5275
1533
+ },
1534
+ {
1535
+ "epoch": 2.9880163541519806,
1536
+ "grad_norm": 10.793472290039062,
1537
+ "learning_rate": 7.216357308584687e-06,
1538
+ "loss": 0.5792999649047852,
1539
+ "step": 5300
1540
+ },
1541
+ {
1542
+ "epoch": 3.0016918088256026,
1543
+ "grad_norm": 8.622193336486816,
1544
+ "learning_rate": 7.201856148491881e-06,
1545
+ "loss": 0.5246665573120117,
1546
+ "step": 5325
1547
+ },
1548
+ {
1549
+ "epoch": 3.0157902157056253,
1550
+ "grad_norm": 6.704568386077881,
1551
+ "learning_rate": 7.1873549883990726e-06,
1552
+ "loss": 0.40513866424560546,
1553
+ "step": 5350
1554
+ },
1555
+ {
1556
+ "epoch": 3.0298886225856476,
1557
+ "grad_norm": 7.683806419372559,
1558
+ "learning_rate": 7.172853828306265e-06,
1559
+ "loss": 0.3967144775390625,
1560
+ "step": 5375
1561
+ },
1562
+ {
1563
+ "epoch": 3.0439870294656703,
1564
+ "grad_norm": 7.537134647369385,
1565
+ "learning_rate": 7.158352668213457e-06,
1566
+ "loss": 0.39253467559814453,
1567
+ "step": 5400
1568
+ },
1569
+ {
1570
+ "epoch": 3.058085436345693,
1571
+ "grad_norm": 6.951216697692871,
1572
+ "learning_rate": 7.143851508120651e-06,
1573
+ "loss": 0.374131965637207,
1574
+ "step": 5425
1575
+ },
1576
+ {
1577
+ "epoch": 3.0721838432257154,
1578
+ "grad_norm": 7.260530471801758,
1579
+ "learning_rate": 7.129350348027843e-06,
+ "loss": 0.35717914581298826,
+ "step": 5450
+ },
+ {
+ "epoch": 3.086282250105738,
+ "grad_norm": 7.6884989738464355,
+ "learning_rate": 7.114849187935035e-06,
+ "loss": 0.35510223388671874,
+ "step": 5475
+ },
+ {
+ "epoch": 3.100380656985761,
+ "grad_norm": 7.6629109382629395,
+ "learning_rate": 7.100348027842228e-06,
+ "loss": 0.35819530487060547,
+ "step": 5500
+ },
+ {
+ "epoch": 3.114479063865783,
+ "grad_norm": 7.406759262084961,
+ "learning_rate": 7.085846867749421e-06,
+ "loss": 0.3919057846069336,
+ "step": 5525
+ },
+ {
+ "epoch": 3.128577470745806,
+ "grad_norm": 8.94927978515625,
+ "learning_rate": 7.071345707656613e-06,
+ "loss": 0.3659718704223633,
+ "step": 5550
+ },
+ {
+ "epoch": 3.142675877625828,
+ "grad_norm": 8.768269538879395,
+ "learning_rate": 7.056844547563806e-06,
+ "loss": 0.37319766998291015,
+ "step": 5575
+ },
+ {
+ "epoch": 3.156774284505851,
+ "grad_norm": 8.374007225036621,
+ "learning_rate": 7.042343387470998e-06,
+ "loss": 0.3749269104003906,
+ "step": 5600
+ },
+ {
+ "epoch": 3.1708726913858736,
+ "grad_norm": 6.927631855010986,
+ "learning_rate": 7.027842227378191e-06,
+ "loss": 0.3902094650268555,
+ "step": 5625
+ },
+ {
+ "epoch": 3.184971098265896,
+ "grad_norm": 8.199101448059082,
+ "learning_rate": 7.013341067285383e-06,
+ "loss": 0.3912439727783203,
+ "step": 5650
+ },
+ {
+ "epoch": 3.1990695051459186,
+ "grad_norm": 6.719486713409424,
+ "learning_rate": 6.998839907192576e-06,
+ "loss": 0.36518966674804687,
+ "step": 5675
+ },
+ {
+ "epoch": 3.213167912025941,
+ "grad_norm": 8.620352745056152,
+ "learning_rate": 6.984338747099768e-06,
+ "loss": 0.3811537551879883,
+ "step": 5700
+ },
+ {
+ "epoch": 3.2272663189059636,
+ "grad_norm": 7.2147040367126465,
+ "learning_rate": 6.969837587006962e-06,
+ "loss": 0.37800453186035154,
+ "step": 5725
+ },
+ {
+ "epoch": 3.2413647257859863,
+ "grad_norm": 8.890436172485352,
+ "learning_rate": 6.9553364269141535e-06,
+ "loss": 0.3713486480712891,
+ "step": 5750
+ },
+ {
+ "epoch": 3.2554631326660086,
+ "grad_norm": 8.093574523925781,
+ "learning_rate": 6.940835266821346e-06,
+ "loss": 0.4041537857055664,
+ "step": 5775
+ },
+ {
+ "epoch": 3.2695615395460313,
+ "grad_norm": 9.059134483337402,
+ "learning_rate": 6.926334106728538e-06,
+ "loss": 0.397373046875,
+ "step": 5800
+ },
+ {
+ "epoch": 3.283659946426054,
+ "grad_norm": 9.615391731262207,
+ "learning_rate": 6.911832946635732e-06,
+ "loss": 0.3877538299560547,
+ "step": 5825
+ },
+ {
+ "epoch": 3.2977583533060764,
+ "grad_norm": 7.0843186378479,
+ "learning_rate": 6.8973317865429235e-06,
+ "loss": 0.38612773895263675,
+ "step": 5850
+ },
+ {
+ "epoch": 3.311856760186099,
+ "grad_norm": 7.724761009216309,
+ "learning_rate": 6.882830626450116e-06,
+ "loss": 0.3913743209838867,
+ "step": 5875
+ },
+ {
+ "epoch": 3.3259551670661214,
+ "grad_norm": 7.574341297149658,
+ "learning_rate": 6.86832946635731e-06,
+ "loss": 0.36103832244873046,
+ "step": 5900
+ },
+ {
+ "epoch": 3.340053573946144,
+ "grad_norm": 9.554155349731445,
+ "learning_rate": 6.853828306264502e-06,
+ "loss": 0.39909862518310546,
+ "step": 5925
+ },
+ {
+ "epoch": 3.354151980826167,
+ "grad_norm": 9.424493789672852,
+ "learning_rate": 6.839327146171695e-06,
+ "loss": 0.36129764556884764,
+ "step": 5950
+ },
+ {
+ "epoch": 3.368250387706189,
+ "grad_norm": 8.635687828063965,
+ "learning_rate": 6.824825986078887e-06,
+ "loss": 0.4005467987060547,
+ "step": 5975
+ },
+ {
+ "epoch": 3.382348794586212,
+ "grad_norm": 8.711161613464355,
+ "learning_rate": 6.81032482598608e-06,
+ "loss": 0.37857559204101565,
+ "step": 6000
+ },
+ {
+ "epoch": 3.382348794586212,
+ "eval_loss": 0.2660383880138397,
+ "eval_runtime": 1540.0218,
+ "eval_samples_per_second": 2.012,
+ "eval_steps_per_second": 0.252,
+ "eval_wer": 0.21214815999040776,
+ "step": 6000
+ },
+ {
+ "epoch": 3.396447201466234,
+ "grad_norm": 8.4149751663208,
+ "learning_rate": 6.795823665893272e-06,
+ "loss": 0.38818477630615233,
+ "step": 6025
+ },
+ {
+ "epoch": 3.410545608346257,
+ "grad_norm": 7.9141974449157715,
+ "learning_rate": 6.781322505800465e-06,
+ "loss": 0.3774140930175781,
+ "step": 6050
+ },
+ {
+ "epoch": 3.4246440152262796,
+ "grad_norm": 8.901697158813477,
+ "learning_rate": 6.766821345707657e-06,
+ "loss": 0.37076019287109374,
+ "step": 6075
+ },
+ {
+ "epoch": 3.438742422106302,
+ "grad_norm": 7.5369696617126465,
+ "learning_rate": 6.75232018561485e-06,
+ "loss": 0.3822758102416992,
+ "step": 6100
+ },
+ {
+ "epoch": 3.4528408289863246,
+ "grad_norm": 8.009127616882324,
+ "learning_rate": 6.7378190255220425e-06,
+ "loss": 0.38495445251464844,
+ "step": 6125
+ },
+ {
+ "epoch": 3.466939235866347,
+ "grad_norm": 8.295459747314453,
+ "learning_rate": 6.723317865429235e-06,
+ "loss": 0.3907606887817383,
+ "step": 6150
+ },
+ {
+ "epoch": 3.4810376427463696,
+ "grad_norm": 10.099355697631836,
+ "learning_rate": 6.708816705336427e-06,
+ "loss": 0.407639274597168,
+ "step": 6175
+ },
+ {
+ "epoch": 3.4951360496263923,
+ "grad_norm": 7.597548961639404,
+ "learning_rate": 6.694315545243621e-06,
+ "loss": 0.37315830230712893,
+ "step": 6200
+ },
+ {
+ "epoch": 3.5092344565064146,
+ "grad_norm": 8.139415740966797,
+ "learning_rate": 6.6798143851508125e-06,
+ "loss": 0.3879343795776367,
+ "step": 6225
+ },
+ {
+ "epoch": 3.5233328633864374,
+ "grad_norm": 8.672515869140625,
+ "learning_rate": 6.665313225058005e-06,
+ "loss": 0.37412925720214846,
+ "step": 6250
+ },
+ {
+ "epoch": 3.5374312702664596,
+ "grad_norm": 6.852619171142578,
+ "learning_rate": 6.650812064965198e-06,
+ "loss": 0.3825421142578125,
+ "step": 6275
+ },
+ {
+ "epoch": 3.5515296771464824,
+ "grad_norm": 7.91625452041626,
+ "learning_rate": 6.636310904872391e-06,
+ "loss": 0.3875956726074219,
+ "step": 6300
+ },
+ {
+ "epoch": 3.565628084026505,
+ "grad_norm": 8.165389060974121,
+ "learning_rate": 6.6218097447795825e-06,
+ "loss": 0.39700664520263673,
+ "step": 6325
+ },
+ {
+ "epoch": 3.579726490906528,
+ "grad_norm": 7.832555294036865,
+ "learning_rate": 6.607308584686776e-06,
+ "loss": 0.37678192138671873,
+ "step": 6350
+ },
+ {
+ "epoch": 3.59382489778655,
+ "grad_norm": 9.242694854736328,
+ "learning_rate": 6.592807424593968e-06,
+ "loss": 0.3898345184326172,
+ "step": 6375
+ },
+ {
+ "epoch": 3.6079233046665724,
+ "grad_norm": 10.566645622253418,
+ "learning_rate": 6.578306264501161e-06,
+ "loss": 0.41113948822021484,
+ "step": 6400
+ },
+ {
+ "epoch": 3.622021711546595,
+ "grad_norm": 8.03178882598877,
+ "learning_rate": 6.5638051044083525e-06,
+ "loss": 0.3778879165649414,
+ "step": 6425
+ },
+ {
+ "epoch": 3.636120118426618,
+ "grad_norm": 9.302940368652344,
+ "learning_rate": 6.549303944315546e-06,
+ "loss": 0.4171958541870117,
+ "step": 6450
+ },
+ {
+ "epoch": 3.6502185253066406,
+ "grad_norm": 8.460973739624023,
+ "learning_rate": 6.534802784222738e-06,
+ "loss": 0.39614273071289063,
+ "step": 6475
+ },
+ {
+ "epoch": 3.664316932186663,
+ "grad_norm": 7.800125598907471,
+ "learning_rate": 6.520301624129931e-06,
+ "loss": 0.3654580307006836,
+ "step": 6500
+ },
+ {
+ "epoch": 3.6784153390666856,
+ "grad_norm": 9.029678344726562,
+ "learning_rate": 6.505800464037123e-06,
+ "loss": 0.37184043884277346,
+ "step": 6525
+ },
+ {
+ "epoch": 3.692513745946708,
+ "grad_norm": 7.120487689971924,
+ "learning_rate": 6.491299303944316e-06,
+ "loss": 0.3724634552001953,
+ "step": 6550
+ },
+ {
+ "epoch": 3.7066121528267306,
+ "grad_norm": 7.917516708374023,
+ "learning_rate": 6.476798143851508e-06,
+ "loss": 0.3739112091064453,
+ "step": 6575
+ },
+ {
+ "epoch": 3.7207105597067534,
+ "grad_norm": 8.271352767944336,
+ "learning_rate": 6.4622969837587015e-06,
+ "loss": 0.401793212890625,
+ "step": 6600
+ },
+ {
+ "epoch": 3.7348089665867756,
+ "grad_norm": 7.502166748046875,
+ "learning_rate": 6.447795823665893e-06,
+ "loss": 0.42757793426513674,
+ "step": 6625
+ },
+ {
+ "epoch": 3.7489073734667984,
+ "grad_norm": 7.828073024749756,
+ "learning_rate": 6.433294663573086e-06,
+ "loss": 0.4309410095214844,
+ "step": 6650
+ },
+ {
+ "epoch": 3.7630057803468207,
+ "grad_norm": 8.782448768615723,
+ "learning_rate": 6.418793503480279e-06,
+ "loss": 0.40100780487060544,
+ "step": 6675
+ },
+ {
+ "epoch": 3.7771041872268434,
+ "grad_norm": 6.84846305847168,
+ "learning_rate": 6.4042923433874715e-06,
+ "loss": 0.3828923797607422,
+ "step": 6700
+ },
+ {
+ "epoch": 3.791202594106866,
+ "grad_norm": 8.250489234924316,
+ "learning_rate": 6.389791183294664e-06,
+ "loss": 0.3788992691040039,
+ "step": 6725
+ },
+ {
+ "epoch": 3.8053010009868884,
+ "grad_norm": 8.07790470123291,
+ "learning_rate": 6.375290023201857e-06,
+ "loss": 0.3897978210449219,
+ "step": 6750
+ },
+ {
+ "epoch": 3.819399407866911,
+ "grad_norm": 8.17908000946045,
+ "learning_rate": 6.36078886310905e-06,
+ "loss": 0.3732691955566406,
+ "step": 6775
+ },
+ {
+ "epoch": 3.8334978147469334,
+ "grad_norm": 10.077781677246094,
+ "learning_rate": 6.3462877030162415e-06,
+ "loss": 0.38536983489990234,
+ "step": 6800
+ },
+ {
+ "epoch": 3.847596221626956,
+ "grad_norm": 7.02805757522583,
+ "learning_rate": 6.331786542923435e-06,
+ "loss": 0.39071887969970703,
+ "step": 6825
+ },
+ {
+ "epoch": 3.861694628506979,
+ "grad_norm": 7.91436243057251,
+ "learning_rate": 6.317285382830627e-06,
+ "loss": 0.3588584899902344,
+ "step": 6850
+ },
+ {
+ "epoch": 3.875793035387001,
+ "grad_norm": 8.013671875,
+ "learning_rate": 6.30278422273782e-06,
+ "loss": 0.37952312469482424,
+ "step": 6875
+ },
+ {
+ "epoch": 3.889891442267024,
+ "grad_norm": 8.154217720031738,
+ "learning_rate": 6.288283062645012e-06,
+ "loss": 0.37390522003173826,
+ "step": 6900
+ },
+ {
+ "epoch": 3.903989849147046,
+ "grad_norm": 8.916945457458496,
+ "learning_rate": 6.273781902552205e-06,
+ "loss": 0.38926349639892577,
+ "step": 6925
+ },
+ {
+ "epoch": 3.918088256027069,
+ "grad_norm": 7.6121296882629395,
+ "learning_rate": 6.259280742459397e-06,
+ "loss": 0.3879304504394531,
+ "step": 6950
+ },
+ {
+ "epoch": 3.9321866629070916,
+ "grad_norm": 9.87363052368164,
+ "learning_rate": 6.2447795823665905e-06,
+ "loss": 0.3799150466918945,
+ "step": 6975
+ },
+ {
+ "epoch": 3.946285069787114,
+ "grad_norm": 6.678126335144043,
+ "learning_rate": 6.230278422273782e-06,
+ "loss": 0.3740867614746094,
+ "step": 7000
+ },
+ {
+ "epoch": 3.946285069787114,
+ "eval_loss": 0.270298033952713,
+ "eval_runtime": 1521.4464,
+ "eval_samples_per_second": 2.037,
+ "eval_steps_per_second": 0.255,
+ "eval_wer": 0.21098909882994774,
+ "step": 7000
+ }
+ ],
+ "logging_steps": 25,
+ "max_steps": 17740,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 1000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.1426794605084672e+20,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f81aa1a3c2f98ba27f0cfa5494dccf005388f07cd625fe08778f4e1a1b4929c6
+ size 5329
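
The `trainer_state.json` uploaded above logs an `eval_wer` record at each evaluation step (0.2121 at step 6000, 0.2110 at step 7000). A minimal sketch of pulling those records out of a Hugging Face Trainer state file — the `eval_history` helper and the inline excerpt are illustrative, not part of this repo; in practice you would `json.load` the downloaded file instead:

```python
import json

def eval_history(state: dict) -> list[tuple[int, float]]:
    """Return (step, eval_wer) pairs from a HF Trainer state dict.

    Training-loss entries in log_history have no "eval_wer" key,
    so filtering on that key keeps only the evaluation records.
    """
    return [
        (entry["step"], entry["eval_wer"])
        for entry in state.get("log_history", [])
        if "eval_wer" in entry
    ]

# Inline excerpt mirroring the two eval records logged above.
state = json.loads("""
{
  "log_history": [
    {"epoch": 3.382348794586212, "eval_wer": 0.21214815999040776, "step": 6000},
    {"epoch": 3.946285069787114, "eval_wer": 0.21098909882994774, "step": 7000}
  ]
}
""")

for step, wer in eval_history(state):
    print(f"step {step}: WER {wer:.2%}")
```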