YongkangZOU committed
Commit fc6036c · 0 Parent(s)

Duplicate from YongkangZOU/evoxtral-lora
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tekken.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,135 @@
+ ---
+ library_name: peft
+ base_model: mistralai/Voxtral-Mini-3B-2507
+ tags:
+ - voxtral
+ - lora
+ - speech-recognition
+ - expressive-transcription
+ - audio
+ - mistral
+ - hackathon
+ datasets:
+ - custom
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # Evoxtral LoRA — Expressive Tagged Transcription
+
+ A LoRA adapter for [Voxtral-Mini-3B-2507](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) that produces transcriptions enriched with inline expressive audio tags from the [ElevenLabs v3 tag set](https://elevenlabs.io/docs/api-reference/text-to-speech).
+
+ Built for the **Mistral AI Online Hackathon 2026** (W&B Fine-Tuning Track).
+
+ ## What It Does
+
+ Standard ASR:
+ > So I was thinking maybe we could try that new restaurant downtown. I mean if you're free this weekend.
+
+ Evoxtral:
+ > [nervous] So... [stammers] I was thinking maybe we could... [clears throat] try that new restaurant downtown? [laughs nervously] I mean, if you're free this weekend?
+
+ ## Evaluation Results
+
+ | Metric | Base Voxtral | Evoxtral (finetuned) | Improvement |
+ |--------|-------------|---------------------|-------------|
+ | **WER** (Word Error Rate) | 6.64% | **4.47%** | 32.7% better |
+ | **Tag F1** (Expressive Tag Accuracy) | 22.0% | **67.2%** | 3x better |
+
+ Evaluated on 50 held-out test samples. The finetuned model dramatically improves expressive tag generation while also improving raw transcription accuracy.
+
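The two metrics above can be sketched in a few lines. This is a minimal illustration, not the project's actual evaluation script: `extract_tags`, `strip_tags`, and `tag_f1` are hypothetical helper names, and WER itself would be computed on the tag-stripped text with a library such as `jiwer`.

```python
import re
from collections import Counter

def extract_tags(text):
    # All inline bracketed tags, in order, e.g. ["nervous", "clears throat"].
    return re.findall(r"\[([^\]]+)\]", text)

def strip_tags(text):
    # Remove tags (and surrounding spaces) to recover plain text for WER scoring.
    return re.sub(r"\s*\[[^\]]+\]\s*", " ", text).strip()

def tag_f1(reference, hypothesis):
    # F1 over the multisets of tags in reference vs. hypothesis.
    ref, hyp = Counter(extract_tags(reference)), Counter(extract_tags(hypothesis))
    tp = sum((ref & hyp).values())
    if tp == 0:
        return 0.0
    precision, recall = tp / sum(hyp.values()), tp / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

ref = "[nervous] So... [stammers] I was thinking [clears throat] maybe"
hyp = "[nervous] So... I was thinking [clears throat] maybe"
print(strip_tags(hyp))             # So... I was thinking maybe
print(round(tag_f1(ref, hyp), 2))  # 0.8 (one of three reference tags missed)
```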
+ ## Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base model | `mistralai/Voxtral-Mini-3B-2507` |
+ | Method | LoRA (PEFT) |
+ | LoRA rank | 64 |
+ | LoRA alpha | 128 |
+ | LoRA dropout | 0.05 |
+ | Target modules | q/k/v/o_proj, gate/up/down_proj, multi_modal_projector |
+ | Learning rate | 2e-4 |
+ | Scheduler | Cosine |
+ | Epochs | 3 |
+ | Batch size | 2 (effective 16 with grad accum 8) |
+ | NEFTune noise alpha | 5.0 |
+ | Precision | bf16 |
+ | GPU | NVIDIA A10G (24GB) |
+ | Training time | ~25 minutes |
+ | Trainable params | 124.8M / 4.8B (2.6%) |
+
+ ## Dataset
+
+ Custom synthetic dataset of 1,010 audio samples generated with ElevenLabs TTS v3:
+ - **808** train / **101** validation / **101** test
+ - Each sample has audio + tagged transcription with inline ElevenLabs v3 expressive tags
+ - Tags include: `[sighs]`, `[laughs]`, `[whispers]`, `[nervous]`, `[frustrated]`, `[clears throat]`, `[pause]`, `[excited]`, and more
+ - Audio encoder (Whisper-based) was frozen during training
+
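One plausible record layout for such a dataset is shown below. This is purely illustrative: the actual schema and field names of the training data are not published, so everything here is an assumption.

```python
import json

# Illustrative layout for one sample; field names are assumptions,
# not the dataset's real schema.
sample = {
    "audio": "clips/0001.wav",                              # TTS-generated audio file
    "text": "[sighs] Fine... [pause] we can go tomorrow.",  # tagged transcription
    "split": "train",                                       # train / validation / test
}
print(json.dumps(sample, indent=2))
```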
72
+
73
+ ```python
74
+ import torch
75
+ from transformers import VoxtralForConditionalGeneration, AutoProcessor
76
+ from peft import PeftModel
77
+
78
+ repo_id = "mistralai/Voxtral-Mini-3B-2507"
79
+ adapter_id = "YongkangZOU/evoxtral-lora"
80
+
81
+ processor = AutoProcessor.from_pretrained(repo_id)
82
+ base_model = VoxtralForConditionalGeneration.from_pretrained(
83
+ repo_id, dtype=torch.bfloat16, device_map="auto"
84
+ )
85
+ model = PeftModel.from_pretrained(base_model, adapter_id)
86
+
87
+ # Transcribe audio with expressive tags
88
+ inputs = processor.apply_transcription_request(
89
+ language="en",
90
+ audio=["path/to/audio.wav"],
91
+ format=["WAV"],
92
+ model_id=repo_id,
93
+ return_tensors="pt",
94
+ )
95
+ inputs = inputs.to(model.device, dtype=torch.bfloat16)
96
+
97
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
98
+ transcription = processor.batch_decode(
99
+ outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
100
+ )[0]
101
+ print(transcription)
102
+ # [nervous] So... I was thinking maybe we could [clears throat] try that new restaurant downtown?
103
+ ```
104
+
105
+ ## W&B Tracking
106
+
107
+ All training and evaluation runs are tracked on Weights & Biases:
108
+ - [Training run](https://wandb.ai/yongkang-zou-ai/evoxtral/runs/t8ak7a20)
109
+ - [Base model eval](https://wandb.ai/yongkang-zou-ai/evoxtral/runs/f9l2zwvs)
110
+ - [Finetuned model eval](https://wandb.ai/yongkang-zou-ai/evoxtral/runs/b32c74im)
111
+ - [Project dashboard](https://wandb.ai/yongkang-zou-ai/evoxtral)
112
+
113
+ ## Supported Tags
114
+
115
+ The model can produce any tag from the ElevenLabs v3 expressive tag set, including:
116
+
117
+ `[laughs]` `[sighs]` `[gasps]` `[clears throat]` `[whispers]` `[sniffs]` `[pause]` `[nervous]` `[frustrated]` `[excited]` `[sad]` `[angry]` `[calm]` `[stammers]` `[yawns]` and more.
118
+
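Since decoding is unconstrained, the model may occasionally emit a bracketed span outside this vocabulary. A hypothetical post-processing step (a sketch, not part of the released model) could filter output down to a chosen tag set; the vocabulary below is just the subset listed above.

```python
import re

# Subset of the ElevenLabs v3 tag set listed above; extend as needed
# ("and more" tags are not enumerated here).
ALLOWED = {
    "laughs", "sighs", "gasps", "clears throat", "whispers", "sniffs",
    "pause", "nervous", "frustrated", "excited", "sad", "angry",
    "calm", "stammers", "yawns",
}

def keep_known_tags(text, allowed=ALLOWED):
    # Drop any bracketed tag not in the allowed set; leave everything else as-is.
    def repl(match):
        return match.group(0) if match.group(1) in allowed else ""
    return re.sub(r"\[([^\]]+)\]\s?", repl, text).strip()

print(keep_known_tags("[nervous] Hi [robot voice] there [laughs]"))
# [nervous] Hi there [laughs]
```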
+ ## Limitations
+
+ - Trained on synthetic (TTS-generated) audio, not natural speech recordings
+ - Tag F1 of 67.2% means ~1/3 of tags may be missed or misplaced
+ - English only
+ - Best results on conversational and emotionally expressive speech
+
+ ## Citation
+
+ ```bibtex
+ @misc{evoxtral2026,
+   title={Evoxtral: Expressive Tagged Transcription with Voxtral},
+   author={Yongkang Zou},
+   year={2026},
+   url={https://huggingface.co/YongkangZOU/evoxtral-lora}
+ }
+ ```
adapter_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Voxtral-Mini-3B-2507",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 128,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "v_proj",
+     "down_proj",
+     "o_proj",
+     "gate_proj",
+     "k_proj",
+     "multi_modal_projector.linear_2",
+     "multi_modal_projector.linear_1",
+     "up_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:909e030619cbaf52ad33e05db26cb9c97df78cb0fdfda94637a1445ba7e49df8
+ size 499212672
checkpoint-100/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: mistralai/Voxtral-Mini-3B-2507
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:mistralai/Voxtral-Mini-3B-2507
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.18.1
checkpoint-100/adapter_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Voxtral-Mini-3B-2507",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 128,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "v_proj",
+     "down_proj",
+     "o_proj",
+     "gate_proj",
+     "k_proj",
+     "multi_modal_projector.linear_2",
+     "multi_modal_projector.linear_1",
+     "up_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
checkpoint-100/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fc014b17b21116e68d7fd7a381fcac925f79117813361778c41606a6e7f2b6f
+ size 499212672
checkpoint-100/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:537338d486df707334d1c928b076d95c11cf5978af93a6649a1bbee9ec7c0478
+ size 998774363
checkpoint-100/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2a70d660bebd66c60c022cbf431d1fd0561c8127e390216fdabeb26c1afe8e6
+ size 14645
checkpoint-100/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd446f12be119caf8b207903edb080afe016482b1d3e825a8f6c7eed7b5aad20
+ size 1465
checkpoint-100/trainer_state.json ADDED
@@ -0,0 +1,190 @@
+ {
+   "best_global_step": 100,
+   "best_metric": 0.1666431427001953,
+   "best_model_checkpoint": "/output/evoxtral-lora/checkpoint-100",
+   "epoch": 1.9702970297029703,
+   "eval_steps": 50,
+   "global_step": 100,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.09900990099009901,
+       "grad_norm": 5.802285194396973,
+       "learning_rate": 1.6000000000000003e-05,
+       "loss": 0.9557,
+       "step": 5
+     },
+     {
+       "epoch": 0.19801980198019803,
+       "grad_norm": 2.306023597717285,
+       "learning_rate": 3.6e-05,
+       "loss": 0.461,
+       "step": 10
+     },
+     {
+       "epoch": 0.297029702970297,
+       "grad_norm": 2.0266661643981934,
+       "learning_rate": 5.6000000000000006e-05,
+       "loss": 0.3281,
+       "step": 15
+     },
+     {
+       "epoch": 0.39603960396039606,
+       "grad_norm": 1.5811512470245361,
+       "learning_rate": 7.6e-05,
+       "loss": 0.2656,
+       "step": 20
+     },
+     {
+       "epoch": 0.49504950495049505,
+       "grad_norm": 1.5266460180282593,
+       "learning_rate": 9.6e-05,
+       "loss": 0.2477,
+       "step": 25
+     },
+     {
+       "epoch": 0.594059405940594,
+       "grad_norm": 1.0605043172836304,
+       "learning_rate": 0.000116,
+       "loss": 0.2199,
+       "step": 30
+     },
+     {
+       "epoch": 0.693069306930693,
+       "grad_norm": 1.4974918365478516,
+       "learning_rate": 0.00013600000000000003,
+       "loss": 0.1935,
+       "step": 35
+     },
+     {
+       "epoch": 0.7920792079207921,
+       "grad_norm": 1.1257297992706299,
+       "learning_rate": 0.00015600000000000002,
+       "loss": 0.1739,
+       "step": 40
+     },
+     {
+       "epoch": 0.8910891089108911,
+       "grad_norm": 0.9485541582107544,
+       "learning_rate": 0.00017600000000000002,
+       "loss": 0.1829,
+       "step": 45
+     },
+     {
+       "epoch": 0.9900990099009901,
+       "grad_norm": 0.6643406748771667,
+       "learning_rate": 0.000196,
+       "loss": 0.1887,
+       "step": 50
+     },
+     {
+       "epoch": 0.9900990099009901,
+       "eval_loss": 0.1841340959072113,
+       "eval_runtime": 17.7494,
+       "eval_samples_per_second": 5.69,
+       "eval_steps_per_second": 2.873,
+       "step": 50
+     },
+     {
+       "epoch": 1.0792079207920793,
+       "grad_norm": 0.7675807476043701,
+       "learning_rate": 0.0001992566788083908,
+       "loss": 0.1535,
+       "step": 55
+     },
+     {
+       "epoch": 1.1782178217821782,
+       "grad_norm": 0.6500945687294006,
+       "learning_rate": 0.0001962558656223516,
+       "loss": 0.15,
+       "step": 60
+     },
+     {
+       "epoch": 1.2772277227722773,
+       "grad_norm": 3.4771132469177246,
+       "learning_rate": 0.00019102070542141328,
+       "loss": 0.1058,
+       "step": 65
+     },
+     {
+       "epoch": 1.3762376237623761,
+       "grad_norm": 6.509069919586182,
+       "learning_rate": 0.0001836727197823842,
+       "loss": 0.148,
+       "step": 70
+     },
+     {
+       "epoch": 1.4752475247524752,
+       "grad_norm": 1.21564519405365,
+       "learning_rate": 0.0001743824744123196,
+       "loss": 0.154,
+       "step": 75
+     },
+     {
+       "epoch": 1.5742574257425743,
+       "grad_norm": 0.9891815185546875,
+       "learning_rate": 0.00016336561987834153,
+       "loss": 0.1472,
+       "step": 80
+     },
+     {
+       "epoch": 1.6732673267326734,
+       "grad_norm": 0.8960129022598267,
+       "learning_rate": 0.00015087788580152206,
+       "loss": 0.1388,
+       "step": 85
+     },
+     {
+       "epoch": 1.7722772277227723,
+       "grad_norm": 0.6417158842086792,
+       "learning_rate": 0.00013720914471250644,
+       "loss": 0.1544,
+       "step": 90
+     },
+     {
+       "epoch": 1.8712871287128712,
+       "grad_norm": 0.6971271634101868,
+       "learning_rate": 0.00012267668336210413,
+       "loss": 0.1216,
+       "step": 95
+     },
+     {
+       "epoch": 1.9702970297029703,
+       "grad_norm": 0.6400907039642334,
+       "learning_rate": 0.00010761783767709182,
+       "loss": 0.1347,
+       "step": 100
+     },
+     {
+       "epoch": 1.9702970297029703,
+       "eval_loss": 0.1666431427001953,
+       "eval_runtime": 17.7221,
+       "eval_samples_per_second": 5.699,
+       "eval_steps_per_second": 2.878,
+       "step": 100
+     }
+   ],
+   "logging_steps": 5,
+   "max_steps": 153,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 50,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 1.8528360565456896e+16,
+   "train_batch_size": 2,
+   "trial_name": null,
+   "trial_params": null
+ }
checkpoint-100/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cd58f49b2411a3bf3ec16af828d493672732a1ebe34589d3922b365ec57325e
+ size 5777
checkpoint-150/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: mistralai/Voxtral-Mini-3B-2507
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:mistralai/Voxtral-Mini-3B-2507
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.18.1
checkpoint-150/adapter_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Voxtral-Mini-3B-2507",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 128,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "v_proj",
+     "down_proj",
+     "o_proj",
+     "gate_proj",
+     "k_proj",
+     "multi_modal_projector.linear_2",
+     "multi_modal_projector.linear_1",
+     "up_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
checkpoint-150/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:909e030619cbaf52ad33e05db26cb9c97df78cb0fdfda94637a1445ba7e49df8
+ size 499212672
checkpoint-150/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f6e2cdcbc3a06663bb885d19ddec826a2746e78cdbe7f851777e8edf2c516ca
+ size 998774363
checkpoint-150/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb6f8e4476c92bbe0a1f1dc469afe36379c0cc6e449e17836caca06905a8dc02
+ size 14645
checkpoint-150/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62a5aefeb0dad12da4b38995b66f32da699ab73f2ff6844ca83eaa0d41c6ed50
+ size 1465
checkpoint-150/trainer_state.json ADDED
@@ -0,0 +1,268 @@
+ {
+ "best_global_step": 150,
+ "best_metric": 0.1592499166727066,
+ "best_model_checkpoint": "/output/evoxtral-lora/checkpoint-150",
+ "epoch": 2.9504950495049505,
+ "eval_steps": 50,
+ "global_step": 150,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.09900990099009901,
+ "grad_norm": 5.802285194396973,
+ "learning_rate": 1.6000000000000003e-05,
+ "loss": 0.9557,
+ "step": 5
+ },
+ {
+ "epoch": 0.19801980198019803,
+ "grad_norm": 2.306023597717285,
+ "learning_rate": 3.6e-05,
+ "loss": 0.461,
+ "step": 10
+ },
+ {
+ "epoch": 0.297029702970297,
+ "grad_norm": 2.0266661643981934,
+ "learning_rate": 5.6000000000000006e-05,
+ "loss": 0.3281,
+ "step": 15
+ },
+ {
+ "epoch": 0.39603960396039606,
+ "grad_norm": 1.5811512470245361,
+ "learning_rate": 7.6e-05,
+ "loss": 0.2656,
+ "step": 20
+ },
+ {
+ "epoch": 0.49504950495049505,
+ "grad_norm": 1.5266460180282593,
+ "learning_rate": 9.6e-05,
+ "loss": 0.2477,
+ "step": 25
+ },
+ {
+ "epoch": 0.594059405940594,
+ "grad_norm": 1.0605043172836304,
+ "learning_rate": 0.000116,
+ "loss": 0.2199,
+ "step": 30
+ },
+ {
+ "epoch": 0.693069306930693,
+ "grad_norm": 1.4974918365478516,
+ "learning_rate": 0.00013600000000000003,
+ "loss": 0.1935,
+ "step": 35
+ },
+ {
+ "epoch": 0.7920792079207921,
+ "grad_norm": 1.1257297992706299,
+ "learning_rate": 0.00015600000000000002,
+ "loss": 0.1739,
+ "step": 40
+ },
+ {
+ "epoch": 0.8910891089108911,
+ "grad_norm": 0.9485541582107544,
+ "learning_rate": 0.00017600000000000002,
+ "loss": 0.1829,
+ "step": 45
+ },
+ {
+ "epoch": 0.9900990099009901,
+ "grad_norm": 0.6643406748771667,
+ "learning_rate": 0.000196,
+ "loss": 0.1887,
+ "step": 50
+ },
+ {
+ "epoch": 0.9900990099009901,
+ "eval_loss": 0.1841340959072113,
+ "eval_runtime": 17.7494,
+ "eval_samples_per_second": 5.69,
+ "eval_steps_per_second": 2.873,
+ "step": 50
+ },
+ {
+ "epoch": 1.0792079207920793,
+ "grad_norm": 0.7675807476043701,
+ "learning_rate": 0.0001992566788083908,
+ "loss": 0.1535,
+ "step": 55
+ },
+ {
+ "epoch": 1.1782178217821782,
+ "grad_norm": 0.6500945687294006,
+ "learning_rate": 0.0001962558656223516,
+ "loss": 0.15,
+ "step": 60
+ },
+ {
+ "epoch": 1.2772277227722773,
+ "grad_norm": 3.4771132469177246,
+ "learning_rate": 0.00019102070542141328,
+ "loss": 0.1058,
+ "step": 65
+ },
+ {
+ "epoch": 1.3762376237623761,
+ "grad_norm": 6.509069919586182,
+ "learning_rate": 0.0001836727197823842,
+ "loss": 0.148,
+ "step": 70
+ },
+ {
+ "epoch": 1.4752475247524752,
+ "grad_norm": 1.21564519405365,
+ "learning_rate": 0.0001743824744123196,
+ "loss": 0.154,
+ "step": 75
+ },
+ {
+ "epoch": 1.5742574257425743,
+ "grad_norm": 0.9891815185546875,
+ "learning_rate": 0.00016336561987834153,
+ "loss": 0.1472,
+ "step": 80
+ },
+ {
+ "epoch": 1.6732673267326734,
+ "grad_norm": 0.8960129022598267,
+ "learning_rate": 0.00015087788580152206,
+ "loss": 0.1388,
+ "step": 85
+ },
+ {
+ "epoch": 1.7722772277227723,
+ "grad_norm": 0.6417158842086792,
+ "learning_rate": 0.00013720914471250644,
+ "loss": 0.1544,
+ "step": 90
+ },
+ {
+ "epoch": 1.8712871287128712,
+ "grad_norm": 0.6971271634101868,
+ "learning_rate": 0.00012267668336210413,
+ "loss": 0.1216,
+ "step": 95
+ },
+ {
+ "epoch": 1.9702970297029703,
+ "grad_norm": 0.6400907039642334,
+ "learning_rate": 0.00010761783767709182,
+ "loss": 0.1347,
+ "step": 100
+ },
+ {
+ "epoch": 1.9702970297029703,
+ "eval_loss": 0.1666431427001953,
+ "eval_runtime": 17.7221,
+ "eval_samples_per_second": 5.699,
+ "eval_steps_per_second": 2.878,
+ "step": 100
+ },
+ {
+ "epoch": 2.0594059405940595,
+ "grad_norm": 0.40439656376838684,
+ "learning_rate": 9.238216232290822e-05,
+ "loss": 0.0851,
+ "step": 105
+ },
+ {
+ "epoch": 2.1584158415841586,
+ "grad_norm": 0.47545361518859863,
+ "learning_rate": 7.732331663789592e-05,
+ "loss": 0.0614,
+ "step": 110
+ },
+ {
+ "epoch": 2.2574257425742577,
+ "grad_norm": 0.42001137137413025,
+ "learning_rate": 6.279085528749359e-05,
+ "loss": 0.0607,
+ "step": 115
+ },
+ {
+ "epoch": 2.3564356435643563,
+ "grad_norm": 0.7629411220550537,
+ "learning_rate": 4.912211419847794e-05,
+ "loss": 0.0556,
+ "step": 120
+ },
+ {
+ "epoch": 2.4554455445544554,
+ "grad_norm": 0.4428424537181854,
+ "learning_rate": 3.663438012165848e-05,
+ "loss": 0.0449,
+ "step": 125
+ },
+ {
+ "epoch": 2.5544554455445545,
+ "grad_norm": 0.564255952835083,
+ "learning_rate": 2.5617525587680402e-05,
+ "loss": 0.0484,
+ "step": 130
+ },
+ {
+ "epoch": 2.6534653465346536,
+ "grad_norm": 0.4994179308414459,
+ "learning_rate": 1.6327280217615792e-05,
+ "loss": 0.0447,
+ "step": 135
+ },
+ {
+ "epoch": 2.7524752475247523,
+ "grad_norm": 0.4854471981525421,
+ "learning_rate": 8.979294578586738e-06,
+ "loss": 0.0475,
+ "step": 140
+ },
+ {
+ "epoch": 2.8514851485148514,
+ "grad_norm": 0.6532300114631653,
+ "learning_rate": 3.7441343776484117e-06,
+ "loss": 0.0477,
+ "step": 145
+ },
+ {
+ "epoch": 2.9504950495049505,
+ "grad_norm": 0.39523014426231384,
+ "learning_rate": 7.433211916092142e-07,
+ "loss": 0.0478,
+ "step": 150
+ },
+ {
+ "epoch": 2.9504950495049505,
+ "eval_loss": 0.1592499166727066,
+ "eval_runtime": 17.8203,
+ "eval_samples_per_second": 5.668,
+ "eval_steps_per_second": 2.862,
+ "step": 150
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 153,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 50,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.773627471680307e+16,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
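The `log_history` list in `trainer_state.json` mixes training and eval records; eval records are the ones carrying an `eval_loss` key. A small sketch of picking the best checkpoint out of such a list, using an excerpt of the eval values logged above (the helper name `best_eval` is ours, not a Trainer API):

```python
# Filter a Trainer-style log_history for eval records and
# return the one with the lowest eval_loss.
def best_eval(log_history):
    evals = [e for e in log_history if "eval_loss" in e]
    return min(evals, key=lambda e: e["eval_loss"])

# Excerpt of the eval entries from trainer_state.json above.
history = [
    {"step": 50, "eval_loss": 0.1841340959072113},
    {"step": 100, "eval_loss": 0.1666431427001953},
    {"step": 150, "eval_loss": 0.1592499166727066},
]
print(best_eval(history)["step"])  # 150
```

This matches `best_global_step`/`best_metric` in the file: step 150 with eval loss 0.1592499166727066.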
checkpoint-150/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cd58f49b2411a3bf3ec16af828d493672732a1ebe34589d3922b365ec57325e
+ size 5777
checkpoint-153/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: mistralai/Voxtral-Mini-3B-2507
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:mistralai/Voxtral-Mini-3B-2507
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
checkpoint-153/adapter_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "mistralai/Voxtral-Mini-3B-2507",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.1",
+ "qalora_group_size": 16,
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj",
+ "down_proj",
+ "o_proj",
+ "gate_proj",
+ "k_proj",
+ "multi_modal_projector.linear_2",
+ "multi_modal_projector.linear_1",
+ "up_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
checkpoint-153/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f764480fa4d0b56f064d10cb530f070f40a39ef0d5f7b02c4cd7c0b623a46324
+ size 499212672
checkpoint-153/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9dd86a9509b79108a2d96f051b1a1069453f3feee74726a3053f456d26cb92d
+ size 998774363
checkpoint-153/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24bda35c561ec5058b4d5495313dfd4d8ddd1b2711a8071d69c8abe265eb1029
+ size 14645
checkpoint-153/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70962943a8986eb124d40ffe067f22ad7b7aad7f2e9e9d76f6959f3047a499c3
+ size 1465
checkpoint-153/trainer_state.json ADDED
@@ -0,0 +1,268 @@
+ {
+ "best_global_step": 150,
+ "best_metric": 0.1592499166727066,
+ "best_model_checkpoint": "/output/evoxtral-lora/checkpoint-150",
+ "epoch": 3.0,
+ "eval_steps": 50,
+ "global_step": 153,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.09900990099009901,
+ "grad_norm": 5.802285194396973,
+ "learning_rate": 1.6000000000000003e-05,
+ "loss": 0.9557,
+ "step": 5
+ },
+ {
+ "epoch": 0.19801980198019803,
+ "grad_norm": 2.306023597717285,
+ "learning_rate": 3.6e-05,
+ "loss": 0.461,
+ "step": 10
+ },
+ {
+ "epoch": 0.297029702970297,
+ "grad_norm": 2.0266661643981934,
+ "learning_rate": 5.6000000000000006e-05,
+ "loss": 0.3281,
+ "step": 15
+ },
+ {
+ "epoch": 0.39603960396039606,
+ "grad_norm": 1.5811512470245361,
+ "learning_rate": 7.6e-05,
+ "loss": 0.2656,
+ "step": 20
+ },
+ {
+ "epoch": 0.49504950495049505,
+ "grad_norm": 1.5266460180282593,
+ "learning_rate": 9.6e-05,
+ "loss": 0.2477,
+ "step": 25
+ },
+ {
+ "epoch": 0.594059405940594,
+ "grad_norm": 1.0605043172836304,
+ "learning_rate": 0.000116,
+ "loss": 0.2199,
+ "step": 30
+ },
+ {
+ "epoch": 0.693069306930693,
+ "grad_norm": 1.4974918365478516,
+ "learning_rate": 0.00013600000000000003,
+ "loss": 0.1935,
+ "step": 35
+ },
+ {
+ "epoch": 0.7920792079207921,
+ "grad_norm": 1.1257297992706299,
+ "learning_rate": 0.00015600000000000002,
+ "loss": 0.1739,
+ "step": 40
+ },
+ {
+ "epoch": 0.8910891089108911,
+ "grad_norm": 0.9485541582107544,
+ "learning_rate": 0.00017600000000000002,
+ "loss": 0.1829,
+ "step": 45
+ },
+ {
+ "epoch": 0.9900990099009901,
+ "grad_norm": 0.6643406748771667,
+ "learning_rate": 0.000196,
+ "loss": 0.1887,
+ "step": 50
+ },
+ {
+ "epoch": 0.9900990099009901,
+ "eval_loss": 0.1841340959072113,
+ "eval_runtime": 17.7494,
+ "eval_samples_per_second": 5.69,
+ "eval_steps_per_second": 2.873,
+ "step": 50
+ },
+ {
+ "epoch": 1.0792079207920793,
+ "grad_norm": 0.7675807476043701,
+ "learning_rate": 0.0001992566788083908,
+ "loss": 0.1535,
+ "step": 55
+ },
+ {
+ "epoch": 1.1782178217821782,
+ "grad_norm": 0.6500945687294006,
+ "learning_rate": 0.0001962558656223516,
+ "loss": 0.15,
+ "step": 60
+ },
+ {
+ "epoch": 1.2772277227722773,
+ "grad_norm": 3.4771132469177246,
+ "learning_rate": 0.00019102070542141328,
+ "loss": 0.1058,
+ "step": 65
+ },
+ {
+ "epoch": 1.3762376237623761,
+ "grad_norm": 6.509069919586182,
+ "learning_rate": 0.0001836727197823842,
+ "loss": 0.148,
+ "step": 70
+ },
+ {
+ "epoch": 1.4752475247524752,
+ "grad_norm": 1.21564519405365,
+ "learning_rate": 0.0001743824744123196,
+ "loss": 0.154,
+ "step": 75
+ },
+ {
+ "epoch": 1.5742574257425743,
+ "grad_norm": 0.9891815185546875,
+ "learning_rate": 0.00016336561987834153,
+ "loss": 0.1472,
+ "step": 80
+ },
+ {
+ "epoch": 1.6732673267326734,
+ "grad_norm": 0.8960129022598267,
+ "learning_rate": 0.00015087788580152206,
+ "loss": 0.1388,
+ "step": 85
+ },
+ {
+ "epoch": 1.7722772277227723,
+ "grad_norm": 0.6417158842086792,
+ "learning_rate": 0.00013720914471250644,
+ "loss": 0.1544,
+ "step": 90
+ },
+ {
+ "epoch": 1.8712871287128712,
+ "grad_norm": 0.6971271634101868,
+ "learning_rate": 0.00012267668336210413,
+ "loss": 0.1216,
+ "step": 95
+ },
+ {
+ "epoch": 1.9702970297029703,
+ "grad_norm": 0.6400907039642334,
+ "learning_rate": 0.00010761783767709182,
+ "loss": 0.1347,
+ "step": 100
+ },
+ {
+ "epoch": 1.9702970297029703,
+ "eval_loss": 0.1666431427001953,
+ "eval_runtime": 17.7221,
+ "eval_samples_per_second": 5.699,
+ "eval_steps_per_second": 2.878,
+ "step": 100
+ },
+ {
+ "epoch": 2.0594059405940595,
+ "grad_norm": 0.40439656376838684,
+ "learning_rate": 9.238216232290822e-05,
+ "loss": 0.0851,
+ "step": 105
+ },
+ {
+ "epoch": 2.1584158415841586,
+ "grad_norm": 0.47545361518859863,
+ "learning_rate": 7.732331663789592e-05,
+ "loss": 0.0614,
+ "step": 110
+ },
+ {
+ "epoch": 2.2574257425742577,
+ "grad_norm": 0.42001137137413025,
+ "learning_rate": 6.279085528749359e-05,
+ "loss": 0.0607,
+ "step": 115
+ },
+ {
+ "epoch": 2.3564356435643563,
+ "grad_norm": 0.7629411220550537,
+ "learning_rate": 4.912211419847794e-05,
+ "loss": 0.0556,
+ "step": 120
+ },
+ {
+ "epoch": 2.4554455445544554,
+ "grad_norm": 0.4428424537181854,
+ "learning_rate": 3.663438012165848e-05,
+ "loss": 0.0449,
+ "step": 125
+ },
+ {
+ "epoch": 2.5544554455445545,
+ "grad_norm": 0.564255952835083,
+ "learning_rate": 2.5617525587680402e-05,
+ "loss": 0.0484,
+ "step": 130
+ },
+ {
+ "epoch": 2.6534653465346536,
+ "grad_norm": 0.4994179308414459,
+ "learning_rate": 1.6327280217615792e-05,
+ "loss": 0.0447,
+ "step": 135
+ },
+ {
+ "epoch": 2.7524752475247523,
+ "grad_norm": 0.4854471981525421,
+ "learning_rate": 8.979294578586738e-06,
+ "loss": 0.0475,
+ "step": 140
+ },
+ {
+ "epoch": 2.8514851485148514,
+ "grad_norm": 0.6532300114631653,
+ "learning_rate": 3.7441343776484117e-06,
+ "loss": 0.0477,
+ "step": 145
+ },
+ {
+ "epoch": 2.9504950495049505,
+ "grad_norm": 0.39523014426231384,
+ "learning_rate": 7.433211916092142e-07,
+ "loss": 0.0478,
+ "step": 150
+ },
+ {
+ "epoch": 2.9504950495049505,
+ "eval_loss": 0.1592499166727066,
+ "eval_runtime": 17.8203,
+ "eval_samples_per_second": 5.668,
+ "eval_steps_per_second": 2.862,
+ "step": 150
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 153,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 50,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.8196216707792896e+16,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-153/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cd58f49b2411a3bf3ec16af828d493672732a1ebe34589d3922b365ec57325e
+ size 5777
preprocessor_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "chunk_length": 30,
+ "dither": 0.0,
+ "feature_extractor_type": "WhisperFeatureExtractor",
+ "feature_size": 128,
+ "hop_length": 160,
+ "n_fft": 400,
+ "n_samples": 480000,
+ "nb_max_frames": 3000,
+ "padding_side": "right",
+ "padding_value": 0.0,
+ "processor_class": "VoxtralProcessor",
+ "return_attention_mask": false,
+ "sampling_rate": 16000
+ }
tekken.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4aaf3836c2a5332f029ce85a7a62255c966f47b6797ef81dedd0ade9c862e4a8
+ size 14894206
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cd58f49b2411a3bf3ec16af828d493672732a1ebe34589d3922b365ec57325e
+ size 5777