felcas93 committed
Commit 5ca2642 · 1 Parent(s): 4aa1a6c

LoRA updated: training with 1 epoch
.gitattributes CHANGED
@@ -1,3 +1,4 @@
+ <<<<<<< HEAD
  *.7z filter=lfs diff=lfs merge=lfs -text
  *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
@@ -33,3 +34,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ =======
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ >>>>>>> 76f198e (Add LoRA adapter (StarCoder 1B) + tokenizer metadata)
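
Note that this hunk commits the Git merge-conflict markers themselves (`<<<<<<< HEAD`, `=======`, `>>>>>>>`) into `.gitattributes`, where Git will read them as literal attribute patterns. A small, hypothetical pre-commit guard (not part of this repo) that refuses to stage files containing unresolved markers:

```python
import subprocess
import sys

MARKERS = ("<<<<<<<", "=======", ">>>>>>>")

def staged_files() -> list[str]:
    """List paths currently staged for commit."""
    out = subprocess.run(["git", "diff", "--cached", "--name-only"],
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]

def has_conflict_markers(path: str) -> bool:
    try:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            return any(line.startswith(MARKERS) for line in fh)
    except OSError:
        return False  # deleted or unreadable path; skip it

if __name__ == "__main__":
    bad = [p for p in staged_files() if has_conflict_markers(p)]
    if bad:
        print("Unresolved conflict markers in:", ", ".join(bad))
        sys.exit(1)
```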
.gitignore ADDED
@@ -0,0 +1,6 @@
+ checkpoint-*/
+ *.pt
+ *.bin
+ runs/
+ logs/
+
README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: bigcode/starcoderbase-1b
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:bigcode/starcoderbase-1b
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
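+
+ In the meantime, a minimal, hedged sketch (not a verified quick-start from the author) of attaching this LoRA adapter to its base model with PEFT; the adapter path is a placeholder, since the published repo id is not stated in this diff:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ ADAPTER = "path/to/this-adapter"  # hypothetical placeholder
+
+ base = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-1b")
+ model = PeftModel.from_pretrained(base, ADAPTER)  # attaches the LoRA weights
+ tok = AutoTokenizer.from_pretrained(ADAPTER)
+
+ inputs = tok("def fibonacci(n):", return_tensors="pt")
+ out = model.generate(**inputs, max_new_tokens=64)
+ print(tok.decode(out[0], skip_special_tokens=True))
+ ```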
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.17.1
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "bigcode/starcoderbase-1b",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "w2",
+     "q_attn",
+     "c_proj",
+     "fc_out",
+     "c_attn",
+     "v_proj",
+     "w1",
+     "k_proj",
+     "fc_in",
+     "w3"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
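
For readers who want to recreate this configuration in code rather than load the serialized file, a hedged sketch of the equivalent PEFT `LoraConfig` (only the non-default fields above are passed; the rest fall back to PEFT 0.17.1 defaults):

```python
from peft import LoraConfig

# Mirrors the committed adapter_config.json: rank 16, scaling alpha 32, 5% dropout.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "o_proj", "w2", "q_attn", "c_proj", "fc_out", "c_attn",
        "v_proj", "w1", "k_proj", "fc_in", "w3",
    ],
)
```

The target list appears to union module names from several decoder architectures; PEFT matches them as suffixes against the base model's module names, so entries that never occur in StarCoder's GPTBigCode blocks are simply never wrapped.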
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:07acf660be1fb0743fd84d3362626d55d3bd7170331f6d8976c504bcb81e88f9
+ size 28723448
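
What the diff adds here is a Git LFS pointer, not the ~28.7 MB safetensors payload itself. A small illustrative parser for the three-field pointer format (spec per the URL on its first line):

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    assert fields["version"].startswith("https://git-lfs.github.com/spec/v1")
    return fields

with open("adapter_model.safetensors", encoding="utf-8") as fh:
    ptr = parse_lfs_pointer(fh.read())
print(ptr["oid"], int(ptr["size"]))  # sha256:07acf660... 28723448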
checkpoint-1/README.md ADDED
Identical to the top-level README.md above; contents omitted here.
checkpoint-1/adapter_config.json ADDED
Identical to the top-level adapter_config.json above; contents omitted here.
checkpoint-1/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:07acf660be1fb0743fd84d3362626d55d3bd7170331f6d8976c504bcb81e88f9
+ size 28723448
checkpoint-1/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc2643cd6beb434ac63c2bd6ab465a529703c5e9fa69f4e7563a3670e5c9b596
+ size 57495546
checkpoint-1/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41523a02aa5a848025e81d91cc04e6400cb32a2ae2b439b8f2cca6bd660dc93b
+ size 14244
checkpoint-1/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31017ff690e8312759fe6512fd241ba617cb3d42fe6c14cbdd426bcb6b454f8b
+ size 1064
checkpoint-1/special_tokens_map.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<fim_prefix>",
+     "<fim_middle>",
+     "<fim_suffix>",
+     "<fim_pad>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<empty_output>",
+     "<commit_before>",
+     "<commit_msg>",
+     "<commit_after>",
+     "<reponame>"
+   ],
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-1/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1/tokenizer_config.json ADDED
@@ -0,0 +1,187 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<fim_prefix>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<fim_middle>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<fim_suffix>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<fim_pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "<filename>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "6": {
+       "content": "<gh_stars>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "7": {
+       "content": "<issue_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "8": {
+       "content": "<issue_comment>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "9": {
+       "content": "<issue_closed>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "10": {
+       "content": "<jupyter_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "11": {
+       "content": "<jupyter_text>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "12": {
+       "content": "<jupyter_code>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "13": {
+       "content": "<jupyter_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "14": {
+       "content": "<empty_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "15": {
+       "content": "<commit_before>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "16": {
+       "content": "<commit_msg>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "17": {
+       "content": "<commit_after>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "18": {
+       "content": "<reponame>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<fim_prefix>",
+     "<fim_middle>",
+     "<fim_suffix>",
+     "<fim_pad>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<empty_output>",
+     "<commit_before>",
+     "<commit_msg>",
+     "<commit_after>",
+     "<reponame>"
+   ],
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 1024,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>",
+   "vocab_size": 49152
+ }
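
The tokenizer metadata above registers StarCoder's fill-in-the-middle (FIM) control tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`, ...). A short sketch of the standard StarCoder FIM prompt layout those tokens imply; the function body is only an illustration:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigcode/starcoderbase-1b")

# PSM layout: the model generates the middle, given a prefix and a suffix.
prefix = "def add(a, b):\n    "
suffix = "\n    return result\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tok(prompt, return_tensors="pt")  # ready for model.generate(...)
```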
checkpoint-1/trainer_state.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "best_global_step": null,
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.16,
+   "eval_steps": 500,
+   "global_step": 1,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [],
+   "logging_steps": 50,
+   "max_steps": 1,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 1,
+   "save_steps": 200,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 201905204625408.0,
+   "train_batch_size": 1,
+   "trial_name": null,
+   "trial_params": null
+ }
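
This state records `max_steps: 1` and `global_step: 1`, i.e. each per-block Trainer run appears to perform a single optimizer step. A hedged back-of-envelope check against the config line in `train_e1.log` below (BATCH_PER_DEVICE=1, GRAD_ACC=32, CHUNK_SIZE=300):

```python
batch_per_device, grad_acc, chunk_size = 1, 32, 300

effective_batch = batch_per_device * grad_acc   # 32 examples per optimizer step
fraction_of_block = effective_batch / chunk_size
print(effective_batch, round(fraction_of_block, 2))  # 32, 0.11

# 0.11 matches the per-block 'epoch' value in the log, consistent with roughly
# 32 of each 300-example block feeding the single update before training stops.
```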
checkpoint-1/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d006cbe0053f1bdb26902e5133c9d463f9a400f19d1fce74eea896e0a58c543
+ size 5432
checkpoint-1/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
Identical to checkpoint-1/special_tokens_map.json above; contents omitted here.
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
Identical to checkpoint-1/tokenizer_config.json above; contents omitted here.
train_e1.log ADDED
@@ -0,0 +1,412 @@
+ ⚙️ Config -> BATCH_PER_DEVICE=1, GRAD_ACC=32, MAX_LEN=1024, LR=0.0002, EPOCHS_TOTAL=1.0, CHUNK_SIZE=300, TOK_BATCH=32
+ 📦 DATA_PATH=/workspace/data/evaluaciones_pares_input_output.jsonl.gz
+ 🔠 Loading tokenizer...
+ 🧠 Loading 4-bit base model...
+ ♻️ Resuming from previous checkpoint...
+ trainable params: 7,176,192 || all params: 1,144,383,488 || trainable%: 0.6271
+ 🔢 Counting total lines (single pass) ...
+ ✅ Total lines: 20000
+
+ ==================== BLOCK 1 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.7478, 'train_samples_per_second': 0.514, 'train_steps_per_second': 0.114, 'train_loss': 1.3522244691848755, 'epoch': 0.11}
+ ✅ Block 1 finished. Cumulative: 300/20000 lines | global_step=1
+
+ ==================== BLOCK 2 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8455, 'train_samples_per_second': 0.574, 'train_steps_per_second': 0.127, 'train_loss': 1.4102396965026855, 'epoch': 0.11}
+ ✅ Block 2 finished. Cumulative: 600/20000 lines | global_step=1
+
+ ==================== BLOCK 3 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8688, 'train_samples_per_second': 0.572, 'train_steps_per_second': 0.127, 'train_loss': 1.376889705657959, 'epoch': 0.11}
+ ✅ Block 3 finished. Cumulative: 900/20000 lines | global_step=1
+
+ ==================== BLOCK 4 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9219, 'train_samples_per_second': 0.568, 'train_steps_per_second': 0.126, 'train_loss': 1.4048157930374146, 'epoch': 0.11}
+ ✅ Block 4 finished. Cumulative: 1200/20000 lines | global_step=1
+
+ ==================== BLOCK 5 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8425, 'train_samples_per_second': 0.574, 'train_steps_per_second': 0.128, 'train_loss': 1.4059597253799438, 'epoch': 0.11}
+ ✅ Block 5 finished. Cumulative: 1500/20000 lines | global_step=1
+
+ ==================== BLOCK 6 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8155, 'train_samples_per_second': 0.576, 'train_steps_per_second': 0.128, 'train_loss': 1.3940027952194214, 'epoch': 0.11}
+ ✅ Block 6 finished. Cumulative: 1800/20000 lines | global_step=1
+
+ ==================== BLOCK 7 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9907, 'train_samples_per_second': 0.563, 'train_steps_per_second': 0.125, 'train_loss': 1.3537384271621704, 'epoch': 0.11}
+ ✅ Block 7 finished. Cumulative: 2100/20000 lines | global_step=1
+
+ ==================== BLOCK 8 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.029, 'train_samples_per_second': 0.56, 'train_steps_per_second': 0.125, 'train_loss': 1.3489651679992676, 'epoch': 0.11}
+ ✅ Block 8 finished. Cumulative: 2400/20000 lines | global_step=1
+
+ ==================== BLOCK 9 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8209, 'train_samples_per_second': 0.575, 'train_steps_per_second': 0.128, 'train_loss': 1.4223603010177612, 'epoch': 0.11}
+ ✅ Block 9 finished. Cumulative: 2700/20000 lines | global_step=1
+
+ ==================== BLOCK 10 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.796, 'train_samples_per_second': 0.577, 'train_steps_per_second': 0.128, 'train_loss': 1.4088714122772217, 'epoch': 0.11}
+ ✅ Block 10 finished. Cumulative: 3000/20000 lines | global_step=1
+
+ ==================== BLOCK 11 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9274, 'train_samples_per_second': 0.568, 'train_steps_per_second': 0.126, 'train_loss': 1.4210320711135864, 'epoch': 0.11}
+ ✅ Block 11 finished. Cumulative: 3300/20000 lines | global_step=1
+
+ ==================== BLOCK 12 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8712, 'train_samples_per_second': 0.572, 'train_steps_per_second': 0.127, 'train_loss': 1.4202042818069458, 'epoch': 0.11}
+ ✅ Block 12 finished. Cumulative: 3600/20000 lines | global_step=1
+
+ ==================== BLOCK 13 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.4636, 'train_samples_per_second': 0.532, 'train_steps_per_second': 0.118, 'train_loss': 1.45803701877594, 'epoch': 0.11}
+ ✅ Block 13 finished. Cumulative: 3900/20000 lines | global_step=1
+
+ ==================== BLOCK 14 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8809, 'train_samples_per_second': 0.571, 'train_steps_per_second': 0.127, 'train_loss': 1.375520944595337, 'epoch': 0.11}
+ ✅ Block 14 finished. Cumulative: 4200/20000 lines | global_step=1
+
+ ==================== BLOCK 15 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 9.0132, 'train_samples_per_second': 0.499, 'train_steps_per_second': 0.111, 'train_loss': 1.4456514120101929, 'epoch': 0.11}
+ ✅ Block 15 finished. Cumulative: 4500/20000 lines | global_step=1
+
+ ==================== BLOCK 16 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9159, 'train_samples_per_second': 0.568, 'train_steps_per_second': 0.126, 'train_loss': 1.3990974426269531, 'epoch': 0.11}
+ ✅ Block 16 finished. Cumulative: 4800/20000 lines | global_step=1
+
+ ==================== BLOCK 17 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8909, 'train_samples_per_second': 0.57, 'train_steps_per_second': 0.127, 'train_loss': 1.3655331134796143, 'epoch': 0.11}
+ ✅ Block 17 finished. Cumulative: 5100/20000 lines | global_step=1
+
+ ==================== BLOCK 18 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.2736, 'train_samples_per_second': 0.544, 'train_steps_per_second': 0.121, 'train_loss': 1.367121696472168, 'epoch': 0.11}
+ ✅ Block 18 finished. Cumulative: 5400/20000 lines | global_step=1
+
+ ==================== BLOCK 19 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8921, 'train_samples_per_second': 0.57, 'train_steps_per_second': 0.127, 'train_loss': 1.4294403791427612, 'epoch': 0.11}
+ ✅ Block 19 finished. Cumulative: 5700/20000 lines | global_step=1
+
+ ==================== BLOCK 20 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8161, 'train_samples_per_second': 0.576, 'train_steps_per_second': 0.128, 'train_loss': 1.4076168537139893, 'epoch': 0.11}
+ ✅ Block 20 finished. Cumulative: 6000/20000 lines | global_step=1
+
+ ==================== BLOCK 21 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.799, 'train_samples_per_second': 0.577, 'train_steps_per_second': 0.128, 'train_loss': 1.3925199508666992, 'epoch': 0.11}
+ ✅ Block 21 finished. Cumulative: 6300/20000 lines | global_step=1
+
+ ==================== BLOCK 22 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.2108, 'train_samples_per_second': 0.548, 'train_steps_per_second': 0.122, 'train_loss': 1.3582611083984375, 'epoch': 0.11}
+ ✅ Block 22 finished. Cumulative: 6600/20000 lines | global_step=1
+
+ ==================== BLOCK 23 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9328, 'train_samples_per_second': 0.567, 'train_steps_per_second': 0.126, 'train_loss': 1.3844921588897705, 'epoch': 0.11}
+ ✅ Block 23 finished. Cumulative: 6900/20000 lines | global_step=1
+
+ ==================== BLOCK 24 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.5347, 'train_samples_per_second': 0.527, 'train_steps_per_second': 0.117, 'train_loss': 1.4399304389953613, 'epoch': 0.11}
+ ✅ Block 24 finished. Cumulative: 7200/20000 lines | global_step=1
+
+ ==================== BLOCK 25 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.945, 'train_samples_per_second': 0.566, 'train_steps_per_second': 0.126, 'train_loss': 1.4433614015579224, 'epoch': 0.11}
+ ✅ Block 25 finished. Cumulative: 7500/20000 lines | global_step=1
+
+ ==================== BLOCK 26 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9109, 'train_samples_per_second': 0.569, 'train_steps_per_second': 0.126, 'train_loss': 1.459897518157959, 'epoch': 0.11}
+ ✅ Block 26 finished. Cumulative: 7800/20000 lines | global_step=1
+
+ ==================== BLOCK 27 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8354, 'train_samples_per_second': 0.574, 'train_steps_per_second': 0.128, 'train_loss': 1.4087351560592651, 'epoch': 0.11}
+ ✅ Block 27 finished. Cumulative: 8100/20000 lines | global_step=1
+
+ ==================== BLOCK 28 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8052, 'train_samples_per_second': 0.577, 'train_steps_per_second': 0.128, 'train_loss': 1.3621701002120972, 'epoch': 0.11}
+ ✅ Block 28 finished. Cumulative: 8400/20000 lines | global_step=1
+
+ ==================== BLOCK 29 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.2427, 'train_samples_per_second': 0.546, 'train_steps_per_second': 0.121, 'train_loss': 1.41074538230896, 'epoch': 0.11}
+ ✅ Block 29 finished. Cumulative: 8700/20000 lines | global_step=1
+
+ ==================== BLOCK 30 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8657, 'train_samples_per_second': 0.572, 'train_steps_per_second': 0.127, 'train_loss': 1.352297067642212, 'epoch': 0.11}
+ ✅ Block 30 finished. Cumulative: 9000/20000 lines | global_step=1
+
+ ==================== BLOCK 31 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8948, 'train_samples_per_second': 0.57, 'train_steps_per_second': 0.127, 'train_loss': 1.418897032737732, 'epoch': 0.11}
+ ✅ Block 31 finished. Cumulative: 9300/20000 lines | global_step=1
+
+ ==================== BLOCK 32 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8016, 'train_samples_per_second': 0.577, 'train_steps_per_second': 0.128, 'train_loss': 1.4292913675308228, 'epoch': 0.11}
+ ✅ Block 32 finished. Cumulative: 9600/20000 lines | global_step=1
+
+ ==================== BLOCK 33 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8676, 'train_samples_per_second': 0.572, 'train_steps_per_second': 0.127, 'train_loss': 1.3893479108810425, 'epoch': 0.11}
+ ✅ Block 33 finished. Cumulative: 9900/20000 lines | global_step=1
+
+ ==================== BLOCK 34 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.3348, 'train_samples_per_second': 0.54, 'train_steps_per_second': 0.12, 'train_loss': 1.3993480205535889, 'epoch': 0.11}
+ ✅ Block 34 finished. Cumulative: 10200/20000 lines | global_step=1
+
+ ==================== BLOCK 35 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8154, 'train_samples_per_second': 0.576, 'train_steps_per_second': 0.128, 'train_loss': 1.4724717140197754, 'epoch': 0.11}
+ ✅ Block 35 finished. Cumulative: 10500/20000 lines | global_step=1
+
+ ==================== BLOCK 36 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.2463, 'train_samples_per_second': 0.546, 'train_steps_per_second': 0.121, 'train_loss': 1.3512485027313232, 'epoch': 0.11}
+ ✅ Block 36 finished. Cumulative: 10800/20000 lines | global_step=1
+
+ ==================== BLOCK 37 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.0336, 'train_samples_per_second': 0.56, 'train_steps_per_second': 0.124, 'train_loss': 1.3714691400527954, 'epoch': 0.11}
+ ✅ Block 37 finished. Cumulative: 11100/20000 lines | global_step=1
+
+ ==================== BLOCK 38 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.0084, 'train_samples_per_second': 0.562, 'train_steps_per_second': 0.125, 'train_loss': 1.4171221256256104, 'epoch': 0.11}
+ ✅ Block 38 finished. Cumulative: 11400/20000 lines | global_step=1
+
+ ==================== BLOCK 39 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8908, 'train_samples_per_second': 0.57, 'train_steps_per_second': 0.127, 'train_loss': 1.4297702312469482, 'epoch': 0.11}
+ ✅ Block 39 finished. Cumulative: 11700/20000 lines | global_step=1
+
+ ==================== BLOCK 40 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8308, 'train_samples_per_second': 0.575, 'train_steps_per_second': 0.128, 'train_loss': 1.4469853639602661, 'epoch': 0.11}
+ ✅ Block 40 finished. Cumulative: 12000/20000 lines | global_step=1
+
+ ==================== BLOCK 41 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.0149, 'train_samples_per_second': 0.561, 'train_steps_per_second': 0.125, 'train_loss': 1.4268006086349487, 'epoch': 0.11}
+ ✅ Block 41 finished. Cumulative: 12300/20000 lines | global_step=1
+
+ ==================== BLOCK 42 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9279, 'train_samples_per_second': 0.568, 'train_steps_per_second': 0.126, 'train_loss': 1.4060312509536743, 'epoch': 0.11}
+ ✅ Block 42 finished. Cumulative: 12600/20000 lines | global_step=1
+
+ ==================== BLOCK 43 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.927, 'train_samples_per_second': 0.568, 'train_steps_per_second': 0.126, 'train_loss': 1.473849892616272, 'epoch': 0.11}
+ ✅ Block 43 finished. Cumulative: 12900/20000 lines | global_step=1
+
+ ==================== BLOCK 44 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8225, 'train_samples_per_second': 0.575, 'train_steps_per_second': 0.128, 'train_loss': 1.4086536169052124, 'epoch': 0.11}
+ ✅ Block 44 finished. Cumulative: 13200/20000 lines | global_step=1
+
+ ==================== BLOCK 45 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.038, 'train_samples_per_second': 0.56, 'train_steps_per_second': 0.124, 'train_loss': 1.4090956449508667, 'epoch': 0.11}
+ ✅ Block 45 finished. Cumulative: 13500/20000 lines | global_step=1
+
+ ==================== BLOCK 46 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8592, 'train_samples_per_second': 0.573, 'train_steps_per_second': 0.127, 'train_loss': 1.4210567474365234, 'epoch': 0.11}
+ ✅ Block 46 finished. Cumulative: 13800/20000 lines | global_step=1
+
+ ==================== BLOCK 47 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.985, 'train_samples_per_second': 0.564, 'train_steps_per_second': 0.125, 'train_loss': 1.357202172279358, 'epoch': 0.11}
+ ✅ Block 47 finished. Cumulative: 14100/20000 lines | global_step=1
+
+ ==================== BLOCK 48 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8731, 'train_samples_per_second': 0.572, 'train_steps_per_second': 0.127, 'train_loss': 1.4002933502197266, 'epoch': 0.11}
+ ✅ Block 48 finished. Cumulative: 14400/20000 lines | global_step=1
+
+ ==================== BLOCK 49 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8413, 'train_samples_per_second': 0.574, 'train_steps_per_second': 0.128, 'train_loss': 1.4064139127731323, 'epoch': 0.11}
+ ✅ Block 49 finished. Cumulative: 14700/20000 lines | global_step=1
+
+ ==================== BLOCK 50 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9053, 'train_samples_per_second': 0.569, 'train_steps_per_second': 0.126, 'train_loss': 1.4215424060821533, 'epoch': 0.11}
+ ✅ Block 50 finished. Cumulative: 15000/20000 lines | global_step=1
+
+ ==================== BLOCK 51 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 8.4462, 'train_samples_per_second': 0.533, 'train_steps_per_second': 0.118, 'train_loss': 1.397919774055481, 'epoch': 0.11}
+ ✅ Block 51 finished. Cumulative: 15300/20000 lines | global_step=1
+
+ ==================== BLOCK 52 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8452, 'train_samples_per_second': 0.574, 'train_steps_per_second': 0.127, 'train_loss': 1.4646857976913452, 'epoch': 0.11}
+ ✅ Block 52 finished. Cumulative: 15600/20000 lines | global_step=1
+
+ ==================== BLOCK 53 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.7995, 'train_samples_per_second': 0.577, 'train_steps_per_second': 0.128, 'train_loss': 1.395652413368225, 'epoch': 0.11}
+ ✅ Block 53 finished. Cumulative: 15900/20000 lines | global_step=1
+
+ ==================== BLOCK 54 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8485, 'train_samples_per_second': 0.573, 'train_steps_per_second': 0.127, 'train_loss': 1.367691159248352, 'epoch': 0.11}
+ ✅ Block 54 finished. Cumulative: 16200/20000 lines | global_step=1
+
+ ==================== BLOCK 55 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9295, 'train_samples_per_second': 0.567, 'train_steps_per_second': 0.126, 'train_loss': 1.4012526273727417, 'epoch': 0.11}
+ ✅ Block 55 finished. Cumulative: 16500/20000 lines | global_step=1
+
+ ==================== BLOCK 56 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.884, 'train_samples_per_second': 0.571, 'train_steps_per_second': 0.127, 'train_loss': 1.3737818002700806, 'epoch': 0.11}
+ ✅ Block 56 finished. Cumulative: 16800/20000 lines | global_step=1
+
+ ==================== BLOCK 57 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.79, 'train_samples_per_second': 0.578, 'train_steps_per_second': 0.128, 'train_loss': 1.4407134056091309, 'epoch': 0.11}
+ ✅ Block 57 finished. Cumulative: 17100/20000 lines | global_step=1
+
+ ==================== BLOCK 58 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9038, 'train_samples_per_second': 0.569, 'train_steps_per_second': 0.127, 'train_loss': 1.4103862047195435, 'epoch': 0.11}
+ ✅ Block 58 finished. Cumulative: 17400/20000 lines | global_step=1
+
+ ==================== BLOCK 59 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.9517, 'train_samples_per_second': 0.566, 'train_steps_per_second': 0.126, 'train_loss': 1.4260621070861816, 'epoch': 0.11}
+ ✅ Block 59 finished. Cumulative: 17700/20000 lines | global_step=1
+
+ ==================== BLOCK 60 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8696, 'train_samples_per_second': 0.572, 'train_steps_per_second': 0.127, 'train_loss': 1.43145751953125, 'epoch': 0.11}
+ ✅ Block 60 finished. Cumulative: 18000/20000 lines | global_step=1
+
+ ==================== BLOCK 61 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.7647, 'train_samples_per_second': 0.58, 'train_steps_per_second': 0.129, 'train_loss': 1.4067848920822144, 'epoch': 0.11}
+ ✅ Block 61 finished. Cumulative: 18300/20000 lines | global_step=1
+
+ ==================== BLOCK 62 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8886, 'train_samples_per_second': 0.57, 'train_steps_per_second': 0.127, 'train_loss': 1.444780707359314, 'epoch': 0.11}
+ ✅ Block 62 finished. Cumulative: 18600/20000 lines | global_step=1
+
+ ==================== BLOCK 63 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.7427, 'train_samples_per_second': 0.581, 'train_steps_per_second': 0.129, 'train_loss': 1.397607445716858, 'epoch': 0.11}
+ ✅ Block 63 finished. Cumulative: 18900/20000 lines | global_step=1
+
+ ==================== BLOCK 64 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.832, 'train_samples_per_second': 0.575, 'train_steps_per_second': 0.128, 'train_loss': 1.3948101997375488, 'epoch': 0.11}
+ ✅ Block 64 finished. Cumulative: 19200/20000 lines | global_step=1
+
+ ==================== BLOCK 65 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8257, 'train_samples_per_second': 0.575, 'train_steps_per_second': 0.128, 'train_loss': 1.3686237335205078, 'epoch': 0.11}
+ ✅ Block 65 finished. Cumulative: 19500/20000 lines | global_step=1
+
+ ==================== BLOCK 66 ====================
+ 🧩 Examples in block: 300
+ ⏱️ Epochs in this block: 0.0150
+ {'train_runtime': 7.8839, 'train_samples_per_second': 0.571, 'train_steps_per_second': 0.127, 'train_loss': 1.4106920957565308, 'epoch': 0.11}
+ ✅ Block 66 finished. Cumulative: 19800/20000 lines | global_step=1
+
+ ==================== BLOCK 67 ====================
+ 🧩 Examples in block: 200
+ ⏱️ Epochs in this block: 0.0100
+ {'train_runtime': 7.9257, 'train_samples_per_second': 0.252, 'train_steps_per_second': 0.126, 'train_loss': 1.433011770248413, 'epoch': 0.16}
+ ✅ Block 67 finished. Cumulative: 20000/20000 lines | global_step=1
+
+ 🎉 TOTAL training completed. Model at: /workspace/output/starcoder_1b_qlora
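
The log implies a block-wise loop: stream the gzipped JSONL in chunks of CHUNK_SIZE=300, tokenize each chunk, and run a fresh Trainer over it, which is also consistent with `global_step=1` repeating on every block. The training script itself is not part of this commit; below is a minimal, hypothetical sketch of such a loop. The record field names `input`/`output` are assumptions from the data filename, and the 4-bit quantized load mentioned in the log is omitted for brevity:

```python
import gzip
import json

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

DATA_PATH = "/workspace/data/evaluaciones_pares_input_output.jsonl.gz"
CHUNK_SIZE, MAX_LEN = 300, 1024

tok = AutoTokenizer.from_pretrained("bigcode/starcoderbase-1b")
tok.pad_token = tok.eos_token
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-1b"),
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
)

def chunks(path, size):
    """Yield lists of parsed JSONL records, `size` at a time."""
    block = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            block.append(json.loads(line))
            if len(block) == size:
                yield block
                block = []
    if block:
        yield block  # final partial block (200 examples in the log above)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                         gradient_accumulation_steps=32, learning_rate=2e-4,
                         num_train_epochs=1, logging_steps=50, save_steps=200)

for i, block in enumerate(chunks(DATA_PATH, CHUNK_SIZE), start=1):
    texts = [ex["input"] + ex["output"] for ex in block]  # assumed field names
    ds = Dataset.from_dict({"text": texts}).map(
        lambda b: tok(b["text"], truncation=True, max_length=MAX_LEN),
        batched=True, remove_columns=["text"])
    Trainer(model=model, args=args, train_dataset=ds,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()
    print(f"Block {i} finished")
```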
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d006cbe0053f1bdb26902e5133c9d463f9a400f19d1fce74eea896e0a58c543
+ size 5432
vocab.json ADDED
The diff for this file is too large to render. See raw diff