NorthernTribe-Research committed on
Commit
97a59ed
·
verified ·
1 Parent(s): 943f7e0

Publish UMSR reasoning model

README.md CHANGED
@@ -2,23 +2,182 @@
2
  language:
3
  - en
4
  library_name: transformers
 
5
  datasets:
6
  - NorthernTribe-Research/UMSR-v1
7
  tags:
8
  - reasoning
9
- - autonomous-training
10
  ---
11
 
12
- # UMSR Reasoner 7B
13
 
14
- Standalone reasoning model trained from UMSR-v1 using the autonomous trainer Space.
15
 
16
- - Dataset: https://huggingface.co/datasets/NorthernTribe-Research/UMSR-v1
17
- - Base model: `sshleifer/tiny-gpt2`
18
- - Model repo: `https://huggingface.co/NorthernTribe-Research/UMSR-Reasoner-7B`
19
 
20
  ## Output Contract
21
 
22
- Use:
23
 
24
- `<final_answer>...</final_answer>`
 
 
 
2
  language:
3
  - en
4
  library_name: transformers
5
+ pipeline_tag: text-generation
6
  datasets:
7
  - NorthernTribe-Research/UMSR-v1
8
  tags:
9
  - reasoning
10
+ - instruction-following
11
+ - structured-output
12
+ - math
13
+ - science
14
+ - logic
15
+ - strategy
16
  ---
17
 
18
+ # UMSR-Reasoner-7B
19
 
20
+ ## Overview
21
 
22
+ UMSR-Reasoner-7B is a standalone 7B reasoning model for structured multi-step problem solving.
23
+
24
+ It is optimized for tasks that require:
25
+
26
+ - explicit reasoning traces
27
+ - deterministic final-answer formatting
28
+ - consistent performance across math, science, logic, and strategy domains
29
+
30
+ ## Dataset
31
+
32
+ - Primary dataset: https://huggingface.co/datasets/NorthernTribe-Research/UMSR-v1
33
+
34
+ ## Training Strategy
35
+
36
+ - student architecture: `NorthernTribe-Research/UMSR-Reasoner-7B`
37
+ - teacher architecture: `NorthernTribe-Research/UMSR-Reasoner-7B` (self-distillation by default)
38
+ - objective: blended CE + KL distillation with temperature and weight scheduling
39
+ - continuity: checkpointed autonomous training cycles with resume support
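The blended objective above can be sketched as follows. This is an illustrative reconstruction rather than the trainer's actual code: the function names are ours, and only the schedule endpoints (CE/KD weights of 0.5, temperature annealed from 2.0 to 1.5) come from the published run summary.

```python
import torch
import torch.nn.functional as F

def blended_distillation_loss(student_logits, teacher_logits, labels,
                              ce_weight=0.5, kd_weight=0.5, temperature=2.0):
    """Blend hard-label cross-entropy with temperature-scaled KL to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # T^2 keeps gradient scale comparable across temperatures
    return ce_weight * ce + kd_weight * kd

def linear_schedule(start, end, step, total_steps):
    """Linear interpolation used here to stand in for the weight/temperature schedules."""
    frac = step / max(total_steps - 1, 1)
    return start + (end - start) * frac
```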
40
 
41
  ## Output Contract
42
 
43
+ For best reliability, instruct the model to end its response with:
44
+
45
+ ```text
46
+ <final_answer>...</final_answer>
47
+ ```
48
+
49
+ Optional reasoning can be requested in:
50
+
51
+ ```text
52
+ <reasoning>...</reasoning>
53
+ ```
54
+
55
+ ## Model Tree
56
+
57
+ | Variant | Repository | Purpose |
58
+ |---|---|---|
59
+ | Base FP model | `NorthernTribe-Research/UMSR-Reasoner-7B` | Primary inference and fine-tuning target |
60
+ | INT8 runtime profile | `NorthernTribe-Research/UMSR-Reasoner-7B-INT8` | Lower-memory deployment |
61
+ | NF4 runtime profile | `NorthernTribe-Research/UMSR-Reasoner-7B-NF4` | Max compression for constrained GPUs |
62
+ | Smoke INT8 profile | `NorthernTribe-Research/UMSR-Reasoner-7B-Smoke-INT8` | Fast CI/smoke validation profile linked to the main model tree |
63
+
64
+ ## Quickstart
65
+
66
+ ```python
67
+ import torch
68
+ from transformers import AutoModelForCausalLM, AutoTokenizer
69
+
70
+ model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
71
+
72
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
73
+ model = AutoModelForCausalLM.from_pretrained(
74
+ model_id,
75
+ torch_dtype=torch.bfloat16,
76
+ device_map="auto",
77
+ )
78
+
79
+ messages = [
80
+ {"role": "system", "content": "Solve step by step and finish with <final_answer>...</final_answer>."},
81
+ {"role": "user", "content": "If 3x + 5 = 20, what is x?"},
82
+ ]
83
+
84
+ inputs = tokenizer.apply_chat_template(
85
+ messages,
86
+ add_generation_prompt=True,
87
+ return_tensors="pt",
88
+ ).to(model.device)
89
+
90
+ outputs = model.generate(
91
+ inputs,
92
+ max_new_tokens=256,
93
+ temperature=0.2,
94
+ top_p=0.9,
95
+ )
96
+
97
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
98
+ ```
99
+
100
+ ## Quantized Runtime
101
+
102
+ ### INT8
103
+
104
+ ```python
105
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
106
+
107
+ model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
108
+ bnb_config = BitsAndBytesConfig(load_in_8bit=True)
109
+
110
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
111
+ model = AutoModelForCausalLM.from_pretrained(
112
+ model_id,
113
+ device_map="auto",
114
+ quantization_config=bnb_config,
115
+ )
116
+ ```
117
+
118
+ ### NF4
119
+
120
+ ```python
121
+ import torch
122
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
123
+
124
+ model_id = "NorthernTribe-Research/UMSR-Reasoner-7B"
125
+ bnb_config = BitsAndBytesConfig(
126
+ load_in_4bit=True,
127
+ bnb_4bit_quant_type="nf4",
128
+ bnb_4bit_use_double_quant=True,
129
+ bnb_4bit_compute_dtype=torch.bfloat16,
130
+ )
131
+
132
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
133
+ model = AutoModelForCausalLM.from_pretrained(
134
+ model_id,
135
+ device_map="auto",
136
+ quantization_config=bnb_config,
137
+ )
138
+ ```
139
+
140
+ ## Llamafile Packaging
141
+
142
+ For single-binary deployment, use:
143
+
144
+ ```bash
145
+ python scripts/create_llamafile.py \
146
+ --gguf /path/to/UMSR-Reasoner-7B.Q4_K_M.gguf \
147
+ --runtime-bin tools/llamafile \
148
+ --output dist/UMSR-Reasoner-7B.Q4_K_M.llamafile \
149
+ --force
150
+ ```
151
+
152
+ ## Code-Aware Robust Evaluation
153
+
154
+ `scripts/eval_reasoner.py` supports code-focused robustness checks:
155
+
156
+ - code-task detection
157
+ - Python code-block syntax validation
158
+ - optional row-level unit-test execution
159
+ - optional TensorFlow-backed multi-candidate scorer
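The syntax-validation check can be approximated with the standard library. This is an assumed sketch of the idea, not the code in `scripts/eval_reasoner.py`: it extracts fenced Python blocks and verifies each one parses.

```python
import ast
import re

FENCE = "`" * 3  # markdown code fence

# Lazily capture the body of each fenced python block.
_BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def python_blocks(markdown: str) -> list[str]:
    """Extract the bodies of fenced python code blocks from markdown text."""
    return _BLOCK_RE.findall(markdown)

def blocks_parse(markdown: str) -> bool:
    """True if every extracted block is syntactically valid Python."""
    try:
        for block in python_blocks(markdown):
            ast.parse(block)
    except SyntaxError:
        return False
    return True
```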
160
+
161
+ ## Trainer Integration
162
+
163
+ An autonomous trainer Space is available for continuous training cycles against UMSR-v1. It supports:
164
+
165
+ - teacher-student distillation mode with configurable in-house teacher model
166
+ - live run telemetry (`live_progress.json`, `live_events.jsonl`) for real-time monitoring
167
+ - scheduled or continuous operation
168
+ - checkpoint auto-resume (`UMSR_RESUME_FROM_CHECKPOINT=auto`)
169
+ - warmup-step and warmup-ratio control
170
+ - push-to-hub automation
171
+ - run monitoring through live dashboard and logs
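The `live_events.jsonl` stream above can be consumed with a small reader. The file name comes from the bullets; the record fields are an assumption about the schema, and a tolerant parser is used because the tail line may be mid-write during an active run:

```python
import json
from pathlib import Path

def read_events(path: str) -> list[dict]:
    """Parse a live_events.jsonl file, skipping blank or truncated lines."""
    events = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # partially written tail line during an active run
    return events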
172
+
173
+ ## Best Practices
174
+
175
+ - keep prompts explicit about output tags
176
+ - validate final answers for high-stakes workflows
177
+ - prefer domain-specific evaluation before production deployment
178
+
179
+ ## Limitations
180
 
181
+ - reasoning text may contain errors even when final format is correct
182
+ - quality depends on prompt clarity and task scope
183
+ - not suitable as a sole decision-maker for legal, medical, or safety-critical use
config.json CHANGED
@@ -18,7 +18,7 @@
18
  "n_inner": null,
19
  "n_layer": 2,
20
  "n_positions": 1024,
21
- "pad_token_id": 50256,
22
  "reorder_and_upcast_attn": false,
23
  "resid_pdrop": 0.1,
24
  "scale_attn_by_inverse_layer_idx": false,
@@ -34,7 +34,7 @@
34
  "max_length": 50
35
  }
36
  },
37
- "tie_word_embeddings": false,
38
  "transformers_version": "5.2.0",
39
  "use_cache": false,
40
  "vocab_size": 50257
 
18
  "n_inner": null,
19
  "n_layer": 2,
20
  "n_positions": 1024,
21
+ "pad_token_id": null,
22
  "reorder_and_upcast_attn": false,
23
  "resid_pdrop": 0.1,
24
  "scale_attn_by_inverse_layer_idx": false,
 
34
  "max_length": 50
35
  }
36
  },
37
+ "tie_word_embeddings": true,
38
  "transformers_version": "5.2.0",
39
  "use_cache": false,
40
  "vocab_size": 50257
metrics/eval_metrics.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "epoch": 1.0,
3
- "eval_loss": 10.716222763061523,
4
- "eval_runtime": 1.9392,
5
- "eval_samples": 64,
6
- "eval_samples_per_second": 33.003,
7
- "eval_steps_per_second": 33.003
8
- }
 
1
  {
2
  "epoch": 1.0,
3
+ "eval_loss": 5.414119720458984,
4
+ "eval_runtime": 0.7383,
5
+ "eval_samples": 8,
6
+ "eval_samples_per_second": 10.836,
7
+ "eval_steps_per_second": 10.836
8
+ }
metrics/train_metrics.json CHANGED
@@ -1,9 +1,16 @@
1
  {
 
 
2
  "epoch": 1.0,
3
- "total_flos": 39501942396.0,
4
- "train_loss": 10.738998085260391,
5
- "train_runtime": 49.2357,
6
- "train_samples": 256,
7
- "train_samples_per_second": 5.199,
8
- "train_steps_per_second": 5.199
9
- }
 
1
  {
2
+ "ce_weight_end": 0.5,
3
+ "ce_weight_start": 0.5,
4
  "epoch": 1.0,
5
+ "kd_weight_end": 0.5,
6
+ "kd_weight_start": 0.5,
7
+ "teacher_count": 1,
8
+ "temperature_end": 1.5,
9
+ "temperature_start": 2.0,
10
+ "total_flos": 3359425752.0,
11
+ "train_loss": 5.411877933301423,
12
+ "train_runtime": 7.7818,
13
+ "train_samples": 19,
14
+ "train_samples_per_second": 2.442,
15
+ "train_steps_per_second": 2.442
16
+ }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8af636bf1ea0ba12e8e0ed9858fe7a4fb9ce267806245af516a5f024c1c370c1
3
  size 413296
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b75f6f94d4b470fd42d5cb3c7135ca7c1d669d0514eddce08abe5825cf9d5c48
3
  size 413296
run_summary.json CHANGED
@@ -1,17 +1,15 @@
1
  {
2
- "base_model": "sshleifer/tiny-gpt2",
3
- "bf16": false,
4
- "cuda_available": false,
5
  "dataset_id": "NorthernTribe-Research/UMSR-v1",
6
- "device": "cpu",
7
- "eval_rows": 64,
8
- "finished_at": "2026-02-23T22:08:30.897247+00:00",
9
- "fp16": false,
10
- "mps_available": false,
11
- "output_dir": "/app/runs/20260223_220733",
12
- "target_repo_id": "NorthernTribe-Research/UMSR-Reasoner-7B",
13
- "tie_word_embeddings": false,
14
- "total_train_steps_estimate": 256,
15
- "train_rows": 256,
16
- "warmup_steps": 0
17
  }
 
1
  {
2
+ "ce_weight_end": 0.5,
3
+ "ce_weight_start": 0.5,
 
4
  "dataset_id": "NorthernTribe-Research/UMSR-v1",
5
+ "eval_rows": 8,
6
+ "kd_weight_end": 0.5,
7
+ "kd_weight_start": 0.5,
8
+ "output_dir": "outputs/umsr_reasoner_7b_standalone",
9
+ "student_model": "NorthernTribe-Research/UMSR-Reasoner-7B",
10
+ "teacher_count": 1,
11
+ "teacher_model": "NorthernTribe-Research/UMSR-Reasoner-7B",
12
+ "temperature_end": 1.5,
13
+ "temperature_start": 2.0,
14
+ "train_rows": 19
 
15
  }
tokenizer.json CHANGED
@@ -1,11 +1,6 @@
1
  {
2
  "version": "1.0",
3
- "truncation": {
4
- "direction": "Right",
5
- "max_length": 512,
6
- "strategy": "LongestFirst",
7
- "stride": 0
8
- },
9
  "padding": null,
10
  "added_tokens": [
11
  {
 
1
  {
2
  "version": "1.0",
3
+ "truncation": null,
 
 
 
 
 
4
  "padding": null,
5
  "added_tokens": [
6
  {
trainer_state.json CHANGED
@@ -3,282 +3,192 @@
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
  "epoch": 1.0,
6
- "eval_steps": 25,
7
- "global_step": 256,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
- "epoch": 0.0390625,
14
- "grad_norm": 0.6599220633506775,
15
- "learning_rate": 9.6484375e-05,
16
- "loss": 10.769454956054688,
17
- "step": 10
18
- },
19
- {
20
- "epoch": 0.078125,
21
- "grad_norm": 0.7802038192749023,
22
- "learning_rate": 9.257812500000001e-05,
23
- "loss": 10.763140106201172,
24
- "step": 20
25
- },
26
- {
27
- "epoch": 0.09765625,
28
- "eval_loss": 10.76131820678711,
29
- "eval_runtime": 2.0029,
30
- "eval_samples_per_second": 31.953,
31
- "eval_steps_per_second": 31.953,
32
- "step": 25
33
- },
34
- {
35
- "epoch": 0.1171875,
36
- "grad_norm": 0.17565655708312988,
37
- "learning_rate": 8.8671875e-05,
38
- "loss": 10.768231201171876,
39
- "step": 30
40
  },
41
  {
42
- "epoch": 0.15625,
43
- "grad_norm": 0.4683985710144043,
44
- "learning_rate": 8.4765625e-05,
45
- "loss": 10.772106170654297,
46
- "step": 40
47
  },
48
  {
49
- "epoch": 0.1953125,
50
- "grad_norm": 0.3628169298171997,
51
- "learning_rate": 8.0859375e-05,
52
- "loss": 10.764241790771484,
53
- "step": 50
54
  },
55
  {
56
- "epoch": 0.1953125,
57
- "eval_loss": 10.752297401428223,
58
- "eval_runtime": 2.1266,
59
- "eval_samples_per_second": 30.095,
60
- "eval_steps_per_second": 30.095,
61
- "step": 50
62
  },
63
  {
64
- "epoch": 0.234375,
65
- "grad_norm": 0.6150558590888977,
66
- "learning_rate": 7.695312500000001e-05,
67
- "loss": 10.760368347167969,
68
- "step": 60
 
69
  },
70
  {
71
- "epoch": 0.2734375,
72
- "grad_norm": 0.605256974697113,
73
- "learning_rate": 7.3046875e-05,
74
- "loss": 10.758135986328124,
75
- "step": 70
76
  },
77
  {
78
- "epoch": 0.29296875,
79
- "eval_loss": 10.743587493896484,
80
- "eval_runtime": 2.0832,
81
- "eval_samples_per_second": 30.721,
82
- "eval_steps_per_second": 30.721,
83
- "step": 75
84
  },
85
  {
86
- "epoch": 0.3125,
87
- "grad_norm": 0.3574382960796356,
88
- "learning_rate": 6.9140625e-05,
89
- "loss": 10.750237274169923,
90
- "step": 80
91
  },
92
  {
93
- "epoch": 0.3515625,
94
- "grad_norm": 0.6408470869064331,
95
- "learning_rate": 6.5234375e-05,
96
- "loss": 10.751436614990235,
97
- "step": 90
98
  },
99
  {
100
- "epoch": 0.390625,
101
- "grad_norm": 0.2543087303638458,
102
- "learning_rate": 6.132812500000001e-05,
103
- "loss": 10.753249359130859,
104
- "step": 100
 
105
  },
106
  {
107
- "epoch": 0.390625,
108
- "eval_loss": 10.736135482788086,
109
- "eval_runtime": 1.9456,
110
- "eval_samples_per_second": 32.894,
111
- "eval_steps_per_second": 32.894,
112
- "step": 100
113
  },
114
  {
115
- "epoch": 0.4296875,
116
- "grad_norm": 0.29619327187538147,
117
- "learning_rate": 5.7421875000000005e-05,
118
- "loss": 10.720957946777343,
119
- "step": 110
120
- },
121
- {
122
- "epoch": 0.46875,
123
- "grad_norm": 0.19426824152469635,
124
- "learning_rate": 5.3515625e-05,
125
- "loss": 10.75610580444336,
126
- "step": 120
127
- },
128
- {
129
- "epoch": 0.48828125,
130
- "eval_loss": 10.730175018310547,
131
- "eval_runtime": 2.0492,
132
- "eval_samples_per_second": 31.232,
133
- "eval_steps_per_second": 31.232,
134
- "step": 125
135
- },
136
- {
137
- "epoch": 0.5078125,
138
- "grad_norm": 0.400691419839859,
139
- "learning_rate": 4.9609375000000005e-05,
140
- "loss": 10.729013061523437,
141
- "step": 130
142
- },
143
- {
144
- "epoch": 0.546875,
145
- "grad_norm": 0.2926430106163025,
146
- "learning_rate": 4.5703125e-05,
147
- "loss": 10.718487548828126,
148
- "step": 140
149
- },
150
- {
151
- "epoch": 0.5859375,
152
- "grad_norm": 0.24082393944263458,
153
- "learning_rate": 4.1796875000000005e-05,
154
- "loss": 10.736891174316407,
155
- "step": 150
156
- },
157
- {
158
- "epoch": 0.5859375,
159
- "eval_loss": 10.725324630737305,
160
- "eval_runtime": 1.9834,
161
- "eval_samples_per_second": 32.268,
162
- "eval_steps_per_second": 32.268,
163
- "step": 150
164
- },
165
- {
166
- "epoch": 0.625,
167
- "grad_norm": 0.371867835521698,
168
- "learning_rate": 3.7890625e-05,
169
- "loss": 10.728065490722656,
170
- "step": 160
171
- },
172
- {
173
- "epoch": 0.6640625,
174
- "grad_norm": 0.3023074269294739,
175
- "learning_rate": 3.3984375000000004e-05,
176
- "loss": 10.73614501953125,
177
- "step": 170
178
- },
179
- {
180
- "epoch": 0.68359375,
181
- "eval_loss": 10.721412658691406,
182
- "eval_runtime": 2.1993,
183
- "eval_samples_per_second": 29.1,
184
- "eval_steps_per_second": 29.1,
185
- "step": 175
186
  },
187
  {
188
- "epoch": 0.703125,
189
- "grad_norm": 0.38127991557121277,
190
- "learning_rate": 3.0078125e-05,
191
- "loss": 10.722113037109375,
192
- "step": 180
193
  },
194
  {
195
- "epoch": 0.7421875,
196
- "grad_norm": 0.3930608928203583,
197
- "learning_rate": 2.6171875e-05,
198
- "loss": 10.726372528076173,
199
- "step": 190
200
  },
201
  {
202
- "epoch": 0.78125,
203
- "grad_norm": 0.3989920914173126,
204
- "learning_rate": 2.2265625e-05,
205
- "loss": 10.720677185058594,
206
- "step": 200
 
207
  },
208
  {
209
- "epoch": 0.78125,
210
- "eval_loss": 10.718664169311523,
211
- "eval_runtime": 2.0837,
212
- "eval_samples_per_second": 30.714,
213
- "eval_steps_per_second": 30.714,
214
- "step": 200
215
  },
216
  {
217
- "epoch": 0.8203125,
218
- "grad_norm": 0.20773838460445404,
219
- "learning_rate": 1.8359375e-05,
220
- "loss": 10.722401428222657,
221
- "step": 210
222
  },
223
  {
224
- "epoch": 0.859375,
225
- "grad_norm": 0.31527552008628845,
226
- "learning_rate": 1.4453125e-05,
227
- "loss": 10.71902084350586,
228
- "step": 220
229
  },
230
  {
231
- "epoch": 0.87890625,
232
- "eval_loss": 10.71699047088623,
233
- "eval_runtime": 2.0085,
234
- "eval_samples_per_second": 31.864,
235
- "eval_steps_per_second": 31.864,
236
- "step": 225
237
  },
238
  {
239
- "epoch": 0.8984375,
240
- "grad_norm": 0.2394668161869049,
241
- "learning_rate": 1.0546875e-05,
242
- "loss": 10.706802368164062,
243
- "step": 230
 
244
  },
245
  {
246
- "epoch": 0.9375,
247
- "grad_norm": 0.23905383050441742,
248
- "learning_rate": 6.6406250000000005e-06,
249
- "loss": 10.697933959960938,
250
- "step": 240
251
  },
252
  {
253
- "epoch": 0.9765625,
254
- "grad_norm": 0.3481399416923523,
255
- "learning_rate": 2.734375e-06,
256
- "loss": 10.72392578125,
257
- "step": 250
258
  },
259
  {
260
- "epoch": 0.9765625,
261
- "eval_loss": 10.716254234313965,
262
- "eval_runtime": 2.0417,
263
- "eval_samples_per_second": 31.347,
264
- "eval_steps_per_second": 31.347,
265
- "step": 250
266
  },
267
  {
268
  "epoch": 1.0,
269
- "step": 256,
270
- "total_flos": 39501942396.0,
271
- "train_loss": 10.738998085260391,
272
- "train_runtime": 49.2357,
273
- "train_samples_per_second": 5.199,
274
- "train_steps_per_second": 5.199
275
  }
276
  ],
277
- "logging_steps": 10,
278
- "max_steps": 256,
279
  "num_input_tokens_seen": 0,
280
  "num_train_epochs": 1,
281
- "save_steps": 25,
282
  "stateful_callbacks": {
283
  "TrainerControl": {
284
  "args": {
@@ -291,7 +201,7 @@
291
  "attributes": {}
292
  }
293
  },
294
- "total_flos": 39501942396.0,
295
  "train_batch_size": 1,
296
  "trial_name": null,
297
  "trial_params": null
 
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
  "epoch": 1.0,
6
+ "eval_steps": 4,
7
+ "global_step": 19,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
+ "epoch": 0.05263157894736842,
14
+ "grad_norm": 0.374282568693161,
15
+ "learning_rate": 0.0001,
16
+ "loss": 5.411169052124023,
17
+ "step": 1
18
  },
19
  {
20
+ "epoch": 0.10526315789473684,
21
+ "grad_norm": 0.4834381341934204,
22
+ "learning_rate": 9.931806517013612e-05,
23
+ "loss": 5.413699626922607,
24
+ "step": 2
25
  },
26
  {
27
+ "epoch": 0.15789473684210525,
28
+ "grad_norm": 1.9866410493850708,
29
+ "learning_rate": 9.729086208503174e-05,
30
+ "loss": 5.407270431518555,
31
+ "step": 3
32
  },
33
  {
34
+ "epoch": 0.21052631578947367,
35
+ "grad_norm": 0.13184547424316406,
36
+ "learning_rate": 9.397368756032445e-05,
37
+ "loss": 5.411037921905518,
38
+ "step": 4
 
39
  },
40
  {
41
+ "epoch": 0.21052631578947367,
42
+ "eval_loss": 5.415262222290039,
43
+ "eval_runtime": 0.6782,
44
+ "eval_samples_per_second": 11.796,
45
+ "eval_steps_per_second": 11.796,
46
+ "step": 4
47
  },
48
  {
49
+ "epoch": 0.2631578947368421,
50
+ "grad_norm": 0.08807552605867386,
51
+ "learning_rate": 8.945702546981969e-05,
52
+ "loss": 5.413403034210205,
53
+ "step": 5
54
  },
55
  {
56
+ "epoch": 0.3157894736842105,
57
+ "grad_norm": 0.3944004476070404,
58
+ "learning_rate": 8.386407858128706e-05,
59
+ "loss": 5.411047458648682,
60
+ "step": 6
 
61
  },
62
  {
63
+ "epoch": 0.3684210526315789,
64
+ "grad_norm": 0.19243471324443817,
65
+ "learning_rate": 7.734740790612136e-05,
66
+ "loss": 5.410379886627197,
67
+ "step": 7
68
  },
69
  {
70
+ "epoch": 0.42105263157894735,
71
+ "grad_norm": 1.1386080980300903,
72
+ "learning_rate": 7.008477123264848e-05,
73
+ "loss": 5.4107584953308105,
74
+ "step": 8
75
  },
76
  {
77
+ "epoch": 0.42105263157894735,
78
+ "eval_loss": 5.414775848388672,
79
+ "eval_runtime": 0.6096,
80
+ "eval_samples_per_second": 13.123,
81
+ "eval_steps_per_second": 13.123,
82
+ "step": 8
83
  },
84
  {
85
+ "epoch": 0.47368421052631576,
86
+ "grad_norm": 1.593002200126648,
87
+ "learning_rate": 6.227427435703997e-05,
88
+ "loss": 5.406350135803223,
89
+ "step": 9
 
90
  },
91
  {
92
+ "epoch": 0.5263157894736842,
93
+ "grad_norm": 0.2701893746852875,
94
+ "learning_rate": 5.4128967273616625e-05,
95
+ "loss": 5.415624141693115,
96
+ "step": 10
97
  },
98
  {
99
+ "epoch": 0.5789473684210527,
100
+ "grad_norm": 2.5317883491516113,
101
+ "learning_rate": 4.5871032726383386e-05,
102
+ "loss": 5.4125213623046875,
103
+ "step": 11
104
  },
105
  {
106
+ "epoch": 0.631578947368421,
107
+ "grad_norm": 0.18132847547531128,
108
+ "learning_rate": 3.772572564296005e-05,
109
+ "loss": 5.411061763763428,
110
+ "step": 12
111
  },
112
  {
113
+ "epoch": 0.631578947368421,
114
+ "eval_loss": 5.414316177368164,
115
+ "eval_runtime": 0.6242,
116
+ "eval_samples_per_second": 12.817,
117
+ "eval_steps_per_second": 12.817,
118
+ "step": 12
119
  },
120
  {
121
+ "epoch": 0.6842105263157895,
122
+ "grad_norm": 0.25826773047447205,
123
+ "learning_rate": 2.991522876735154e-05,
124
+ "loss": 5.412132740020752,
125
+ "step": 13
 
126
  },
127
  {
128
+ "epoch": 0.7368421052631579,
129
+ "grad_norm": 0.32883742451667786,
130
+ "learning_rate": 2.2652592093878666e-05,
131
+ "loss": 5.409476280212402,
132
+ "step": 14
133
  },
134
  {
135
+ "epoch": 0.7894736842105263,
136
+ "grad_norm": 0.15471753478050232,
137
+ "learning_rate": 1.6135921418712956e-05,
138
+ "loss": 5.413200855255127,
139
+ "step": 15
140
  },
141
  {
142
+ "epoch": 0.8421052631578947,
143
+ "grad_norm": 0.2990401089191437,
144
+ "learning_rate": 1.0542974530180327e-05,
145
+ "loss": 5.4155683517456055,
146
+ "step": 16
 
147
  },
148
  {
149
+ "epoch": 0.8421052631578947,
150
+ "eval_loss": 5.414139747619629,
151
+ "eval_runtime": 0.7326,
152
+ "eval_samples_per_second": 10.92,
153
+ "eval_steps_per_second": 10.92,
154
+ "step": 16
155
  },
156
  {
157
+ "epoch": 0.8947368421052632,
158
+ "grad_norm": 2.2788937091827393,
159
+ "learning_rate": 6.026312439675552e-06,
160
+ "loss": 5.411947727203369,
161
+ "step": 17
162
  },
163
  {
164
+ "epoch": 0.9473684210526315,
165
+ "grad_norm": 0.20979730784893036,
166
+ "learning_rate": 2.7091379149682685e-06,
167
+ "loss": 5.413649082183838,
168
+ "step": 18
169
  },
170
  {
171
+ "epoch": 1.0,
172
+ "grad_norm": 0.20296302437782288,
173
+ "learning_rate": 6.819348298638839e-07,
174
+ "loss": 5.415382385253906,
175
+ "step": 19
 
176
  },
177
  {
178
  "epoch": 1.0,
179
+ "step": 19,
180
+ "total_flos": 3359425752.0,
181
+ "train_loss": 5.411877933301423,
182
+ "train_runtime": 7.7818,
183
+ "train_samples_per_second": 2.442,
184
+ "train_steps_per_second": 2.442
185
  }
186
  ],
187
+ "logging_steps": 1,
188
+ "max_steps": 19,
189
  "num_input_tokens_seen": 0,
190
  "num_train_epochs": 1,
191
+ "save_steps": 8,
192
  "stateful_callbacks": {
193
  "TrainerControl": {
194
  "args": {
 
201
  "attributes": {}
202
  }
203
  },
204
+ "total_flos": 3359425752.0,
205
  "train_batch_size": 1,
206
  "trial_name": null,
207
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0b40f89f203aa6181d2ecfcc97532f60812689752da3bd88fca13bc6a344c8f6
3
- size 5201
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a142cd858828160b38820775c31c8d19bc13769b29cdfea16c1835758c53a125
3
+ size 5265