Unseen1980 committed on
Commit 4313f27 · verified · 1 Parent(s): 656eac8

Upload folder using huggingface_hub

Files changed (41)
  1. .gitattributes +2 -0
  2. README.md +309 -0
  3. adapter_config.json +37 -0
  4. adapter_model.safetensors +3 -0
  5. all_results.json +22 -0
  6. config.json +35 -0
  7. eval_results.json +16 -0
  8. evaluation/lm_harness_20250217_104435/generation_params.json +18 -0
  9. evaluation/lm_harness_20250217_104435/model_params.json +19 -0
  10. evaluation/lm_harness_20250217_104435/package_versions.json +656 -0
  11. evaluation/lm_harness_20250217_104435/platform_results.json +11 -0
  12. evaluation/lm_harness_20250217_104435/platform_task_config.json +112 -0
  13. evaluation/lm_harness_20250217_104435/task_params.json +7 -0
  14. generation_config.json +7 -0
  15. instructions_function_calling.md +183 -0
  16. logs/rank_0000.log +349 -0
  17. merges.txt +0 -0
  18. model.safetensors +3 -0
  19. onnx/model.onnx +3 -0
  20. onnx/model.onnx_data +3 -0
  21. onnx/model_bnb4.onnx +3 -0
  22. onnx/model_fp16.onnx +3 -0
  23. onnx/model_fp16.onnx_data +3 -0
  24. onnx/model_int8.onnx +3 -0
  25. onnx/model_q4.onnx +3 -0
  26. onnx/model_q4f16.onnx +3 -0
  27. onnx/model_quantized.onnx +3 -0
  28. onnx/model_uint8.onnx +3 -0
  29. runs/Feb17_09-38-46_7975b2aca0fe/events.out.tfevents.1739785126.7975b2aca0fe.3150.0 +3 -0
  30. runs/Oct31_06-24-59_ip-26-0-174-36/events.out.tfevents.1730356365.ip-26-0-174-36.3169719.0 +3 -0
  31. runs/Oct31_06-24-59_ip-26-0-174-36/events.out.tfevents.1730363825.ip-26-0-174-36.3169719.1 +3 -0
  32. special_tokens_map.json +34 -0
  33. telemetry/devices_info.txt +2 -0
  34. telemetry/training_config.yaml +182 -0
  35. telemetry/world_size.json +4 -0
  36. tokenizer.json +0 -0
  37. tokenizer_config.json +154 -0
  38. train_results.json +9 -0
  39. trainer_state.json +2426 -0
  40. training_args.bin +3 -0
  41. vocab.json +0 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ onnx/model.onnx_data filter=lfs diff=lfs merge=lfs -text
+ onnx/model_fp16.onnx_data filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,309 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - safetensors
+ - onnx
+ - transformers.js
+ base_model:
+ - HuggingFaceTB/SmolLM2-1.7B
+ ---
+
+ # SmolLM2
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/y45hIMNREW7w_XpHYB_0q.png)
+
+ ## Table of Contents
+
+ 1. [Model Summary](#model-summary)
+ 2. [Evaluation](#evaluation)
+ 3. [Examples](#examples)
+ 4. [Limitations](#limitations)
+ 5. [Training](#training)
+ 6. [License](#license)
+ 7. [Citation](#citation)
+
+ ## Model Summary
+
+ SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper: https://arxiv.org/abs/2502.02737v1
+
+ The 1.7B variant demonstrates significant advances over its predecessor SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, and The Stack, along with new mathematics and coding datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
+
+ The instruct model additionally supports tasks such as text rewriting, summarization, and function calling thanks to datasets developed by [Argilla](https://huggingface.co/argilla) such as [Synth-APIGen-v0.1](https://huggingface.co/datasets/argilla/Synth-APIGen-v0.1).
+ You can find the SFT dataset here: https://huggingface.co/datasets/HuggingFaceTB/smoltalk.
+
+ For more details, refer to https://github.com/huggingface/smollm, where you will find pre-training, post-training, evaluation, and local inference code.
+
+ ### How to use
+
+ ### Transformers
+ ```bash
+ pip install transformers
+ ```
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
+
+ device = "cuda"  # for GPU usage or "cpu" for CPU usage
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+ # for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
+ model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
+
+ messages = [{"role": "user", "content": "What is the capital of France?"}]
+ input_text = tokenizer.apply_chat_template(messages, tokenize=False)
+ inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
+ outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ ### Chat in TRL
+ You can also use the TRL CLI to chat with the model from the terminal:
+ ```bash
+ pip install trl
+ trl chat --model_name_or_path HuggingFaceTB/SmolLM2-1.7B-Instruct --device cpu
+ ```
+
+ ## Evaluation
+
+ In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use [lighteval](https://github.com/huggingface/lighteval) to run them.
+
+ ## Base Pre-Trained Model
+
+ | Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B |
+ |------------------|--------------|-------------|---------------|--------------|
+ | HellaSwag | **68.7** | 61.2 | 66.4 | 62.9 |
+ | ARC (Average) | **60.5** | 49.2 | 58.5 | 59.9 |
+ | PIQA | **77.6** | 74.8 | 76.1 | 76.0 |
+ | MMLU-Pro (MCF) | **19.4** | 11.7 | 13.7 | 10.8 |
+ | CommonsenseQA | **43.6** | 41.2 | 34.1 | 38.0 |
+ | TriviaQA | **36.7** | 28.1 | 20.9 | 22.5 |
+ | Winogrande | **59.4** | 57.8 | 59.3 | 54.7 |
+ | OpenBookQA | 42.2 | 38.4 | 40.0 | **42.4** |
+ | GSM8K (5-shot) | 31.0 | 7.2 | **61.3** | 5.5 |
+
+ ## Instruction Model
+
+ | Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
+ |:-----------------------------|:---------------------:|:-----------------:|:----------------------:|:----------------------:|
+ | IFEval (Average prompt/inst) | **56.7** | 53.5 | 47.4 | 23.1 |
+ | MT-Bench | 6.13 | 5.48 | **6.52** | 4.33 |
+ | OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | **46.9** | NaN |
+ | HellaSwag | **66.1** | 56.1 | 60.9 | 55.5 |
+ | ARC (Average) | **51.7** | 41.6 | 46.2 | 43.7 |
+ | PIQA | **74.4** | 72.3 | 73.2 | 71.6 |
+ | MMLU-Pro (MCF) | 19.3 | 12.7 | **24.2** | 11.7 |
+ | BBH (3-shot) | 32.2 | 27.6 | **35.3** | 25.7 |
+ | GSM8K (5-shot) | **48.2** | 26.8 | 42.8 | 4.62 |
+
+ ## Examples
+ Below are some system and instruct prompts that work well for specific tasks.
+
+ ### Text rewriting
+
+ ```python
+ system_prompt_rewrite = "You are an AI writing assistant. Your task is to rewrite the user's email to make it more professional and approachable while maintaining its main points and key message. Do not return any text other than the rewritten message."
+ user_prompt_rewrite = "Rewrite the message below to make it more friendly and approachable while maintaining its main points and key message. Do not add any new information or return any text other than the rewritten message\nThe message:"
+ messages = [{"role": "system", "content": system_prompt_rewrite}, {"role": "user", "content": f"{user_prompt_rewrite} The CI is failing after your last commit!"}]
+ input_text = tokenizer.apply_chat_template(messages, tokenize=False)
+ inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
+ outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
+ print(tokenizer.decode(outputs[0]))
+ ```
+ ```
+ Hey there! I noticed that the CI isn't passing after your latest commit. Could you take a look and let me know what's going on? Thanks so much for your help!
+ ```
+
+ ### Summarization
+
+ ```python
+ system_prompt_summarize = "Provide a concise, objective summary of the input text in up to three sentences, focusing on key actions and intentions without using second or third person pronouns."
+ messages = [{"role": "system", "content": system_prompt_summarize}, {"role": "user", "content": INSERT_LONG_EMAIL}]
+ input_text = tokenizer.apply_chat_template(messages, tokenize=False)
+ inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
+ outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ ### Function calling
+
+ SmolLM2-1.7B-Instruct can handle function calling; it scores 27% on the [BFCL Leaderboard](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html). Here's how you can leverage it:
+
+ ```python
+ import json
+ import re
+ from typing import Any, Optional
+
+ from jinja2 import Template
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from transformers.utils import get_json_schema
+
+
+ system_prompt = Template("""You are an expert in composing functions. You are given a question and a set of possible functions.
+ Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
+ If none of the functions can be used, point it out and refuse to answer.
+ If the given question lacks the parameters required by the function, also point it out.
+
+ You have access to the following tools:
+ <tools>{{ tools }}</tools>
+
+ The output MUST strictly adhere to the following format, and NO other text MUST be included.
+ The example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make the tool calls an empty list '[]'.
+ <tool_call>[
+ {"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}},
+ ... (more tool calls as required)
+ ]</tool_call>""")
+
+
+ def prepare_messages(
+     query: str,
+     tools: Optional[dict[str, Any]] = None,
+     history: Optional[list[dict[str, str]]] = None
+ ) -> list[dict[str, str]]:
+     """Prepare the system and user messages for the given query and tools.
+
+     Args:
+         query: The query to be answered.
+         tools: The tools available to the user. Defaults to None, in which case
+             an empty list is passed to the model.
+         history: Exchange of messages, including the system_prompt from
+             the first query. Defaults to None, the first message in a conversation.
+     """
+     if tools is None:
+         tools = []
+     if history:
+         messages = history.copy()
+         messages.append({"role": "user", "content": query})
+     else:
+         messages = [
+             {"role": "system", "content": system_prompt.render(tools=json.dumps(tools))},
+             {"role": "user", "content": query}
+         ]
+     return messages
+
+
+ def parse_response(text: str) -> str | dict[str, Any]:
+     """Parse a response from the model, returning either the parsed
+     list of tool calls, or the raw model response if no tool call
+     could be extracted.
+
+     Args:
+         text: Response from the model.
+     """
+     pattern = r"<tool_call>(.*?)</tool_call>"
+     matches = re.findall(pattern, text, re.DOTALL)
+     if matches:
+         return json.loads(matches[0])
+     return text
+
+
+ model_name_smollm = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
+ model = AutoModelForCausalLM.from_pretrained(model_name_smollm, device_map="auto", torch_dtype="auto", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_name_smollm)
+
+ from datetime import datetime
+ import random
+
+ def get_current_time() -> str:
+     """Returns the current time in 24-hour format.
+
+     Returns:
+         str: Current time in HH:MM:SS format.
+     """
+     return datetime.now().strftime("%H:%M:%S")
+
+
+ def get_random_number_between(min: int, max: int) -> int:
+     """
+     Gets a random number between min and max.
+
+     Args:
+         min: The minimum number.
+         max: The maximum number.
+
+     Returns:
+         A random number between min and max.
+     """
+     return random.randint(min, max)
+
+
+ tools = [get_json_schema(get_random_number_between), get_json_schema(get_current_time)]
+
+ toolbox = {"get_random_number_between": get_random_number_between, "get_current_time": get_current_time}
+
+ query = "Give me a number between 1 and 300"
+
+ messages = prepare_messages(query, tools=tools)
+
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
+
+ tool_calls = parse_response(result)
+ # [{'name': 'get_random_number_between', 'arguments': {'min': 1, 'max': 300}}]
+
+ # Get tool responses
+ tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
+ # [63]
+
+ # For the second turn, rebuild the history of messages:
+ history = messages.copy()
+ # Add the "parsed response"
+ history.append({"role": "assistant", "content": result})
+ query = "Can you give me the hour?"
+ history.append({"role": "user", "content": query})
+
+ inputs = tokenizer.apply_chat_template(history, add_generation_prompt=True, return_tensors="pt").to(model.device)
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
+
+ tool_calls = parse_response(result)
+ tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
+ # ['07:57:25']
+ ```
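The `<tool_call>` parsing step above can be exercised without downloading the model. Here is a minimal standalone sketch of the same `parse_response` logic, run against a hypothetical completion that follows the system prompt's required format:

```python
import json
import re

def parse_response(text: str):
    """Extract the first <tool_call>...</tool_call> JSON payload, or return the raw text."""
    matches = re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if matches:
        return json.loads(matches[0])
    return text

# Hypothetical model completion in the format the system prompt enforces.
completion = '<tool_call>[{"name": "get_random_number_between", "arguments": {"min": 1, "max": 300}}]</tool_call>'
calls = parse_response(completion)
print(calls)  # [{'name': 'get_random_number_between', 'arguments': {'min': 1, 'max': 300}}]

# A refusal or plain-text answer falls through unchanged.
print(parse_response("None of the available tools can answer this."))
```

This makes it easy to unit-test the tool-call protocol separately from generation.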
+ More details, such as parallel function calls and handling tools that are not available, can be found [here](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct/blob/main/instructions_function_calling.md).
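One caveat about the dispatch pattern in the example: `toolbox.get(tc["name"])(*tc["arguments"].values())` passes arguments positionally, which silently depends on the model emitting them in the function's parameter order. A slightly more defensive sketch (with a toy toolbox, not part of the card's code) unpacks them as keyword arguments instead:

```python
import random

def get_random_number_between(min: int, max: int) -> int:
    """Toy tool mirroring the function-calling example."""
    return random.randint(min, max)

toolbox = {"get_random_number_between": get_random_number_between}

# A parsed tool call, as parse_response would return it.
tool_calls = [{"name": "get_random_number_between", "arguments": {"min": 1, "max": 300}}]

# Keyword unpacking is order-independent and raises TypeError on unknown argument names.
tool_responses = [toolbox[tc["name"]](**tc["arguments"]) for tc in tool_calls]
print(tool_responses)  # one random integer between 1 and 300
```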
+
+ ## Limitations
+
+ SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.
+
+ ## Training
+
+ ### Model
+
+ - **Architecture:** Transformer decoder
+ - **Pretraining tokens:** 11T
+ - **Precision:** bfloat16
+
+ ### Hardware
+
+ - **GPUs:** 256 H100
+
+ ### Software
+
+ - **Training Framework:** [nanotron](https://github.com/huggingface/nanotron/tree/main)
+ - **Alignment Handbook:** [alignment-handbook](https://github.com/huggingface/alignment-handbook/)
+
+ ## License
+
+ [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Citation
+ ```bibtex
+ @misc{allal2025smollm2smolgoesbig,
+       title={SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model},
+       author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Guilherme Penedo and Lewis Tunstall and Andrés Marafioti and Hynek Kydlíček and Agustín Piqueres Lajarín and Vaibhav Srivastav and Joshua Lochner and Caleb Fahlgren and Xuan-Son Nguyen and Clémentine Fourrier and Ben Burtenshaw and Hugo Larcher and Haojun Zhao and Cyril Zakka and Mathieu Morlon and Colin Raffel and Leandro von Werra and Thomas Wolf},
+       year={2025},
+       eprint={2502.02737},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2502.02737},
+ }
+ ```
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
+   "bias": "none",
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "down_proj",
+     "k_proj",
+     "o_proj",
+     "gate_proj",
+     "up_proj",
+     "q_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
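The adapter's size can be sanity-checked arithmetically from this config: with rank `r = 16` and LoRA applied to all seven projection modules of each of the 24 decoder layers (dimensions taken from the model's `config.json` below: `hidden_size` 2048, `intermediate_size` 8192), a LoRA pair for a `d_in × d_out` linear adds `r * (d_in + d_out)` parameters. A short sketch of the tally:

```python
# Tally the LoRA parameters implied by adapter_config.json and the base model dimensions.
r = 16                                 # "r" in adapter_config.json
hidden, inter, layers = 2048, 8192, 24 # from config.json

def lora_params(d_in: int, d_out: int) -> int:
    """A LoRA pair (A: d_in x r, B: r x d_out) adds r * (d_in + d_out) parameters."""
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(hidden, hidden)    # q_proj, k_proj, v_proj, o_proj (32 KV heads, so full width)
    + 2 * lora_params(hidden, inter)   # gate_proj, up_proj
    + lora_params(inter, hidden)       # down_proj
)
total = per_layer * layers
print(total)       # 18087936 trainable parameters
print(total * 4)   # ~72.4 MB in float32, consistent with adapter_model.safetensors below
```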
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4474c6c01dc77654496a92c12d32d32de2dcad9c9c46f7c5ddbb4492d1bddba1
+ size 72396376
all_results.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "epoch": 2.996074326092646,
+   "eval_logits/chosen": -0.34099623560905457,
+   "eval_logits/rejected": -0.3685227334499359,
+   "eval_logps/chosen": -310.2510070800781,
+   "eval_logps/rejected": -275.43145751953125,
+   "eval_loss": 0.587827205657959,
+   "eval_rewards/accuracies": 0.6746031641960144,
+   "eval_rewards/chosen": 0.01673175022006035,
+   "eval_rewards/margins": 0.5906793475151062,
+   "eval_rewards/rejected": -0.573947548866272,
+   "eval_runtime": 18.8462,
+   "eval_samples": 2000,
+   "eval_samples_per_second": 106.122,
+   "eval_steps_per_second": 3.343,
+   "total_flos": 0.0,
+   "train_loss": 0.5334697115221363,
+   "train_runtime": 7355.3343,
+   "train_samples": 61134,
+   "train_samples_per_second": 24.935,
+   "train_steps_per_second": 0.195
+ }
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 8192,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 24,
+   "num_key_value_heads": 32,
+   "pad_token_id": 2,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 130000,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.42.3",
+   "transformers.js_config": {
+     "kv_cache_dtype": {
+       "q4f16": "float16",
+       "fp16": "float16"
+     }
+   },
+   "use_cache": true,
+   "vocab_size": 49152
+ }
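These dimensions are consistent with the advertised 1.7B parameter count. A rough tally (embeddings counted once since `tie_word_embeddings` is true, RMSNorm weights included) can be sketched as:

```python
# Approximate parameter count from the config.json values above.
hidden, inter, vocab, layers = 2048, 8192, 49152, 24

embed = vocab * hidden                  # token embeddings, tied with the LM head
attn = 4 * hidden * hidden              # q/k/v/o projections (num_key_value_heads == num_attention_heads, no GQA narrowing)
mlp = 3 * hidden * inter                # gate_proj, up_proj, down_proj
norms = layers * 2 * hidden + hidden    # two RMSNorms per layer plus the final norm

total = embed + layers * (attn + mlp) + norms
print(f"{total / 1e9:.2f}B")  # 1.71B
```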
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "epoch": 2.996074326092646,
+   "eval_logits/chosen": -0.34099623560905457,
+   "eval_logits/rejected": -0.3685227334499359,
+   "eval_logps/chosen": -310.2510070800781,
+   "eval_logps/rejected": -275.43145751953125,
+   "eval_loss": 0.587827205657959,
+   "eval_rewards/accuracies": 0.6746031641960144,
+   "eval_rewards/chosen": 0.01673175022006035,
+   "eval_rewards/margins": 0.5906793475151062,
+   "eval_rewards/rejected": -0.573947548866272,
+   "eval_runtime": 18.8462,
+   "eval_samples": 2000,
+   "eval_samples_per_second": 106.122,
+   "eval_steps_per_second": 3.343
+ }
evaluation/lm_harness_20250217_104435/generation_params.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "max_new_tokens": 256,
+   "batch_size": null,
+   "exclude_prompt_from_response": true,
+   "seed": null,
+   "temperature": 0.0,
+   "top_p": 1.0,
+   "frequency_penalty": 0.0,
+   "presence_penalty": 0.0,
+   "stop_strings": null,
+   "stop_token_ids": null,
+   "logit_bias": {},
+   "min_p": 0.0,
+   "use_cache": false,
+   "num_beams": 1,
+   "use_sampling": false,
+   "guided_decoding": null
+ }
evaluation/lm_harness_20250217_104435/model_params.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "model_name": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
+   "adapter_model": "finetuning_tutorial/output",
+   "tokenizer_name": null,
+   "tokenizer_pad_token": null,
+   "tokenizer_kwargs": {},
+   "model_max_length": null,
+   "load_pretrained_weights": true,
+   "trust_remote_code": false,
+   "torch_dtype_str": "bfloat16",
+   "compile": false,
+   "chat_template": null,
+   "attn_implementation": null,
+   "device_map": "auto",
+   "model_kwargs": {},
+   "enable_liger_kernel": false,
+   "shard_for_eval": false,
+   "freeze_layers": []
+ }
evaluation/lm_harness_20250217_104435/package_versions.json ADDED
@@ -0,0 +1,656 @@
+ {
+   "colorama": "0.4.6",
+   "setproctitle": "1.3.4",
+   "psutil": "5.9.5",
+   "nvidia-ml-py": "12.560.30",
+   "word2number": "1.1",
+   "Pillow": "9.5.0",
+   "skypilot": "0.7.0",
+   "partial-json-parser": "0.2.1.1.post5",
+   "pyairports": "2.1.1",
+   "tqdm-multiprocess": "0.0.11",
+   "triton": "3.0.0",
+   "mistral_common": "1.5.3",
+   "mbstrdecoder": "1.1.4",
+   "DataProperty": "1.1.0",
+   "accelerate": "1.2.1",
+   "tyro": "0.9.14",
+   "uvicorn": "0.34.0",
+   "jsonlines": "4.0.0",
+   "msgspec": "0.19.0",
+   "ray": "2.42.1",
+   "uv": "0.6.0",
+   "pytablewriter": "1.2.1",
+   "bitsandbytes": "0.45.2",
+   "nvidia-cuda-nvrtc-cu12": "12.1.105",
+   "responses": "0.25.6",
+   "nvidia-cuda-runtime-cu12": "12.1.105",
+   "outlines": "0.0.46",
+   "nvidia-cuda-cupti-cu12": "12.1.105",
+   "lm_eval": "0.4.7",
+   "pendulum": "3.0.0",
+   "httptools": "0.6.4",
+   "tokenizers": "0.20.3",
+   "nvidia-cusparse-cu12": "12.1.0.106",
+   "tabledata": "1.3.4",
+   "diskcache": "5.6.3",
+   "pexpect": "4.8.0",
+   "trl": "0.11.4",
+   "xxhash": "3.5.0",
+   "pydantic": "2.9.2",
+   "typepy": "1.3.4",
+   "torchdata": "0.9.0",
+   "torchvision": "0.19.0",
+   "lm-format-enforcer": "0.10.6",
+   "uvloop": "0.21.0",
+   "xformers": "0.0.27.post2",
+   "evaluate": "0.4.3",
+   "sacrebleu": "2.5.1",
+   "multiprocess": "0.70.16",
+   "watchfiles": "1.0.4",
+   "sqlitedict": "2.1.0",
+   "vllm": "0.6.3.post1",
+   "PuLP": "2.9.0",
+   "tiktoken": "0.9.0",
+   "antlr4-python3-runtime": "4.9.3",
+   "transformers": "4.45.2",
+   "time-machine": "2.16.0",
+   "oumi": "0.1.3",
+   "fastapi": "0.115.8",
+   "pybind11": "2.13.6",
+   "aioresponses": "0.7.8",
+   "starlette": "0.45.3",
+   "lark": "1.2.2",
+   "dill": "0.3.8",
+   "aiofiles": "24.1.0",
+   "nvidia-nvtx-cu12": "12.1.105",
+   "prometheus-fastapi-instrumentator": "7.0.2",
+   "pydantic_core": "2.23.4",
+   "torch": "2.4.0",
+   "compressed-tensors": "0.6.0",
+   "rouge_score": "0.1.2",
+   "datasets": "3.2.0",
+   "liger_kernel": "0.3.1",
+   "nvidia-curand-cu12": "10.3.2.106",
+   "pathvalidate": "3.2.3",
+   "python-dotenv": "1.0.1",
+   "nvidia-nccl-cu12": "2.20.5",
+   "shtab": "1.7.1",
+   "omegaconf": "2.3.0",
+   "pycountry": "24.6.1",
+   "nvidia-cusolver-cu12": "11.4.5.107",
+   "gguf": "0.10.0",
+   "nvidia-cufft-cu12": "11.0.2.54",
+   "nvidia-cudnn-cu12": "9.1.0.70",
+   "tcolorpy": "0.1.7",
+   "fsspec": "2024.9.0",
+   "nvidia-cublas-cu12": "12.1.3.1",
+   "portalocker": "3.1.1",
+   "interegular": "0.3.3",
+   "google-colab": "1.0.0",
+   "nvidia-cuda-nvcc-cu12": "12.5.82",
+   "google-pasta": "0.2.0",
+   "psycopg2": "2.9.10",
+   "bigquery-magics": "0.5.0",
+   "referencing": "0.36.2",
+   "text-unidecode": "1.3",
+   "aiohttp": "3.11.12",
+   "arviz": "0.20.0",
+   "Markdown": "3.7",
+   "Sphinx": "8.1.3",
+   "gym": "0.25.2",
+   "altair": "5.5.0",
+   "opt_einsum": "3.4.0",
+   "datascience": "0.17.6",
+   "geographiclib": "2.0",
+   "nest-asyncio": "1.6.0",
+   "orjson": "3.10.15",
+   "imageio": "2.37.0",
+   "sigstore-protobuf-specs": "0.3.2",
+   "mistune": "3.1.1",
+   "flatbuffers": "25.2.10",
+   "PyJWT": "2.3.0",
+   "logical-unification": "0.4.6",
+   "lxml": "5.3.1",
+   "google-cloud-core": "2.4.1",
+   "fastai": "2.7.18",
+   "pylibcugraph-cu12": "24.12.0",
+   "google-ai-generativelanguage": "0.6.15",
+   "email_validator": "2.2.0",
+   "scs": "3.2.7.post2",
+   "docker-pycreds": "0.4.0",
+   "dnspython": "2.7.0",
+   "cmake": "3.31.4",
+   "thinc": "8.2.5",
+   "fastrlock": "0.8.3",
+   "immutabledict": "4.2.1",
+   "entrypoints": "0.4",
+   "rpy2": "3.4.2",
+   "huggingface-hub": "0.28.1",
+   "duckdb": "1.1.3",
+   "pandas-stubs": "2.2.2.240909",
+   "toolz": "0.12.1",
+   "jax-cuda12-plugin": "0.4.33",
+   "nibabel": "5.3.2",
+   "sphinxcontrib-applehelp": "2.0.0",
+   "pynvjitlink-cu12": "0.5.0",
+   "babel": "2.17.0",
+   "future": "1.0.0",
+   "plotnine": "0.14.5",
+   "matplotlib-inline": "0.1.7",
+   "srsly": "2.5.1",
+   "textblob": "0.19.0",
+   "pyperclip": "1.9.0",
+   "httplib2": "0.20.2",
+   "matplotlib": "3.10.0",
+   "xlrd": "2.0.1",
+   "jupyter-server": "1.24.0",
+   "google-cloud-bigquery-storage": "2.28.0",
+   "pytensor": "2.27.1",
+   "sphinxcontrib-qthelp": "2.0.0",
+   "typer": "0.15.1",
+   "tensorboard-data-server": "0.7.2",
+   "google-cloud-datastore": "2.20.2",
+   "plotly": "5.24.1",
+   "ipython-sql": "0.5.0",
+   "etuples": "0.3.9",
+   "PyDrive2": "1.21.3",
+   "shap": "0.46.0",
+   "google-auth-httplib2": "0.2.0",
+   "spacy-loggers": "1.0.5",
+   "cachetools": "5.5.1",
+   "platformdirs": "4.2.2",
+   "MarkupSafe": "3.0.2",
+   "wrapt": "1.17.2",
+   "docstring_parser": "0.16",
+   "sentry-sdk": "2.21.0",
+   "kaggle": "1.6.17",
+   "highspy": "1.9.0",
+   "cloudpickle": "3.1.1",
+   "jsonpickle": "4.0.1",
+   "pygit2": "1.17.0",
+   "python-dateutil": "2.8.2",
+   "python-louvain": "0.16",
+   "opencv-python-headless": "4.11.0.86",
+   "namex": "0.0.8",
+   "folium": "0.19.4",
+   "Pygments": "2.18.0",
+   "ptyprocess": "0.7.0",
+   "rpds-py": "0.22.3",
+   "optax": "0.2.4",
+   "joblib": "1.4.2",
+   "pyparsing": "2.4.7",
+   "python-utils": "3.9.1",
+   "astropy": "7.0.1",
+   "decorator": "4.4.2",
+   "multidict": "6.1.0",
+   "urllib3": "2.3.0",
+   "orbax-checkpoint": "0.6.4",
+   "python-slugify": "8.0.4",
+   "google-auth": "2.27.0",
+   "google-cloud-bigquery-connection": "1.17.0",
+   "multitasking": "0.0.11",
+   "statsmodels": "0.14.4",
+   "GDAL": "3.6.4",
+   "zipp": "3.19.2",
+   "cmdstanpy": "1.2.5",
+   "regex": "2024.11.6",
+   "preshed": "3.0.9",
+   "Send2Trash": "1.8.3",
+   "catalogue": "2.0.10",
+   "xgboost": "2.1.4",
+   "CacheControl": "0.14.2",
+   "prompt_toolkit": "3.0.50",
+   "opencv-contrib-python": "4.11.0.86",
+   "stringzilla": "3.11.3",
+   "cycler": "0.12.1",
+   "absl-py": "1.4.0",
+   "music21": "9.3.0",
+   "nvidia-nvjitlink-cu12": "12.5.82",
+   "tensorflow-hub": "0.16.1",
+   "argon2-cffi": "23.1.0",
+   "keras": "3.8.0",
+   "tuf": "5.1.0",
+   "wasabi": "1.1.3",
+   "geemap": "0.35.1",
+   "google-api-core": "2.19.2",
+   "keras-nlp": "0.18.1",
+   "networkx": "3.4.2",
+   "grpcio-status": "1.62.3",
+   "simsimd": "6.2.1",
+   "numexpr": "2.10.2",
+   "snowballstemmer": "2.2.0",
+   "tcmlib": "1.2.0",
+   "opencv-python": "4.11.0.86",
+   "Pyomo": "6.8.2",
+   "scikit-learn": "1.6.1",
+   "pandas-gbq": "0.26.1",
+   "dlib": "19.24.2",
+   "intel-cmplr-lib-ur": "2025.0.4",
+   "langchain-core": "0.3.35",
+   "smart-open": "7.1.0",
+   "sphinxcontrib-devhelp": "2.0.0",
+   "importlib_metadata": "8.0.0",
+   "pyzmq": "24.0.1",
+   "tensorflow-datasets": "4.9.7",
+   "pathlib": "1.0.1",
+   "flax": "0.10.3",
+   "astropy-iers-data": "0.2025.2.10.0.33.26",
+   "jupyter-leaflet": "0.19.2",
+   "en-core-web-sm": "3.7.1",
+   "click": "8.1.8",
+   "jupyterlab_pygments": "0.3.0",
+   "sentencepiece": "0.2.0",
+   "optree": "0.14.0",
+   "termcolor": "2.5.0",
+   "gcsfs": "2024.10.0",
+   "kiwisolver": "1.4.8",
+   "torchaudio": "2.5.1+cu124",
+   "pluggy": "1.5.0",
+   "smmap": "5.0.2",
+   "rfc8785": "0.1.4",
+   "pyasn1": "0.6.1",
+   "six": "1.16.0",
+   "sphinxcontrib-serializinghtml": "2.0.0",
+   "imagesize": "1.4.1",
+   "simple-parsing": "0.1.7",
+   "opentelemetry-sdk": "1.16.0",
+   "distro": "1.7.0",
+   "ply": "3.11",
+   "PySocks": "1.7.1",
+   "graphviz": "0.20.3",
+   "fastprogress": "1.0.3",
+   "autograd": "1.7.0",
+   "diffusers": "0.32.2",
+   "langsmith": "0.3.8",
+   "soupsieve": "2.6",
+   "grpc-interceptor": "0.15.4",
+   "greenlet": "3.1.1",
+   "gymnasium": "1.0.0",
+   "python-box": "7.3.2",
+   "httpcore": "1.0.7",
+   "googledrivedownloader": "1.1.0",
+   "branca": "0.8.1",
+   "itsdangerous": "2.2.0",
+   "grpc-google-iam-v1": "0.14.0",
+   "cons": "0.4.6",
+   "opentelemetry-semantic-conventions": "0.37b0",
+   "aiohappyeyeballs": "2.4.6",
+   "requests-oauthlib": "2.0.0",
+   "jieba": "0.42.1",
+   "tensorboard": "2.18.0",
+   "bleach": "6.2.0",
+   "umf": "0.9.1",
+   "tenacity": "9.0.0",
+   "defusedxml": "0.7.1",
+   "fastjsonschema": "2.21.1",
+   "pylibcudf-cu12": "24.12.0",
+   "markdown-it-py": "3.0.0",
+   "et_xmlfile": "2.0.0",
+   "typing_extensions": "4.12.2",
+   "patsy": "1.0.1",
+   "gast": "0.6.0",
+   "cffi": "1.17.1",
+   "ipyleaflet": "0.19.2",
+   "imbalanced-learn": "0.13.0",
+   "jupyterlab_widgets": "3.0.13",
+   "PyDrive": "1.3.1",
+   "intel-openmp": "2025.0.4",
+   "albumentations": "2.0.4",
+   "google-genai": "0.8.0",
+   "locket": "1.0.0",
+   "cryptography": "3.4.8",
+   "oauthlib": "3.2.0",
+   "panel": "1.6.0",
+   "safetensors": "0.5.2",
+   "iniconfig": "2.0.0",
+   "imageio-ffmpeg": "0.6.0",
+   "libclang": "18.1.1",
+   "dask": "2024.10.0",
+   "nbclient": "0.10.2",
+   "threadpoolctl": "3.5.0",
+   "protobuf": "4.25.6",
+   "google": "2.0.3",
+   "googleapis-common-protos": "1.66.0",
+   "colorlover": "0.3.0",
+   "pandas-datareader": "0.10.0",
+   "moviepy": "1.0.3",
+   "sklearn-pandas": "2.2.0",
+   "webencodings": "0.5.1",
+   "soundfile": "0.13.1",
+   "pyshp": "2.3.1",
+   "mpmath": "1.3.0",
+   "ml-dtypes": "0.4.1",
+   "tinycss2": "1.4.0",
+   "geopandas": "1.0.1",
+   "spacy-legacy": "3.0.12",
+   "tzdata": "2025.1",
+   "gin-config": "0.5.0",
+   "mkl": "2025.0.1",
+   "ipython": "7.34.0",
331
+ "types-pytz": "2025.1.0.20250204",
332
+ "promise": "2.3",
333
+ "grpcio": "1.70.0",
334
+ "chex": "0.1.88",
335
+ "miniKanren": "1.0.3",
336
+ "weasel": "0.4.1",
337
+ "pyOpenSSL": "24.2.1",
338
+ "google-crc32c": "1.6.0",
339
+ "sniffio": "1.3.1",
340
+ "nvidia-nvcomp-cu12": "4.1.0.6",
341
+ "uritemplate": "4.1.1",
342
+ "jsonpointer": "3.0.0",
343
+ "ipyfilechooser": "0.6.0",
344
+ "timm": "1.0.14",
345
+ "google-cloud-pubsub": "2.25.0",
346
+ "importlib_resources": "6.4.0",
347
+ "holoviews": "1.20.0",
348
+ "python-snappy": "0.7.3",
349
+ "html5lib": "1.1",
350
+ "scooby": "0.10.0",
351
+ "google-cloud-firestore": "2.20.0",
352
+ "ndindex": "1.9.2",
353
+ "pyviz_comms": "3.0.4",
354
+ "certifi": "2025.1.31",
355
+ "pydotplus": "2.0.2",
356
+ "google-api-python-client": "2.160.0",
357
+ "charset-normalizer": "3.4.1",
358
+ "google-spark-connect": "0.5.2",
359
+ "prophet": "1.1.6",
360
+ "pylibraft-cu12": "24.12.0",
361
+ "pickleshare": "0.7.5",
362
+ "qdldl": "0.1.7.post5",
363
+ "google-cloud-spanner": "3.51.0",
364
+ "numpy": "1.26.4",
365
+ "progressbar2": "4.5.0",
366
+ "pygame": "2.6.1",
367
+ "openpyxl": "3.1.5",
368
+ "cupy-cuda12x": "13.3.0",
369
+ "dm-tree": "0.1.9",
370
+ "parsy": "2.1",
371
+ "wordcloud": "1.9.4",
372
+ "pymc": "5.20.1",
373
+ "google-cloud-dataproc": "5.16.0",
374
+ "kagglehub": "0.3.7",
375
+ "keras-hub": "0.18.1",
376
+ "pydata-google-auth": "1.9.1",
377
+ "atpublic": "4.1.0",
378
+ "propcache": "0.2.1",
379
+ "zstandard": "0.23.0",
380
+ "prettytable": "3.14.0",
381
+ "param": "2.2.0",
382
+ "google-cloud-functions": "1.19.0",
383
+ "rsa": "4.9",
384
+ "webcolors": "24.11.1",
385
+ "google-cloud-storage": "2.19.0",
386
+ "llvmlite": "0.44.0",
387
+ "attrs": "25.1.0",
388
+ "frozenlist": "1.5.0",
389
+ "pyproj": "3.7.0",
390
+ "pyspark": "3.5.4",
391
+ "jsonschema-specifications": "2024.10.1",
392
+ "jax": "0.4.33",
393
+ "tf-slim": "1.1.0",
394
+ "langcodes": "3.5.0",
395
+ "tensorflow-probability": "0.25.0",
396
+ "pycocotools": "2.0.8",
397
+ "geopy": "2.4.1",
398
+ "polars": "1.9.0",
399
+ "torchsummary": "1.5.1",
400
+ "mlxtend": "0.23.4",
401
+ "contourpy": "1.3.1",
402
+ "firebase-admin": "6.6.0",
403
+ "eerepr": "0.1.0",
404
+ "jupyter_core": "5.7.2",
405
+ "fonttools": "4.56.0",
406
+ "imgaug": "0.4.0",
407
+ "yarl": "1.18.3",
408
+ "traitlets": "5.7.1",
409
+ "opentelemetry-api": "1.16.0",
410
+ "grpclib": "0.4.7",
411
+ "websocket-client": "1.8.0",
412
+ "py-cpuinfo": "9.0.0",
413
+ "numba-cuda": "0.0.17.1",
414
+ "murmurhash": "1.0.12",
415
+ "openai": "1.61.1",
416
+ "easydict": "1.13",
417
+ "sklearn-compat": "0.1.3",
418
+ "notebook_shim": "0.2.4",
419
+ "portpicker": "1.5.2",
420
+ "gitdb": "4.0.12",
421
+ "py4j": "0.10.9.7",
422
+ "xarray": "2025.1.2",
423
+ "hpack": "4.1.0",
424
+ "cvxpy": "1.6.0",
425
+ "libkvikio-cu12": "24.12.1",
426
+ "requests-toolbelt": "1.0.0",
427
+ "stanio": "0.5.1",
428
+ "lazy_loader": "0.4",
429
+ "jupyter-client": "6.1.12",
430
+ "prometheus_client": "0.21.1",
431
+ "wheel": "0.43.0",
432
+ "community": "1.0.0b1",
433
+ "hyperopt": "0.2.7",
434
+ "mdit-py-plugins": "0.4.2",
435
+ "tensorflow-text": "2.18.1",
436
+ "h5py": "3.12.1",
437
+ "id": "1.5.0",
438
+ "requests": "2.32.3",
439
+ "chardet": "5.2.0",
440
+ "yellowbrick": "1.5",
441
+ "partd": "1.4.2",
442
+ "language_data": "1.3.0",
443
+ "sigstore-rekor-types": "0.0.18",
444
+ "einops": "0.8.1",
445
+ "tensorstore": "0.1.71",
446
+ "google-cloud-aiplatform": "1.79.0",
447
+ "GitPython": "3.1.44",
448
+ "pandas": "2.2.2",
449
+ "humanize": "4.11.0",
450
+ "anyio": "3.7.1",
451
+ "google-auth-oauthlib": "1.2.1",
452
+ "uc-micro-py": "1.0.3",
453
+ "matplotlib-venn": "1.1.1",
454
+ "httpx": "0.28.1",
455
+ "sentence-transformers": "3.4.1",
456
+ "spanner-graph-notebook": "1.0.9",
457
+ "ipyevents": "2.0.2",
458
+ "langchain-text-splitters": "0.3.6",
459
+ "jaxlib": "0.4.33",
460
+ "bigframes": "1.36.0",
461
+ "tensorflow-metadata": "1.16.1",
462
+ "ipytree": "0.2.2",
463
+ "wcwidth": "0.2.13",
464
+ "argon2-cffi-bindings": "21.2.0",
465
+ "nvtx": "0.2.10",
466
+ "colour": "0.1.5",
467
+ "editdistance": "0.8.1",
468
+ "natsort": "8.4.0",
469
+ "Jinja2": "3.1.5",
470
+ "ibis-framework": "9.2.0",
471
+ "alabaster": "1.0.0",
472
+ "betterproto": "2.0.0b6",
473
+ "linkify-it-py": "2.0.3",
474
+ "typeguard": "4.3.0",
475
+ "google-cloud-bigquery": "3.25.0",
476
+ "nltk": "3.9.1",
477
+ "shapely": "2.0.7",
478
+ "blosc2": "3.0.0",
479
+ "peewee": "3.17.9",
480
+ "nbconvert": "7.16.6",
481
+ "glob2": "0.7",
482
+ "ipywidgets": "7.7.1",
483
+ "h11": "0.14.0",
484
+ "Deprecated": "1.2.18",
485
+ "pyarrow": "17.0.0",
486
+ "rich": "13.9.4",
487
+ "toml": "0.10.2",
488
+ "marisa-trie": "1.2.1",
489
+ "packaging": "24.1",
490
+ "parso": "0.8.4",
491
+ "wandb": "0.19.6",
492
+ "missingno": "0.5.2",
493
+ "etils": "1.12.0",
494
+ "langchain": "0.3.18",
495
+ "pymystem3": "0.2.0",
496
+ "gspread-dataframe": "4.0.0",
497
+ "mizani": "0.13.1",
498
+ "earthengine-api": "1.5.2",
499
+ "bokeh": "3.6.3",
500
+ "idna": "3.10",
501
+ "gensim": "4.3.3",
502
+ "cramjam": "2.9.1",
503
+ "aiosignal": "1.3.2",
504
+ "narwhals": "1.26.0",
505
+ "cymem": "2.0.11",
506
+ "cvxopt": "1.3.2",
507
+ "tensorflow-io-gcs-filesystem": "0.37.1",
508
+ "sympy": "1.13.1",
509
+ "blis": "0.7.11",
510
+ "tifffile": "2025.1.10",
511
+ "hyperframe": "6.1.0",
512
+ "pycparser": "2.22",
513
+ "SQLAlchemy": "2.0.38",
514
+ "db-dtypes": "1.4.1",
515
+ "tbb": "2022.0.0",
516
+ "pytz": "2025.1",
517
+ "geocoder": "1.38.1",
518
+ "Cython": "3.0.12",
519
+ "jsonpatch": "1.33",
520
+ "astunparse": "1.6.3",
521
+ "pyogrio": "0.10.0",
522
+ "inflect": "7.3.1",
523
+ "tweepy": "4.15.0",
524
+ "jiter": "0.8.2",
525
+ "multipledispatch": "1.0.0",
526
+ "google-cloud-resource-manager": "1.14.0",
527
+ "notebook": "6.5.5",
528
+ "dopamine_rl": "4.1.2",
529
+ "librosa": "0.10.2.post1",
530
+ "slicer": "0.0.8",
531
+ "sqlglot": "25.6.1",
532
+ "ipyparallel": "8.8.0",
533
+ "tornado": "6.4.2",
534
+ "cyipopt": "1.5.0",
535
+ "more-itertools": "10.3.0",
536
+ "holidays": "0.66",
537
+ "tensorflow": "2.18.0",
538
+ "proto-plus": "1.26.0",
539
+ "albucore": "0.0.23",
540
+ "terminado": "0.18.1",
541
+ "jupyter-console": "6.1.0",
542
+ "google-cloud-iam": "2.17.0",
543
+ "docutils": "0.21.2",
544
+ "lightgbm": "4.5.0",
545
+ "cudf-cu12": "24.12.0",
546
+ "debugpy": "1.8.0",
547
+ "pyerfa": "2.0.1.5",
548
+ "nx-cugraph-cu12": "24.12.0",
549
+ "clarabel": "0.10.0",
550
+ "tabulate": "0.9.0",
551
+ "rmm-cu12": "24.12.1",
552
+ "peft": "0.14.0",
553
+ "fastcore": "1.7.29",
554
+ "jsonschema": "4.23.0",
555
+ "backcall": "0.2.0",
556
+ "pytest": "8.3.4",
557
+ "Werkzeug": "3.1.3",
558
+ "model-signing": "0.2.0",
559
+ "blinker": "1.4",
560
+ "imutils": "0.5.4",
561
+ "audioread": "3.0.1",
562
+ "traittypes": "0.2.1",
563
+ "scipy": "1.13.1",
564
+ "google-cloud-translate": "3.19.0",
565
+ "gym-notices": "0.0.8",
566
+ "xyzservices": "2025.1.0",
567
+ "h5netcdf": "1.5.0",
568
+ "in-toto-attestation": "0.9.3",
569
+ "rfc3161-client": "0.1.2",
570
+ "pyasn1_modules": "0.4.1",
571
+ "widgetsnbextension": "3.6.10",
572
+ "cuda-python": "12.6.0",
573
+ "sphinxcontrib-htmlhelp": "2.1.0",
574
+ "annotated-types": "0.7.0",
575
+ "cloudpathlib": "0.20.0",
576
+ "jellyfish": "1.1.0",
577
+ "gdown": "5.2.0",
578
+ "msgpack": "1.1.0",
579
+ "pooch": "1.8.2",
580
+ "sigstore": "3.6.1",
581
+ "beautifulsoup4": "4.13.3",
582
+ "cufflinks": "0.17.3",
583
+ "h2": "4.2.0",
584
+ "bqplot": "0.12.44",
585
+ "numba": "0.61.0",
586
+ "shellingham": "1.5.4",
587
+ "Bottleneck": "1.4.2",
588
+ "mdurl": "0.1.2",
589
+ "soxr": "0.5.0.post1",
590
+ "confection": "0.1.5",
591
+ "securesystemslib": "1.2.0",
592
+ "ale-py": "0.10.1",
593
+ "google-cloud-language": "2.16.0",
594
+ "osqp": "0.6.7.post3",
595
+ "gspread": "6.1.4",
596
+ "treescope": "0.1.8",
597
+ "frozendict": "2.4.6",
598
+ "nbclassic": "1.2.0",
599
+ "tqdm": "4.67.1",
600
+ "websockets": "14.2",
601
+ "pydot": "3.0.4",
602
+ "array_record": "0.6.0",
603
+ "PyOpenGL": "3.1.9",
604
+ "google-generativeai": "0.8.4",
605
+ "httpimport": "1.4.0",
606
+ "colorcet": "3.1.0",
607
+ "tf_keras": "2.18.0",
608
+ "scikit-image": "0.25.1",
609
+ "vega-datasets": "0.9.0",
610
+ "PyYAML": "6.0.2",
611
+ "proglog": "0.1.10",
612
+ "yfinance": "0.2.52",
613
+ "ipykernel": "5.5.6",
614
+ "tzlocal": "5.2",
615
+ "filelock": "3.17.0",
616
+ "Farama-Notifications": "0.0.4",
617
+ "spacy": "3.7.5",
618
+ "Flask": "3.1.0",
619
+ "nbformat": "5.10.4",
620
+ "google-cloud-bigtable": "2.28.1",
621
+ "fastdownload": "0.0.7",
622
+ "oauth2client": "4.1.3",
623
+ "libcudf-cu12": "24.12.0",
624
+ "tables": "3.10.2",
625
+ "google-resumable-media": "2.7.2",
626
+ "jax-cuda12-pjrt": "0.4.33",
627
+ "xarray-einstats": "0.8.0",
628
+ "sqlparse": "0.5.3",
629
+ "ratelim": "0.1.6",
630
+ "sphinxcontrib-jsmath": "1.0.1",
631
+ "ipython-genutils": "0.2.0",
632
+ "seaborn": "0.13.2",
633
+ "pandocfilters": "1.5.1",
634
+ "python-apt": "2.4.0+ubuntu4",
635
+ "requirements-parser": "0.9.0",
636
+ "pip": "24.1.2",
637
+ "setuptools": "75.1.0",
638
+ "types-setuptools": "75.8.0.20250210",
639
+ "dbus-python": "1.2.18",
640
+ "SecretStorage": "3.3.1",
641
+ "importlib-metadata": "4.6.4",
642
+ "keyring": "23.5.0",
643
+ "launchpadlib": "1.10.16",
644
+ "jeepney": "0.7.1",
645
+ "PyGObject": "3.42.1",
646
+ "lazr.uri": "1.0.6",
647
+ "lazr.restfulclient": "0.14.4",
648
+ "wadllib": "1.3.6",
649
+ "backports.tarfile": "1.2.0",
650
+ "jaraco.collections": "5.1.0",
651
+ "autocommand": "2.2.2",
652
+ "tomli": "2.0.1",
653
+ "jaraco.functools": "4.0.1",
654
+ "jaraco.context": "5.3.0",
655
+ "jaraco.text": "3.12.1"
656
+ }
evaluation/lm_harness_20250217_104435/platform_results.json ADDED
@@ -0,0 +1,11 @@
1
+ {
2
+ "results": {
3
+ "mmlu_college_computer_science": {
4
+ "alias": "college_computer_science",
5
+ "acc,none": 0.36,
6
+ "acc_stderr,none": 0.04824181513244218
7
+ }
8
+ },
9
+ "duration_sec": 22.821773290634155,
10
+ "start_time": "20250217_104435"
11
+ }
evaluation/lm_harness_20250217_104435/platform_task_config.json ADDED
@@ -0,0 +1,112 @@
1
+ {
2
+ "group_subtasks": {
3
+ "mmlu_college_computer_science": []
4
+ },
5
+ "configs": {
6
+ "mmlu_college_computer_science": {
7
+ "task": "mmlu_college_computer_science",
8
+ "task_alias": "college_computer_science",
9
+ "tag": "mmlu_stem_tasks",
10
+ "dataset_path": "hails/mmlu_no_train",
11
+ "dataset_name": "college_computer_science",
12
+ "dataset_kwargs": {
13
+ "trust_remote_code": true
14
+ },
15
+ "test_split": "test",
16
+ "fewshot_split": "dev",
17
+ "doc_to_text": "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:",
18
+ "doc_to_target": "answer",
19
+ "doc_to_choice": [
20
+ "A",
21
+ "B",
22
+ "C",
23
+ "D"
24
+ ],
25
+ "description": "The following are multiple choice questions (with answers) about college computer science.\n\n",
26
+ "target_delimiter": " ",
27
+ "fewshot_delimiter": "\n\n",
28
+ "fewshot_config": {
29
+ "sampler": "first_n"
30
+ },
31
+ "num_fewshot": 0,
32
+ "metric_list": [
33
+ {
34
+ "metric": "acc",
35
+ "aggregation": "mean",
36
+ "higher_is_better": true
37
+ }
38
+ ],
39
+ "output_type": "multiple_choice",
40
+ "repeats": 1,
41
+ "should_decontaminate": false,
42
+ "metadata": {
43
+ "version": 1.0
44
+ }
45
+ }
46
+ },
47
+ "versions": {
48
+ "mmlu_college_computer_science": 1.0
49
+ },
50
+ "n-shot": {
51
+ "mmlu_college_computer_science": 0
52
+ },
53
+ "higher_is_better": {
54
+ "mmlu_college_computer_science": {
55
+ "acc": true
56
+ }
57
+ },
58
+ "n-samples": {
59
+ "mmlu_college_computer_science": {
60
+ "original": 100,
61
+ "effective": 100
62
+ }
63
+ },
64
+ "config": {
65
+ "model": "hf",
66
+ "model_args": {
67
+ "pretrained": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
68
+ "trust_remote_code": false,
69
+ "parallelize": false,
70
+ "dtype": "torch.bfloat16",
71
+ "device_map": "auto",
72
+ "peft": "finetuning_tutorial/output"
73
+ },
74
+ "model_num_parameters": 1729464320,
75
+ "model_dtype": "torch.bfloat16",
76
+ "model_revision": "main",
77
+ "model_sha": "4db41b7c29f5e1f883ab90b5129c21519ccb526e",
78
+ "peft_sha": "",
79
+ "batch_size": "auto",
80
+ "batch_sizes": [
81
+ 64
82
+ ],
83
+ "device": "cuda:0",
84
+ "use_cache": null,
85
+ "limit": null,
86
+ "bootstrap_iters": 100000,
87
+ "gen_kwargs": null,
88
+ "random_seed": 0,
89
+ "numpy_seed": 1234,
90
+ "torch_seed": 1234,
91
+ "fewshot_seed": 1234
92
+ },
93
+ "git_hash": null,
94
+ "date": 1739789075.4318917,
95
+ "pretty_env_info": "PyTorch version: 2.4.0+cu121\nIs debug build: False\nCUDA used to build PyTorch: 12.1\nROCM used to build PyTorch: N/A\n\nOS: Ubuntu 22.04.4 LTS (x86_64)\nGCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0\nClang version: 14.0.0-1ubuntu1.1\nCMake version: version 3.31.4\nLibc version: glibc-2.35\n\nPython version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0] (64-bit runtime)\nPython platform: Linux-6.1.85+-x86_64-with-glibc2.35\nIs CUDA available: True\nCUDA runtime version: 12.5.82\nCUDA_MODULE_LOADING set to: LAZY\nGPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB\nNvidia driver version: 550.54.15\ncuDNN version: Probably one of the following:\n/usr/lib/x86_64-linux-gnu/libcudnn.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.2.1\n/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.2.1\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nArchitecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nAddress sizes: 46 bits physical, 48 bits virtual\nByte Order: Little Endian\nCPU(s): 12\nOn-line CPU(s) list: 0-11\nVendor ID: GenuineIntel\nModel name: Intel(R) Xeon(R) CPU @ 2.20GHz\nCPU family: 6\nModel: 85\nThread(s) per core: 2\nCore(s) per socket: 6\nSocket(s): 1\nStepping: 7\nBogoMIPS: 4400.29\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 
smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities\nHypervisor vendor: KVM\nVirtualization type: full\nL1d cache: 192 KiB (6 instances)\nL1i cache: 192 KiB (6 instances)\nL2 cache: 6 MiB (6 instances)\nL3 cache: 38.5 MiB (1 instance)\nNUMA node(s): 1\nNUMA node0 CPU(s): 0-11\nVulnerability Gather data sampling: Not affected\nVulnerability Itlb multihit: Not affected\nVulnerability L1tf: Not affected\nVulnerability Mds: Not affected\nVulnerability Meltdown: Not affected\nVulnerability Mmio stale data: Vulnerable\nVulnerability Reg file data sampling: Not affected\nVulnerability Retbleed: Vulnerable\nVulnerability Spec rstack overflow: Not affected\nVulnerability Spec store bypass: Vulnerable\nVulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers\nVulnerability Spectre v2: Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Vulnerable; BHI: Vulnerable (Syscall hardening enabled)\nVulnerability Srbds: Not affected\nVulnerability Tsx async abort: Vulnerable\n\nVersions of relevant libraries:\n[pip3] numpy==1.26.4\n[pip3] optree==0.14.0\n[pip3] torch==2.4.0\n[pip3] torchaudio==2.5.1+cu124\n[pip3] torchdata==0.9.0\n[pip3] torchsummary==1.5.1\n[pip3] torchvision==0.19.0\n[pip3] triton==3.0.0\n[conda] Could not collect",
96
+ "transformers_version": "4.45.2",
97
+ "upper_git_hash": null,
98
+ "tokenizer_pad_token": [
99
+ "<|im_end|>",
100
+ "2"
101
+ ],
102
+ "tokenizer_eos_token": [
103
+ "<|im_end|>",
104
+ "2"
105
+ ],
106
+ "tokenizer_bos_token": [
107
+ "<|im_start|>",
108
+ "1"
109
+ ],
110
+ "eot_token_id": 2,
111
+ "max_length": 8192
112
+ }
evaluation/lm_harness_20250217_104435/task_params.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "evaluation_platform": "lm_harness",
3
+ "task_name": "mmlu_college_computer_science",
4
+ "num_samples": null,
5
+ "eval_kwargs": {},
6
+ "num_fewshot": null
7
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "pad_token_id": 2,
6
+ "transformers_version": "4.42.3"
7
+ }
instructions_function_calling.md ADDED
@@ -0,0 +1,183 @@
1
+ ## Quick start
2
+ Instructions for function calling:
3
+
4
+ ```python
5
+ import json
6
+ import re
7
+ from typing import Optional
8
+
9
+ from jinja2 import Template
10
+ import torch
11
+ from transformers import AutoModelForCausalLM, AutoTokenizer
12
+ from transformers.utils import get_json_schema
13
+
14
+
15
+ system_prompt = Template("""You are an expert in composing functions. You are given a question and a set of possible functions.
16
+ Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
17
+ If none of the functions can be used, point it out and refuse to answer.
18
+ If the given question lacks the parameters required by the function, also point it out.
19
+
20
+ You have access to the following tools:
21
+ <tools>{{ tools }}</tools>
22
+
23
+ The output MUST strictly adhere to the following format, and NO other text MUST be included.
24
+ The example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make the tool calls an empty list '[]'.
25
+ <tool_call>[
26
+ {"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}},
27
+ ... (more tool calls as required)
28
+ ]</tool_call>""")
29
+
30
+
31
+ def prepare_messages(
32
+ query: str,
33
+ tools: Optional[list[dict]] = None,
34
+ history: Optional[list[dict[str, str]]] = None
35
+ ) -> list[dict[str, str]]:
36
+ """Prepare the system and user messages for the given query and tools.
37
+
38
+ Args:
39
+ query: The query to be answered.
40
+ tools: The tools available to the model. Defaults to None, in which case an
41
+ empty list is passed to the model.
42
+ history: Prior exchange of messages, including the system_prompt from
43
+ the first query. Defaults to None, i.e. this is the first message in a conversation.
44
+ """
45
+ if tools is None:
46
+ tools = []
47
+ if history:
48
+ messages = history.copy()
49
+ messages.append({"role": "user", "content": query})
50
+ else:
51
+ messages = [
52
+ {"role": "system", "content": system_prompt.render(tools=json.dumps(tools))},
53
+ {"role": "user", "content": query}
54
+ ]
55
+ return messages
56
+
57
+
58
+ def parse_response(text: str) -> str | list[dict]:
59
+ """Parse the model response, returning either the
60
+ list of parsed tool calls, or the raw model
61
+ thought/response if no tool call could be extracted.
62
+
63
+ Args:
64
+ text: Response from the model.
65
+ """
66
+ pattern = r"<tool_call>(.*?)</tool_call>"
67
+ matches = re.findall(pattern, text, re.DOTALL)
68
+ if matches:
69
+ return json.loads(matches[0])
70
+ return text
71
+
72
+ model_name_smollm = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
73
+ model = AutoModelForCausalLM.from_pretrained(model_name_smollm, device_map="auto", torch_dtype="auto", trust_remote_code=True)
74
+ tokenizer = AutoTokenizer.from_pretrained(model_name_smollm)
75
+
76
+ from datetime import datetime
77
+ import random
78
+
79
+ def get_current_time() -> str:
80
+ """Returns the current time in 24-hour format.
81
+
82
+ Returns:
83
+ str: Current time in HH:MM:SS format.
84
+ """
85
+ return datetime.now().strftime("%H:%M:%S")
86
+
87
+
88
+ def get_random_number_between(min: int, max: int) -> int:
89
+ """
90
+ Gets a random number between min and max.
91
+
92
+ Args:
93
+ min: The minimum number.
94
+ max: The maximum number.
95
+
96
+ Returns:
97
+ A random number between min and max.
98
+ """
99
+ return random.randint(min, max)
100
+
101
+
102
+ tools = [get_json_schema(get_random_number_between), get_json_schema(get_current_time)]
103
+
104
+ toolbox = {"get_random_number_between": get_random_number_between, "get_current_time": get_current_time}
105
+
106
+ query = "Give me a number between 1 and 300"
107
+
108
+ messages = prepare_messages(query, tools=tools)
109
+
110
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
111
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
112
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
113
+
114
+ tool_calls = parse_response(result)
115
+ # [{'name': 'get_random_number_between', 'arguments': {'min': 1, 'max': 300}}]
116
+
117
+ # Get tool responses
118
+ tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
119
+ # [63]
120
+
121
+ # For the second turn, rebuild the history of messages:
122
+ history = messages.copy()
123
+ # Add the "parsed response"
124
+ history.append({"role": "assistant", "content": result})
125
+ query = "Can you give me the hour?"
126
+ history.append({"role": "user", "content": query})
127
+
128
+ inputs = tokenizer.apply_chat_template(history, add_generation_prompt=True, return_tensors="pt").to(model.device)
129
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
130
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
131
+
132
+ tool_calls = parse_response(result)
133
+ tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
134
+ # ['07:57:25']
135
+ ```
136
+
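The `parse_response` helper above raises if the model emits malformed JSON between the `<tool_call>` tags, which small models occasionally do. A minimal hardened sketch (the `safe_parse_response` name is ours, not part of the original helper) could fall back to the raw text instead:

```python
import json
import re


def safe_parse_response(text: str):
    """Like parse_response, but returns the raw text on malformed JSON."""
    pattern = r"<tool_call>(.*?)</tool_call>"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        try:
            return json.loads(matches[0])
        except json.JSONDecodeError:
            return text  # malformed payload: surface the raw model output
    return text


# A well-formed payload parses to a list of tool calls:
calls = safe_parse_response(
    '<tool_call>[{"name": "get_current_time", "arguments": {}}]</tool_call>'
)
# A malformed payload falls back to the raw string:
raw = safe_parse_response("<tool_call>[oops]</tool_call>")
```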
137
+ #### Parallel function calls
138
+
139
+ A single query may require multiple tool calls at once.
140
+
141
+ ```python
142
+ query = "Can you give me the hour and a random number between 1 and 50?"
143
+
144
+ messages = prepare_messages(query, tools=tools)
145
+
146
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
147
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
148
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
149
+
150
+ tool_calls = parse_response(result)
151
+ tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
152
+ # ['09:24:52', 50]
153
+
154
+ query = "Can you give me a random number between 1 and 10, other between 200 and 210 and another one between 55 and 60?"
155
+
156
+ messages = prepare_messages(query, tools=tools)
157
+
158
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
159
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
160
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
161
+
162
+ tool_calls = parse_response(result)
163
+ tool_responses = [toolbox.get(tc["name"])(*tc["arguments"].values()) for tc in tool_calls]
164
+ # [7, 202, 60]
165
+ ```
166
+
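Dispatching with `*tc["arguments"].values()` relies on the model emitting arguments in the function's positional order. A sketch using keyword arguments is more robust to reordering (the `dispatch_calls` name is illustrative, not from the original snippet):

```python
import random


def get_random_number_between(min: int, max: int) -> int:
    """Gets a random number between min and max."""
    return random.randint(min, max)


toolbox = {"get_random_number_between": get_random_number_between}


def dispatch_calls(tool_calls: list[dict], toolbox: dict) -> list:
    """Invoke each parsed tool call by name, passing arguments as keywords."""
    return [toolbox[tc["name"]](**tc["arguments"]) for tc in tool_calls]


# Arguments arrive out of positional order, but keywords still bind correctly.
results = dispatch_calls(
    [{"name": "get_random_number_between", "arguments": {"max": 10, "min": 10}}],
    toolbox,
)
# With min == max the result is deterministic: [10]
```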
167
+ #### Tools not available
168
+
169
+ ```python
170
+ query = "Can you open a new page with youtube?"
171
+
172
+ messages = prepare_messages(query, tools=tools)
173
+
174
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
175
+ outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
176
+ result = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
177
+
178
+ tool_calls = parse_response(result)
179
+ # []
180
+
181
+ # The response will be something similar to the following:
182
+ # "The query cannot be answered with the provided tools. Please make sure the tools are correctly installed and imported. If the tools are not installed, install them using pip: 'pip install -r tools.txt'. If the tools are already installed, ensure they are correctly configured. If the tools are not correctly configured, please contact the support team. The output MUST strictly adhere to the following format, and NO other text MUST be included.\n\n<tool_call>[]</tool_call>"
183
+ ```
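Since `parse_response` returns a list when tool calls were extracted and a plain string otherwise, a caller can branch on the type to decide whether to execute tools or surface the text reply. A hedged sketch (the `handle_model_output` name is ours):

```python
def handle_model_output(parsed) -> tuple[str, object]:
    """Classify parse_response output.

    A list means tool calls (possibly empty, i.e. nothing to run);
    a plain string means the model answered or refused in text.
    """
    if isinstance(parsed, list):
        return ("tool_calls", parsed)
    return ("text", parsed)


kind, payload = handle_model_output([])
# -> ("tool_calls", [])
```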
logs/rank_0000.log ADDED
@@ -0,0 +1,349 @@
1
+ [2025-02-17 09:38:37,891][oumi][rank0][pid:3150][MainThread][INFO]][train.py:144] Resolved 'training.dataloader_num_workers=auto' to 'training.dataloader_num_workers=2'
2
+ [2025-02-17 09:38:37,892][oumi][rank0][pid:3150][MainThread][INFO]][train.py:174] TrainingConfig:
3
+ TrainingConfig(data=DataParams(train=DatasetSplitParams(datasets=[DatasetParams(dataset_name='PromptResponseDataset',
4
+ dataset_path=None,
5
+ subset=None,
6
+ split='train',
7
+ dataset_kwargs={'assistant_only': True,
8
+ 'hf_dataset_path': 'Unseen1980/fiori-tools-support-ga',
9
+ 'instruction_template': '<|im_start|>user\n',
10
+ 'prompt_column': 'question',
11
+ 'response_column': 'answer',
12
+ 'response_template': '<|im_start|>assistant\n'},
13
+ sample_count=8000,
14
+ mixture_proportion=None,
15
+ shuffle=True,
16
+ seed=42,
17
+ shuffle_buffer_size=1000,
18
+ trust_remote_code=False,
19
+ transform_num_workers=None)],
20
+ collator_name='text_with_padding',
21
+ pack=False,
22
+ stream=False,
23
+ target_col=None,
24
+ mixture_strategy='first_exhausted',
25
+ seed=42,
26
+ use_async_dataset=False,
27
+ use_torchdata=None),
28
+ test=DatasetSplitParams(datasets=[],
29
+ collator_name=None,
30
+ pack=False,
31
+ stream=False,
32
+ target_col=None,
33
+ mixture_strategy='first_exhausted',
34
+ seed=None,
35
+ use_async_dataset=False,
36
+ use_torchdata=None),
37
+ validation=DatasetSplitParams(datasets=[],
38
+ collator_name=None,
39
+ pack=False,
40
+ stream=False,
41
+ target_col=None,
42
+ mixture_strategy='first_exhausted',
43
+ seed=None,
44
+ use_async_dataset=False,
45
+ use_torchdata=None)),
46
+ model=ModelParams(model_name='HuggingFaceTB/SmolLM2-1.7B-Instruct',
47
+ adapter_model=None,
48
+ tokenizer_name=None,
49
+ tokenizer_pad_token='<|endoftext|>',
50
+ tokenizer_kwargs={},
51
+ model_max_length=None,
52
+ load_pretrained_weights=True,
53
+ trust_remote_code=True,
54
+ torch_dtype_str='bfloat16',
55
+ compile=False,
56
+ chat_template=None,
57
+ attn_implementation=None,
58
+ device_map='auto',
59
+ model_kwargs={},
60
+ enable_liger_kernel=False,
61
+ shard_for_eval=False,
62
+ freeze_layers=[]),
63
+ training=TrainingParams(use_peft=True,
64
+ trainer_type=<TrainerType.TRL_SFT: 'trl_sft'>,
65
+ enable_gradient_checkpointing=True,
66
+ gradient_checkpointing_kwargs={'use_reentrant': False},
67
+ output_dir='finetuning_tutorial/output',
68
+ per_device_train_batch_size=2,
69
+ per_device_eval_batch_size=8,
70
+ gradient_accumulation_steps=8,
71
+ max_steps=1500,
72
+ num_train_epochs=3,
73
+ save_epoch=False,
74
+ save_steps=0,
75
+ save_final_model=True,
76
+ seed=42,
77
+ run_name=None,
78
+ metrics_function=None,
79
+ log_level='info',
80
+ dep_log_level='warning',
81
+ enable_wandb=False,
82
+ enable_tensorboard=True,
83
+ logging_strategy='steps',
84
+ logging_dir=None,
85
+ logging_steps=10,
86
+ logging_first_step=False,
87
+ eval_strategy='no',
88
+ eval_steps=500,
89
+ learning_rate=0.001,
90
+ lr_scheduler_type='linear',
91
+ lr_scheduler_kwargs={},
92
+ warmup_ratio=0.1,
93
+ warmup_steps=None,
94
+ optimizer='adamw_torch_fused',
95
+ weight_decay=0.01,
96
+ adam_beta1=0.9,
97
+ adam_beta2=0.999,
98
+ adam_epsilon=1e-08,
99
+ sgd_momentum=0.0,
100
+ mixed_precision_dtype=<MixedPrecisionDtype.NONE: 'none'>,
101
+ compile=False,
102
+ include_performance_metrics=False,
103
+ include_alternative_mfu_metrics=False,
104
+ log_model_summary=False,
105
+ resume_from_checkpoint=None,
106
+ try_resume_from_last_checkpoint=False,
107
+ dataloader_num_workers=2,
108
+ dataloader_prefetch_factor=32,
109
+ dataloader_main_process_only=None,
110
+ ddp_find_unused_parameters=False,
111
+ max_grad_norm=1.0,
112
+ trainer_kwargs={},
113
+ profiler=ProfilerParams(save_dir=None,
114
+ enable_cpu_profiling=False,
115
+ enable_cuda_profiling=False,
116
+ record_shapes=False,
117
+ profile_memory=False,
118
+ with_stack=False,
119
+ with_flops=False,
120
+ with_modules=False,
121
+ row_limit=50,
122
+ schedule=ProfilerScheduleParams(enable_schedule=False,
123
+ wait=0,
124
+ warmup=1,
125
+ active=3,
126
+ repeat=1,
127
+ skip_first=1)),
128
+ telemetry=TelemetryParams(telemetry_dir='telemetry',
129
+ collect_telemetry_for_all_ranks=False,
130
+ track_gpu_temperature=False),
131
+ empty_device_cache_steps=1,
132
+ nccl_default_timeout_minutes=None),
133
+ peft=PeftParams(lora_r=16,
134
+ lora_alpha=32,
135
+ lora_dropout=0.0,
136
+ lora_target_modules=['q_proj',
137
+ 'k_proj',
138
+ 'v_proj',
139
+ 'o_proj',
140
+ 'gate_proj',
141
+ 'up_proj',
142
+ 'down_proj'],
143
+ lora_modules_to_save=None,
144
+ lora_bias='none',
145
+ lora_init_weights=<LoraWeightInitialization.DEFAULT: 'default'>,
146
+ lora_task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>,
147
+ q_lora=False,
148
+ q_lora_bits=4,
149
+ bnb_4bit_quant_type='fp4',
150
+ use_bnb_nested_quant=False,
151
+ bnb_4bit_quant_storage='uint8',
152
+ bnb_4bit_compute_dtype='float32',
153
+ peft_save_mode=<PeftSaveMode.ADAPTER_ONLY: 'adapter_only'>),
154
+ fsdp=FSDPParams(enable_fsdp=False,
155
+ sharding_strategy=<ShardingStrategy.FULL_SHARD: 'FULL_SHARD'>,
156
+ cpu_offload=False,
157
+ mixed_precision=None,
158
+ backward_prefetch=<BackwardPrefetch.BACKWARD_PRE: 'BACKWARD_PRE'>,
159
+ forward_prefetch=False,
160
+ use_orig_params=None,
161
+ state_dict_type=<StateDictType.FULL_STATE_DICT: 'FULL_STATE_DICT'>,
162
+ auto_wrap_policy=<AutoWrapPolicy.NO_WRAP: 'NO_WRAP'>,
163
+ min_num_params=100000,
164
+ transformer_layer_cls=None,
165
+ sync_module_states=True))
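The PEFT settings above (`lora_r=16`, `lora_alpha=32`, targeting all attention and MLP projections) determine the adapter's trainable-parameter count. As a rough sketch, each adapted linear layer of shape `(d_in, d_out)` contributes `r * (d_in + d_out)` parameters; the layer dimensions below are hypothetical placeholders, not read from the SmolLM2 model config:

```python
# Sketch: estimate LoRA trainable parameters for a set of target linear layers.
# The layer shapes used in the example are hypothetical, not SmolLM2's actual dims.

def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds two low-rank matrices per adapted layer:
    # A of shape (r, d_in) and B of shape (d_out, r).
    return r * (d_in + d_out)

def total_lora_params(layer_shapes, r: int) -> int:
    return sum(lora_params_per_layer(d_in, d_out, r) for d_in, d_out in layer_shapes)

# Example: one hypothetical 2048x2048 projection with r=16.
print(lora_params_per_layer(2048, 2048, 16))  # 16 * (2048 + 2048) = 65536
```

The same formula, summed over every module matched by `lora_target_modules` across all layers, gives the adapter size reported by PEFT at train time.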
166
+ [2025-02-17 09:38:38,506][oumi][rank0][pid:3150][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
167
+ [2025-02-17 09:38:38,507][oumi][rank0][pid:3150][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
168
+ [2025-02-17 09:38:40,396][oumi][rank0][pid:3150][MainThread][INFO]][train.py:226] Building PEFT model...
169
+ [2025-02-17 09:38:40,880][oumi][rank0][pid:3150][MainThread][INFO]][base_map_dataset.py:68] Creating map dataset (type: PromptResponseDataset) dataset_name: 'Unseen1980/fiori-tools-support-ga', dataset_path: 'None'...
170
+ [2025-02-17 09:38:40,881][oumi][rank0][pid:3150][MainThread][WARNING]][base_sft_dataset.py:251] Response template '<|im_start|>assistant
171
+ ' contains leading or trailing whitespaces. These will be ignored.
172
+ [2025-02-17 09:38:40,881][oumi][rank0][pid:3150][MainThread][WARNING]][base_sft_dataset.py:267] Instruction template '<|im_start|>user
173
+ ' contains leading or trailing whitespaces. These will be ignored.
174
+ [2025-02-17 09:38:44,489][oumi][rank0][pid:3150][MainThread][INFO]][base_map_dataset.py:472] Dataset Info:
175
+ Split: train
176
+ Version: 0.0.0
177
+ Dataset size: 297339
178
+ Download size: 296628
179
+ Size: 593967 bytes
180
+ Rows: 417
181
+ Columns: ['question', 'answer']
182
+ [2025-02-17 09:38:44,782][oumi][rank0][pid:3150][MainThread][INFO]][base_map_dataset.py:411] Loaded DataFrame with shape: (417, 2). Columns:
183
+ question object
184
+ answer object
185
+ dtype: object
186
+ [2025-02-17 09:38:44,865][oumi][rank0][pid:3150][MainThread][INFO]][base_map_dataset.py:297] PromptResponseDataset: features=dict_keys(['input_ids', 'attention_mask', 'labels'])
187
+ [2025-02-17 09:38:45,387][oumi][rank0][pid:3150][MainThread][INFO]][base_map_dataset.py:361] Finished transforming dataset (PromptResponseDataset)! Speed: 798.37 examples/sec. Examples: 417. Duration: 0.5 sec. Transform workers: 1.
188
+ [2025-02-17 09:38:46,070][oumi][rank0][pid:3150][MainThread][INFO]][torch_profiler_utils.py:150] PROF: Torch Profiler disabled!
189
+ [2025-02-17 09:38:46,091][oumi][rank0][pid:3150][MainThread][INFO]][training.py:49] SFTConfig(output_dir='finetuning_tutorial/output',
190
+ overwrite_output_dir=False,
191
+ do_train=False,
192
+ do_eval=False,
193
+ do_predict=False,
194
+ eval_strategy=<IntervalStrategy.NO: 'no'>,
195
+ prediction_loss_only=False,
196
+ per_device_train_batch_size=2,
197
+ per_device_eval_batch_size=8,
198
+ per_gpu_train_batch_size=None,
199
+ per_gpu_eval_batch_size=None,
200
+ gradient_accumulation_steps=8,
201
+ eval_accumulation_steps=None,
202
+ eval_delay=0,
203
+ torch_empty_cache_steps=1,
204
+ learning_rate=0.001,
205
+ weight_decay=0.01,
206
+ adam_beta1=0.9,
207
+ adam_beta2=0.999,
208
+ adam_epsilon=1e-08,
209
+ max_grad_norm=1.0,
210
+ num_train_epochs=3,
211
+ max_steps=1500,
212
+ lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>,
213
+ lr_scheduler_kwargs={},
214
+ warmup_ratio=0.1,
215
+ warmup_steps=0,
216
+ log_level='warning',
217
+ log_level_replica='warning',
218
+ log_on_each_node=True,
219
+ logging_dir='finetuning_tutorial/output/runs/Feb17_09-38-46_7975b2aca0fe',
220
+ logging_strategy=<IntervalStrategy.STEPS: 'steps'>,
221
+ logging_first_step=False,
222
+ logging_steps=10,
223
+ logging_nan_inf_filter=True,
224
+ save_strategy=<IntervalStrategy.NO: 'no'>,
225
+ save_steps=0,
226
+ save_total_limit=None,
227
+ save_safetensors=True,
228
+ save_on_each_node=False,
229
+ save_only_model=False,
230
+ restore_callback_states_from_checkpoint=False,
231
+ no_cuda=False,
232
+ use_cpu=False,
233
+ use_mps_device=False,
234
+ seed=42,
235
+ data_seed=None,
236
+ jit_mode_eval=False,
237
+ use_ipex=False,
238
+ bf16=False,
239
+ fp16=False,
240
+ fp16_opt_level='O1',
241
+ half_precision_backend='auto',
242
+ bf16_full_eval=False,
243
+ fp16_full_eval=False,
244
+ tf32=None,
245
+ local_rank=0,
246
+ ddp_backend=None,
247
+ tpu_num_cores=None,
248
+ tpu_metrics_debug=False,
249
+ debug=[],
250
+ dataloader_drop_last=False,
251
+ eval_steps=500,
252
+ dataloader_num_workers=2,
253
+ dataloader_prefetch_factor=32,
254
+ past_index=-1,
255
+ run_name='finetuning_tutorial/output',
256
+ disable_tqdm=False,
257
+ remove_unused_columns=True,
258
+ label_names=None,
259
+ load_best_model_at_end=False,
260
+ metric_for_best_model=None,
261
+ greater_is_better=None,
262
+ ignore_data_skip=False,
263
+ fsdp=[],
264
+ fsdp_min_num_params=0,
265
+ fsdp_config={'min_num_params': 0,
266
+ 'xla': False,
267
+ 'xla_fsdp_grad_ckpt': False,
268
+ 'xla_fsdp_v2': False},
269
+ fsdp_transformer_layer_cls_to_wrap=None,
270
+ accelerator_config=AcceleratorConfig(split_batches=False,
271
+ dispatch_batches=None,
272
+ even_batches=True,
273
+ use_seedable_sampler=True,
274
+ non_blocking=False,
275
+ gradient_accumulation_kwargs=None,
276
+ use_configured_state=False),
277
+ deepspeed=None,
278
+ label_smoothing_factor=0.0,
279
+ optim=<OptimizerNames.ADAMW_TORCH_FUSED: 'adamw_torch_fused'>,
280
+ optim_args=None,
281
+ adafactor=False,
282
+ group_by_length=False,
283
+ length_column_name='length',
284
+ report_to=['tensorboard'],
285
+ ddp_find_unused_parameters=False,
286
+ ddp_bucket_cap_mb=None,
287
+ ddp_broadcast_buffers=None,
288
+ dataloader_pin_memory=True,
289
+ dataloader_persistent_workers=False,
290
+ skip_memory_metrics=True,
291
+ use_legacy_prediction_loop=False,
292
+ push_to_hub=False,
293
+ resume_from_checkpoint=None,
294
+ hub_model_id=None,
295
+ hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>,
296
+ hub_token=None,
297
+ hub_private_repo=False,
298
+ hub_always_push=False,
299
+ gradient_checkpointing=True,
300
+ gradient_checkpointing_kwargs={'use_reentrant': False},
301
+ include_inputs_for_metrics=False,
302
+ eval_do_concat_batches=True,
303
+ fp16_backend='auto',
304
+ evaluation_strategy=None,
305
+ push_to_hub_model_id=None,
306
+ push_to_hub_organization=None,
307
+ push_to_hub_token=None,
308
+ mp_parameters='',
309
+ auto_find_batch_size=False,
310
+ full_determinism=False,
311
+ torchdynamo=None,
312
+ ray_scope='last',
313
+ ddp_timeout=1800,
314
+ torch_compile=False,
315
+ torch_compile_backend=None,
316
+ torch_compile_mode=None,
317
+ dispatch_batches=None,
318
+ split_batches=None,
319
+ include_tokens_per_second=False,
320
+ include_num_input_tokens_seen=False,
321
+ neftune_noise_alpha=None,
322
+ optim_target_modules=None,
323
+ batch_eval_metrics=False,
324
+ eval_on_start=False,
325
+ use_liger_kernel=False,
326
+ eval_use_gather_object=False,
327
+ dataset_text_field=None,
328
+ packing=False,
329
+ max_seq_length=None,
330
+ dataset_num_proc=None,
331
+ dataset_batch_size=1000,
332
+ model_init_kwargs=None,
333
+ dataset_kwargs=None,
334
+ eval_packing=None,
335
+ num_of_sequences=1024,
336
+ chars_per_token=3.6,
337
+ use_liger=False)
338
+ [2025-02-17 09:38:46,110][oumi][rank0][pid:3150][MainThread][INFO]][device_utils.py:283] GPU Metrics Before Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=1, used_memory_mb=8477.0, temperature=31, fan_speed=None, fan_speeds=None, power_usage_watts=50.01, power_limit_watts=400.0, gpu_utilization=0, memory_utilization=0, performance_state=0, clock_speed_graphics=1095, clock_speed_sm=1095, clock_speed_memory=1215).
339
+ [2025-02-17 09:38:46,110][oumi][rank0][pid:3150][MainThread][INFO]][train.py:312] Training init time: 8.237s
340
+ [2025-02-17 09:38:46,110][oumi][rank0][pid:3150][MainThread][INFO]][train.py:313] Starting training... (TrainerType.TRL_SFT, transformers: 4.45.2)
341
+ [2025-02-17 10:40:30,109][oumi][rank0][pid:3150][MainThread][INFO]][train.py:320] Training is Complete.
342
+ [2025-02-17 10:40:30,115][oumi][rank0][pid:3150][MainThread][INFO]][device_utils.py:283] GPU Metrics After Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=1, used_memory_mb=9009.0, temperature=39, fan_speed=None, fan_speeds=None, power_usage_watts=107.601, power_limit_watts=400.0, gpu_utilization=29, memory_utilization=8, performance_state=0, clock_speed_graphics=1410, clock_speed_sm=1410, clock_speed_memory=1215).
343
+ [2025-02-17 10:40:30,115][oumi][rank0][pid:3150][MainThread][INFO]][torch_utils.py:117] Peak GPU memory usage: 4.78 GB
344
+ [2025-02-17 10:40:30,116][oumi][rank0][pid:3150][MainThread][INFO]][train.py:327] Saving final state...
345
+ [2025-02-17 10:40:30,119][oumi][rank0][pid:3150][MainThread][INFO]][train.py:332] Saving final model...
346
+ [2025-02-17 10:40:30,921][oumi][rank0][pid:3150][MainThread][INFO]][hf_trainer.py:102] Model has been saved at finetuning_tutorial/output
347
+ [2025-02-17 10:40:30,921][oumi][rank0][pid:3150][MainThread][INFO]][train.py:339]
348
+
349
+ » We're always looking for feedback. What's one thing we can improve? https://oumi.ai/feedback
merges.txt ADDED
The diff for this file is too large to render. See raw diff
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f55217be716b6a997b97b9d8d7eb6fad02e00858f5010ec24f64603c3a98a0e8
+ size 3422777952
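Files stored via Git LFS appear in the repository as small three-line pointer files like the one above (`version`, `oid`, `size`); the actual weights are fetched separately by the LFS client. A minimal parser for this pointer format might look like:

```python
# Parse a Git LFS pointer file into a dict of its key/value fields.
# Each line of the pointer format is "<key> <value>".

def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f55217be716b6a997b97b9d8d7eb6fad02e00858f5010ec24f64603c3a98a0e8
size 3422777952
"""
info = parse_lfs_pointer(pointer)
print(round(int(info["size"]) / 1e9, 2))  # file size in GB: 3.42
```

The `size` field is the byte count of the real object; the `oid` is its SHA-256, which the LFS client uses to address the blob on the storage server.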
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d47cfffc09e0062a9c7c03ed5d768df1bfffc6acf27c55e266eced858da41f8
+ size 179093
onnx/model.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:023686a59a534e45af70bc5f99ae70e592481701680591f9844fc140a3db220a
+ size 6847602688
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:565c43798c19a63941d360ac00830431a1ab79a9893748c8624b52049cabeb5e
+ size 1311321168
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecfaaeb3e66daa109e6d1ab365ce712dc833322045ca23e264a45acc80ec48e7
+ size 1326821411
onnx/model_fp16.onnx_data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f48c05c14ed97738f8dc5854c20c229ddc8661f43fa914085843901a4ba8740
+ size 2097152000
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8eeead8e191939562a98af969f9d63dd404dc72a9a65b2c19cabb857de7c8d9
+ size 1714133062
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24ae37d376f94a0a47ba47b33150da0fe5269017ff71e483fdecbb9ab3230608
+ size 1411983120
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71b383ff60bbe363fd16e2e842b59e7258f622ffd640c6a18dffd9edec3530e8
+ size 1108743793
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e038fd6fb27b41fbb62e6a7df9b60b57215db3958d14382221beaab78fbc1d4
+ size 1714133130
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e038fd6fb27b41fbb62e6a7df9b60b57215db3958d14382221beaab78fbc1d4
+ size 1714133130
runs/Feb17_09-38-46_7975b2aca0fe/events.out.tfevents.1739785126.7975b2aca0fe.3150.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6918c133fa17c1a2eb3c71704bfe2bf9ecd2a59372b74f98d216538adf38f817
+ size 37466
runs/Oct31_06-24-59_ip-26-0-174-36/events.out.tfevents.1730356365.ip-26-0-174-36.3169719.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6bfce1916438dd2e6553aa0a62d418087b3ae04f8af75e714ad1f01b7663db6
+ size 114828
runs/Oct31_06-24-59_ip-26-0-174-36/events.out.tfevents.1730363825.ip-26-0-174-36.3169719.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b3d7723fd0715ce6dcbccf7bb2097f59490b0ac670f798f5378ef5abb7d1301d
+ size 828
special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>"
+ ],
+ "bos_token": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
telemetry/devices_info.txt ADDED
@@ -0,0 +1,2 @@
+ CPU cores: 12 CUDA devices: 1
+ device(0)='NVIDIA A100-SXM4-40GB' Capability: (8, 0) Memory: [Total: 39.56GiB Free: 35.24GiB Allocated: 0.0GiB Cached: 0.0GiB]
telemetry/training_config.yaml ADDED
@@ -0,0 +1,182 @@
1
+ data:
2
+ train:
3
+ datasets:
4
+ - dataset_name: PromptResponseDataset
5
+ dataset_path: null
6
+ subset: null
7
+ split: train
8
+ dataset_kwargs:
9
+ hf_dataset_path: Unseen1980/fiori-tools-support-ga
10
+ prompt_column: question
11
+ response_column: answer
12
+ assistant_only: true
13
+ instruction_template: '<|im_start|>user
14
+
15
+ '
16
+ response_template: '<|im_start|>assistant
17
+
18
+ '
19
+ sample_count: 8000
20
+ mixture_proportion: null
21
+ shuffle: true
22
+ seed: 42
23
+ shuffle_buffer_size: 1000
24
+ trust_remote_code: false
25
+ transform_num_workers: null
26
+ collator_name: text_with_padding
27
+ pack: false
28
+ stream: false
29
+ target_col: null
30
+ mixture_strategy: first_exhausted
31
+ seed: 42
32
+ use_async_dataset: false
33
+ use_torchdata: null
34
+ test:
35
+ datasets: []
36
+ collator_name: null
37
+ pack: false
38
+ stream: false
39
+ target_col: null
40
+ mixture_strategy: first_exhausted
41
+ seed: null
42
+ use_async_dataset: false
43
+ use_torchdata: null
44
+ validation:
45
+ datasets: []
46
+ collator_name: null
47
+ pack: false
48
+ stream: false
49
+ target_col: null
50
+ mixture_strategy: first_exhausted
51
+ seed: null
52
+ use_async_dataset: false
53
+ use_torchdata: null
54
+ model:
55
+ model_name: HuggingFaceTB/SmolLM2-1.7B-Instruct
56
+ adapter_model: null
57
+ tokenizer_name: null
58
+ tokenizer_pad_token: <|endoftext|>
59
+ tokenizer_kwargs: {}
60
+ model_max_length: null
61
+ load_pretrained_weights: true
62
+ trust_remote_code: true
63
+ torch_dtype_str: bfloat16
64
+ compile: false
65
+ chat_template: null
66
+ attn_implementation: null
67
+ device_map: auto
68
+ model_kwargs: {}
69
+ enable_liger_kernel: false
70
+ shard_for_eval: false
71
+ freeze_layers: []
72
+ training:
73
+ use_peft: true
74
+ trainer_type: TRL_SFT
75
+ enable_gradient_checkpointing: true
76
+ gradient_checkpointing_kwargs:
77
+ use_reentrant: false
78
+ output_dir: finetuning_tutorial/output
79
+ per_device_train_batch_size: 2
80
+ per_device_eval_batch_size: 8
81
+ gradient_accumulation_steps: 8
82
+ max_steps: 1500
83
+ num_train_epochs: 3
84
+ save_epoch: false
85
+ save_steps: 0
86
+ save_final_model: true
87
+ seed: 42
88
+ run_name: null
89
+ metrics_function: null
90
+ log_level: info
91
+ dep_log_level: warning
92
+ enable_wandb: false
93
+ enable_tensorboard: true
94
+ logging_strategy: steps
95
+ logging_dir: null
96
+ logging_steps: 10
97
+ logging_first_step: false
98
+ eval_strategy: 'no'
99
+ eval_steps: 500
100
+ learning_rate: 0.001
101
+ lr_scheduler_type: linear
102
+ lr_scheduler_kwargs: {}
103
+ warmup_ratio: 0.1
104
+ warmup_steps: null
105
+ optimizer: adamw_torch_fused
106
+ weight_decay: 0.01
107
+ adam_beta1: 0.9
108
+ adam_beta2: 0.999
109
+ adam_epsilon: 1.0e-08
110
+ sgd_momentum: 0.0
111
+ mixed_precision_dtype: NONE
112
+ compile: false
113
+ include_performance_metrics: false
114
+ include_alternative_mfu_metrics: false
115
+ log_model_summary: false
116
+ resume_from_checkpoint: null
117
+ try_resume_from_last_checkpoint: false
118
+ dataloader_num_workers: 2
119
+ dataloader_prefetch_factor: 32
120
+ dataloader_main_process_only: null
121
+ ddp_find_unused_parameters: false
122
+ max_grad_norm: 1.0
123
+ trainer_kwargs: {}
124
+ profiler:
125
+ save_dir: null
126
+ enable_cpu_profiling: false
127
+ enable_cuda_profiling: false
128
+ record_shapes: false
129
+ profile_memory: false
130
+ with_stack: false
131
+ with_flops: false
132
+ with_modules: false
133
+ row_limit: 50
134
+ schedule:
135
+ enable_schedule: false
136
+ wait: 0
137
+ warmup: 1
138
+ active: 3
139
+ repeat: 1
140
+ skip_first: 1
141
+ telemetry:
142
+ telemetry_dir: telemetry
143
+ collect_telemetry_for_all_ranks: false
144
+ track_gpu_temperature: false
145
+ empty_device_cache_steps: 1
146
+ nccl_default_timeout_minutes: null
147
+ peft:
148
+ lora_r: 16
149
+ lora_alpha: 32
150
+ lora_dropout: 0.0
151
+ lora_target_modules:
152
+ - q_proj
153
+ - k_proj
154
+ - v_proj
155
+ - o_proj
156
+ - gate_proj
157
+ - up_proj
158
+ - down_proj
159
+ lora_modules_to_save: null
160
+ lora_bias: none
161
+ lora_init_weights: DEFAULT
162
+ lora_task_type: CAUSAL_LM
163
+ q_lora: false
164
+ q_lora_bits: 4
165
+ bnb_4bit_quant_type: fp4
166
+ use_bnb_nested_quant: false
167
+ bnb_4bit_quant_storage: uint8
168
+ bnb_4bit_compute_dtype: float32
169
+ peft_save_mode: ADAPTER_ONLY
170
+ fsdp:
171
+ enable_fsdp: false
172
+ sharding_strategy: FULL_SHARD
173
+ cpu_offload: false
174
+ mixed_precision: null
175
+ backward_prefetch: BACKWARD_PRE
176
+ forward_prefetch: false
177
+ use_orig_params: null
178
+ state_dict_type: FULL_STATE_DICT
179
+ auto_wrap_policy: NO_WRAP
180
+ min_num_params: 100000
181
+ transformer_layer_cls: null
182
+ sync_module_states: true
telemetry/world_size.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "LOCAL_WORLD_SIZE": 1,
+ "WORLD_SIZE": 1
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,154 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<repo_name>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<reponame>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<file_sep>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "6": {
53
+ "content": "<filename>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "7": {
61
+ "content": "<gh_stars>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "8": {
69
+ "content": "<issue_start>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "9": {
77
+ "content": "<issue_comment>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "10": {
85
+ "content": "<issue_closed>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "11": {
93
+ "content": "<jupyter_start>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "12": {
101
+ "content": "<jupyter_text>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "13": {
109
+ "content": "<jupyter_code>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "14": {
117
+ "content": "<jupyter_output>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "15": {
125
+ "content": "<jupyter_script>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "16": {
133
+ "content": "<empty_output>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ }
140
+ },
141
+ "additional_special_tokens": [
142
+ "<|im_start|>",
143
+ "<|im_end|>"
144
+ ],
145
+ "bos_token": "<|im_start|>",
146
+ "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
147
+ "clean_up_tokenization_spaces": false,
148
+ "eos_token": "<|im_end|>",
149
+ "model_max_length": 8192,
150
+ "pad_token": "<|im_end|>",
151
+ "tokenizer_class": "GPT2Tokenizer",
152
+ "unk_token": "<|endoftext|>",
153
+ "vocab_size": 49152
154
+ }
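The `chat_template` above is a Jinja expression that wraps each message in `<|im_start|>`/`<|im_end|>` markers and injects a default system prompt when the conversation does not begin with one. A pure-Python re-implementation of that logic (a sketch of what `tokenizer.apply_chat_template()` renders, not the tokenizer's own code path) is:

```python
# Sketch: reproduce the ChatML-style chat_template from tokenizer_config.json
# in plain Python. Normally transformers renders this via apply_chat_template().

DEFAULT_SYSTEM = ("<|im_start|>system\nYou are a helpful AI assistant named "
                  "SmolLM, trained by Hugging Face<|im_end|>\n")

def render_chatml(messages, add_generation_prompt=False):
    out = ""
    # The template prepends the default system prompt only when the first
    # message is not itself a system message.
    if messages and messages[0]["role"] != "system":
        out += DEFAULT_SYSTEM
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

text = render_chatml([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
print(text)
```

With `add_generation_prompt=True` the rendered string ends in the open `<|im_start|>assistant\n` turn, which is why the training config's `response_template` is exactly that marker.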
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.996074326092646,
+ "total_flos": 0.0,
+ "train_loss": 0.5334697115221363,
+ "train_runtime": 7355.3343,
+ "train_samples": 61134,
+ "train_samples_per_second": 24.935,
+ "train_steps_per_second": 0.195
+ }
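The derived throughput fields in `train_results.json` follow from the raw counts: the HF Trainer reports `train_samples_per_second` as `train_samples * num_train_epochs / train_runtime`. A quick consistency check, assuming the 3-epoch setting from the training config above:

```python
# Sanity-check the reported throughput in train_results.json.
# Assumes num_train_epochs=3 as configured for this run.

train_samples = 61134
train_runtime = 7355.3343  # seconds
num_train_epochs = 3

samples_per_second = train_samples * num_train_epochs / train_runtime
print(round(samples_per_second, 3))  # ~24.935, matching the reported value
```

The same check applies to `train_steps_per_second` with the global step count in place of the sample count.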
trainer_state.json ADDED
@@ -0,0 +1,2426 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 2.996074326092646,
5
+ "eval_steps": 100,
6
+ "global_step": 1431,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.002093692750588851,
13
+ "grad_norm": 53.581159797745435,
14
+ "learning_rate": 6.9444444444444435e-09,
15
+ "logits/chosen": -0.48425233364105225,
16
+ "logits/rejected": -0.32109448313713074,
17
+ "logps/chosen": -276.5158996582031,
18
+ "logps/rejected": -302.22406005859375,
19
+ "loss": 0.6931,
20
+ "rewards/accuracies": 0.0,
21
+ "rewards/chosen": 0.0,
22
+ "rewards/margins": 0.0,
23
+ "rewards/rejected": 0.0,
24
+ "step": 1
25
+ },
26
+ {
27
+ "epoch": 0.02093692750588851,
28
+ "grad_norm": 47.92244929429545,
29
+ "learning_rate": 6.944444444444444e-08,
30
+ "logits/chosen": -0.20008452236652374,
31
+ "logits/rejected": -0.1900922954082489,
32
+ "logps/chosen": -337.452392578125,
33
+ "logps/rejected": -293.0785217285156,
34
+ "loss": 0.7231,
35
+ "rewards/accuracies": 0.3541666567325592,
36
+ "rewards/chosen": -0.022724969312548637,
37
+ "rewards/margins": -0.036751341074705124,
38
+ "rewards/rejected": 0.014026367105543613,
39
+ "step": 10
40
+ },
41
+ {
42
+ "epoch": 0.04187385501177702,
43
+ "grad_norm": 41.37562973132045,
44
+ "learning_rate": 1.3888888888888888e-07,
45
+ "logits/chosen": -0.042457859963178635,
46
+ "logits/rejected": -0.12332990020513535,
47
+ "logps/chosen": -298.8910217285156,
48
+ "logps/rejected": -271.74114990234375,
49
+ "loss": 0.7114,
50
+ "rewards/accuracies": 0.4375,
51
+ "rewards/chosen": 0.016111990436911583,
52
+ "rewards/margins": -0.0034494102001190186,
53
+ "rewards/rejected": 0.0195614043623209,
54
+ "step": 20
55
+ },
56
+ {
57
+ "epoch": 0.06281078251766553,
58
+ "grad_norm": 50.166233912169,
59
+ "learning_rate": 2.0833333333333333e-07,
60
+ "logits/chosen": -0.0645759329199791,
61
+ "logits/rejected": -0.09565907716751099,
62
+ "logps/chosen": -323.93743896484375,
63
+ "logps/rejected": -261.7337341308594,
64
+ "loss": 0.7165,
65
+ "rewards/accuracies": 0.4625000059604645,
66
+ "rewards/chosen": -0.056218355894088745,
67
+ "rewards/margins": 0.005095779895782471,
68
+ "rewards/rejected": -0.061314135789871216,
69
+ "step": 30
70
+ },
71
+ {
72
+ "epoch": 0.08374771002355404,
73
+ "grad_norm": 45.45611748658909,
74
+ "learning_rate": 2.7777777777777776e-07,
75
+ "logits/chosen": -0.13370418548583984,
76
+ "logits/rejected": -0.16684015095233917,
77
+ "logps/chosen": -272.07928466796875,
78
+ "logps/rejected": -251.01589965820312,
79
+ "loss": 0.7166,
80
+ "rewards/accuracies": 0.6000000238418579,
81
+ "rewards/chosen": 0.006498994771391153,
82
+ "rewards/margins": 0.06555742770433426,
83
+ "rewards/rejected": -0.05905843526124954,
84
+ "step": 40
85
+ },
86
+ {
87
+ "epoch": 0.10468463752944256,
88
+ "grad_norm": 55.30410487113166,
89
+ "learning_rate": 3.472222222222222e-07,
90
+ "logits/chosen": -0.0792478546500206,
91
+ "logits/rejected": -0.0551227442920208,
92
+ "logps/chosen": -292.2696838378906,
93
+ "logps/rejected": -266.82965087890625,
94
+ "loss": 0.7243,
95
+ "rewards/accuracies": 0.5,
96
+ "rewards/chosen": -0.08804600685834885,
97
+ "rewards/margins": -0.04349173232913017,
98
+ "rewards/rejected": -0.044554274529218674,
99
+ "step": 50
100
+ },
101
+ {
102
+ "epoch": 0.12562156503533106,
103
+ "grad_norm": 56.63523816899612,
104
+ "learning_rate": 4.1666666666666667e-07,
105
+ "logits/chosen": -0.1644289791584015,
106
+ "logits/rejected": -0.19546563923358917,
107
+ "logps/chosen": -269.5990295410156,
108
+ "logps/rejected": -258.08721923828125,
109
+ "loss": 0.7419,
110
+ "rewards/accuracies": 0.518750011920929,
111
+ "rewards/chosen": 0.04003779590129852,
112
+ "rewards/margins": 0.05106702446937561,
113
+ "rewards/rejected": -0.011029230430722237,
114
+ "step": 60
115
+ },
116
+ {
117
+ "epoch": 0.14655849254121958,
118
+ "grad_norm": 50.48428456914726,
119
+ "learning_rate": 4.861111111111111e-07,
120
+ "logits/chosen": -0.1528700441122055,
121
+ "logits/rejected": -0.12956030666828156,
122
+ "logps/chosen": -334.1960144042969,
123
+ "logps/rejected": -295.05865478515625,
124
+ "loss": 0.7096,
125
+ "rewards/accuracies": 0.5375000238418579,
126
+ "rewards/chosen": -0.012429716996848583,
127
+ "rewards/margins": 0.03767075017094612,
128
+ "rewards/rejected": -0.05010046809911728,
129
+ "step": 70
130
+ },
131
+ {
132
+ "epoch": 0.16749542004710807,
133
+ "grad_norm": 49.10207941889299,
134
+ "learning_rate": 5.555555555555555e-07,
135
+ "logits/chosen": 0.019698064774274826,
136
+ "logits/rejected": -0.05375131219625473,
137
+ "logps/chosen": -324.42889404296875,
138
+ "logps/rejected": -256.3900451660156,
139
+ "loss": 0.7125,
140
+ "rewards/accuracies": 0.574999988079071,
141
+ "rewards/chosen": 0.0010342865716665983,
142
+ "rewards/margins": 0.04288307949900627,
143
+ "rewards/rejected": -0.0418488010764122,
144
+ "step": 80
145
+ },
146
+ {
147
+ "epoch": 0.1884323475529966,
148
+ "grad_norm": 48.35701594054669,
149
+ "learning_rate": 6.249999999999999e-07,
150
+ "logits/chosen": -0.18192127346992493,
151
+ "logits/rejected": -0.15880808234214783,
152
+ "logps/chosen": -282.22161865234375,
153
+ "logps/rejected": -251.54296875,
154
+ "loss": 0.6976,
155
+ "rewards/accuracies": 0.4937500059604645,
156
+ "rewards/chosen": -0.012317812070250511,
157
+ "rewards/margins": -0.011760599911212921,
158
+ "rewards/rejected": -0.0005572110530920327,
159
+ "step": 90
160
+ },
161
+ {
162
+ "epoch": 0.2093692750588851,
163
+ "grad_norm": 44.92592146146067,
164
+ "learning_rate": 6.944444444444444e-07,
165
+ "logits/chosen": -0.11696960031986237,
166
+ "logits/rejected": -0.13915769755840302,
167
+ "logps/chosen": -316.96527099609375,
168
+ "logps/rejected": -277.3512268066406,
169
+ "loss": 0.6787,
170
+ "rewards/accuracies": 0.5562499761581421,
171
+ "rewards/chosen": 0.03938128799200058,
172
+ "rewards/margins": 0.08465500175952911,
173
+ "rewards/rejected": -0.045273713767528534,
174
+ "step": 100
175
+ },
176
+ {
177
+ "epoch": 0.2093692750588851,
178
+ "eval_logits/chosen": -0.3141140341758728,
179
+ "eval_logits/rejected": -0.3377441465854645,
180
+ "eval_logps/chosen": -310.252685546875,
181
+ "eval_logps/rejected": -274.4240417480469,
182
+ "eval_loss": 0.6966869235038757,
183
+ "eval_rewards/accuracies": 0.5515872836112976,
184
+ "eval_rewards/chosen": 0.01587979681789875,
185
+ "eval_rewards/margins": 0.08612197637557983,
186
+ "eval_rewards/rejected": -0.07024218887090683,
187
+ "eval_runtime": 19.212,
188
+ "eval_samples_per_second": 104.102,
189
+ "eval_steps_per_second": 3.279,
190
+ "step": 100
191
+ },
192
+ {
193
+ "epoch": 0.23030620256477363,
194
+ "grad_norm": 42.34774239140926,
195
+ "learning_rate": 7.638888888888888e-07,
196
+ "logits/chosen": -0.15527260303497314,
197
+ "logits/rejected": -0.1998877376317978,
198
+ "logps/chosen": -317.2166748046875,
199
+ "logps/rejected": -267.8033447265625,
200
+ "loss": 0.6869,
201
+ "rewards/accuracies": 0.5562499761581421,
202
+ "rewards/chosen": -0.0031774670351296663,
203
+ "rewards/margins": 0.06427686661481857,
204
+ "rewards/rejected": -0.06745433807373047,
205
+ "step": 110
206
+ },
207
+ {
208
+ "epoch": 0.2512431300706621,
209
+ "grad_norm": 42.69543501672853,
210
+ "learning_rate": 8.333333333333333e-07,
211
+ "logits/chosen": -0.08257915079593658,
212
+ "logits/rejected": -0.12443940341472626,
213
+ "logps/chosen": -275.12579345703125,
214
+ "logps/rejected": -245.96347045898438,
215
+ "loss": 0.6706,
216
+ "rewards/accuracies": 0.5562499761581421,
217
+ "rewards/chosen": 0.050213612616062164,
218
+ "rewards/margins": 0.10280221700668335,
219
+ "rewards/rejected": -0.052588604390621185,
220
+ "step": 120
221
+ },
222
+ {
223
+ "epoch": 0.2721800575765506,
224
+ "grad_norm": 50.775507526530916,
225
+ "learning_rate": 9.027777777777778e-07,
226
+ "logits/chosen": -0.14645054936408997,
227
+ "logits/rejected": -0.2005746066570282,
228
+ "logps/chosen": -313.00238037109375,
229
+ "logps/rejected": -257.37945556640625,
230
+ "loss": 0.6662,
231
+ "rewards/accuracies": 0.6187499761581421,
232
+ "rewards/chosen": 0.12051638215780258,
233
+ "rewards/margins": 0.18545812368392944,
234
+ "rewards/rejected": -0.06494174152612686,
235
+ "step": 130
236
+ },
237
+ {
238
+ "epoch": 0.29311698508243916,
239
+ "grad_norm": 45.04510687958917,
240
+ "learning_rate": 9.722222222222222e-07,
241
+ "logits/chosen": -0.18603107333183289,
242
+ "logits/rejected": -0.14804694056510925,
243
+ "logps/chosen": -336.45330810546875,
244
+ "logps/rejected": -270.71844482421875,
245
+ "loss": 0.6837,
246
+ "rewards/accuracies": 0.574999988079071,
247
+ "rewards/chosen": 0.1061311587691307,
248
+ "rewards/margins": 0.20069436728954315,
249
+ "rewards/rejected": -0.09456320852041245,
250
+ "step": 140
251
+ },
252
+ {
253
+ "epoch": 0.31405391258832765,
254
+ "grad_norm": 47.07473020466515,
255
+ "learning_rate": 9.999463737538052e-07,
256
+ "logits/chosen": -0.09479381889104843,
257
+ "logits/rejected": -0.18823939561843872,
258
+ "logps/chosen": -300.7477111816406,
259
+ "logps/rejected": -261.5872497558594,
260
+ "loss": 0.6701,
261
+ "rewards/accuracies": 0.543749988079071,
262
+ "rewards/chosen": 0.014584893360733986,
263
+ "rewards/margins": 0.1319422572851181,
264
+ "rewards/rejected": -0.11735737323760986,
265
+ "step": 150
266
+ },
267
+ {
268
+ "epoch": 0.33499084009421615,
269
+ "grad_norm": 40.24946321953983,
270
+ "learning_rate": 9.996186994612174e-07,
271
+ "logits/chosen": -0.07166972011327744,
272
+ "logits/rejected": -0.05619664117693901,
273
+ "logps/chosen": -277.3063049316406,
274
+ "logps/rejected": -264.21929931640625,
275
+ "loss": 0.6571,
276
+ "rewards/accuracies": 0.6499999761581421,
277
+ "rewards/chosen": -0.002465812023729086,
278
+ "rewards/margins": 0.21548476815223694,
279
+ "rewards/rejected": -0.21795055270195007,
280
+ "step": 160
281
+ },
282
+ {
283
+ "epoch": 0.3559277676001047,
284
+ "grad_norm": 43.25422484420196,
285
+ "learning_rate": 9.989933382359422e-07,
286
+ "logits/chosen": -0.22567839920520782,
287
+ "logits/rejected": -0.2581842541694641,
288
+ "logps/chosen": -270.6056213378906,
289
+ "logps/rejected": -234.0198211669922,
290
+ "loss": 0.6614,
291
+ "rewards/accuracies": 0.6499999761581421,
292
+ "rewards/chosen": -0.0034525603987276554,
293
+ "rewards/margins": 0.24006681144237518,
294
+ "rewards/rejected": -0.2435193508863449,
295
+ "step": 170
296
+ },
297
+ {
298
+ "epoch": 0.3768646951059932,
299
+ "grad_norm": 39.47362541006732,
300
+ "learning_rate": 9.980706626858607e-07,
301
+ "logits/chosen": -0.15520626306533813,
302
+ "logits/rejected": -0.13547591865062714,
303
+ "logps/chosen": -275.5711364746094,
304
+ "logps/rejected": -280.95904541015625,
305
+ "loss": 0.6323,
306
+ "rewards/accuracies": 0.699999988079071,
307
+ "rewards/chosen": 0.01858050376176834,
308
+ "rewards/margins": 0.3474423885345459,
309
+ "rewards/rejected": -0.32886189222335815,
310
+ "step": 180
311
+ },
312
+ {
313
+ "epoch": 0.39780162261188173,
314
+ "grad_norm": 45.3164057858113,
315
+ "learning_rate": 9.968512225671258e-07,
316
+ "logits/chosen": -0.07492430508136749,
317
+ "logits/rejected": -0.07894851267337799,
318
+ "logps/chosen": -287.3063659667969,
319
+ "logps/rejected": -286.6043395996094,
320
+ "loss": 0.649,
321
+ "rewards/accuracies": 0.612500011920929,
322
+ "rewards/chosen": -0.07111965119838715,
323
+ "rewards/margins": 0.22652164101600647,
324
+ "rewards/rejected": -0.2976413071155548,
325
+ "step": 190
326
+ },
327
+ {
328
+ "epoch": 0.4187385501177702,
329
+ "grad_norm": 51.9773301288437,
330
+ "learning_rate": 9.953357444566038e-07,
331
+ "logits/chosen": -0.09818016737699509,
332
+ "logits/rejected": -0.11751595884561539,
333
+ "logps/chosen": -284.3682861328125,
334
+ "logps/rejected": -264.2423095703125,
335
+ "loss": 0.645,
336
+ "rewards/accuracies": 0.5687500238418579,
337
+ "rewards/chosen": -0.06741360574960709,
338
+ "rewards/margins": 0.13478204607963562,
339
+ "rewards/rejected": -0.2021956443786621,
340
+ "step": 200
341
+ },
342
+ {
343
+ "epoch": 0.4187385501177702,
344
+ "eval_logits/chosen": -0.3229367136955261,
345
+ "eval_logits/rejected": -0.34633249044418335,
346
+ "eval_logps/chosen": -310.3840026855469,
347
+ "eval_logps/rejected": -274.8875732421875,
348
+ "eval_loss": 0.6491106748580933,
349
+ "eval_rewards/accuracies": 0.60317462682724,
350
+ "eval_rewards/chosen": -0.04975215345621109,
351
+ "eval_rewards/margins": 0.25227656960487366,
352
+ "eval_rewards/rejected": -0.30202868580818176,
353
+ "eval_runtime": 19.4848,
354
+ "eval_samples_per_second": 102.644,
355
+ "eval_steps_per_second": 3.233,
356
+ "step": 200
357
+ },
358
+ {
359
+ "epoch": 0.4396754776236587,
360
+ "grad_norm": 42.51852593559685,
361
+ "learning_rate": 9.935251313189563e-07,
362
+ "logits/chosen": 0.02242279052734375,
363
+ "logits/rejected": -0.06930123269557953,
364
+ "logps/chosen": -321.75689697265625,
365
+ "logps/rejected": -269.69781494140625,
366
+ "loss": 0.6528,
367
+ "rewards/accuracies": 0.612500011920929,
368
+ "rewards/chosen": -0.06879940629005432,
369
+ "rewards/margins": 0.2159956693649292,
370
+ "rewards/rejected": -0.2847950756549835,
371
+ "step": 210
372
+ },
373
+ {
374
+ "epoch": 0.46061240512954726,
375
+ "grad_norm": 39.56904653144407,
376
+ "learning_rate": 9.914204619686312e-07,
377
+ "logits/chosen": -0.21375274658203125,
378
+ "logits/rejected": -0.06523901224136353,
379
+ "logps/chosen": -263.26251220703125,
380
+ "logps/rejected": -254.17361450195312,
381
+ "loss": 0.6473,
382
+ "rewards/accuracies": 0.574999988079071,
383
+ "rewards/chosen": -0.09464406222105026,
384
+ "rewards/margins": 0.11034605652093887,
385
+ "rewards/rejected": -0.20499010384082794,
386
+ "step": 220
387
+ },
388
+ {
389
+ "epoch": 0.48154933263543576,
390
+ "grad_norm": 49.49502162405857,
391
+ "learning_rate": 9.89022990427073e-07,
392
+ "logits/chosen": -0.1386515200138092,
393
+ "logits/rejected": -0.027841920033097267,
394
+ "logps/chosen": -275.6785583496094,
395
+ "logps/rejected": -282.2340393066406,
396
+ "loss": 0.651,
397
+ "rewards/accuracies": 0.6875,
398
+ "rewards/chosen": -0.0026134043000638485,
399
+ "rewards/margins": 0.34593018889427185,
400
+ "rewards/rejected": -0.34854358434677124,
401
+ "step": 230
402
+ },
403
+ {
404
+ "epoch": 0.5024862601413242,
405
+ "grad_norm": 42.40141104653558,
406
+ "learning_rate": 9.86334145175542e-07,
407
+ "logits/chosen": -0.16665732860565186,
408
+ "logits/rejected": -0.13909348845481873,
409
+ "logps/chosen": -282.2431640625,
410
+ "logps/rejected": -265.6712341308594,
411
+ "loss": 0.6269,
412
+ "rewards/accuracies": 0.5562499761581421,
413
+ "rewards/chosen": -0.03417225927114487,
414
+ "rewards/margins": 0.2094908207654953,
415
+ "rewards/rejected": -0.24366307258605957,
416
+ "step": 240
417
+ },
418
+ {
419
+ "epoch": 0.5234231876472127,
420
+ "grad_norm": 44.711963555505086,
421
+ "learning_rate": 9.83355528303984e-07,
422
+ "logits/chosen": -0.21623122692108154,
423
+ "logits/rejected": -0.25620579719543457,
424
+ "logps/chosen": -321.9479064941406,
425
+ "logps/rejected": -276.6851501464844,
426
+ "loss": 0.6483,
427
+ "rewards/accuracies": 0.59375,
428
+ "rewards/chosen": -0.033338677138090134,
429
+ "rewards/margins": 0.21741199493408203,
430
+ "rewards/rejected": -0.25075066089630127,
431
+ "step": 250
432
+ },
433
+ {
434
+ "epoch": 0.5443601151531012,
435
+ "grad_norm": 44.08715279117413,
436
+ "learning_rate": 9.800889145564616e-07,
437
+ "logits/chosen": -0.023601394146680832,
438
+ "logits/rejected": -0.08905109018087387,
439
+ "logps/chosen": -307.87286376953125,
440
+ "logps/rejected": -257.384765625,
441
+ "loss": 0.6489,
442
+ "rewards/accuracies": 0.6000000238418579,
443
+ "rewards/chosen": -0.03719106689095497,
444
+ "rewards/margins": 0.2362823486328125,
445
+ "rewards/rejected": -0.2734734117984772,
446
+ "step": 260
447
+ },
448
+ {
449
+ "epoch": 0.5652970426589898,
450
+ "grad_norm": 39.7549215734126,
451
+ "learning_rate": 9.765362502737097e-07,
452
+ "logits/chosen": -0.15661786496639252,
453
+ "logits/rejected": -0.1753096580505371,
454
+ "logps/chosen": -313.56787109375,
455
+ "logps/rejected": -277.8585510253906,
456
+ "loss": 0.6233,
457
+ "rewards/accuracies": 0.637499988079071,
458
+ "rewards/chosen": 0.07719334214925766,
459
+ "rewards/margins": 0.3890419602394104,
460
+ "rewards/rejected": -0.31184864044189453,
461
+ "step": 270
462
+ },
463
+ {
464
+ "epoch": 0.5862339701648783,
465
+ "grad_norm": 46.953763281596736,
466
+ "learning_rate": 9.726996522334514e-07,
467
+ "logits/chosen": -0.22773197293281555,
468
+ "logits/rejected": -0.25082847476005554,
469
+ "logps/chosen": -299.99127197265625,
470
+ "logps/rejected": -249.6244659423828,
471
+ "loss": 0.6409,
472
+ "rewards/accuracies": 0.6312500238418579,
473
+ "rewards/chosen": -0.044135063886642456,
474
+ "rewards/margins": 0.2800499200820923,
475
+ "rewards/rejected": -0.32418501377105713,
476
+ "step": 280
477
+ },
478
+ {
479
+ "epoch": 0.6071708976707668,
480
+ "grad_norm": 38.382090654521065,
481
+ "learning_rate": 9.68581406389163e-07,
482
+ "logits/chosen": -0.14438043534755707,
483
+ "logits/rejected": -0.17269106209278107,
484
+ "logps/chosen": -282.90325927734375,
485
+ "logps/rejected": -274.38079833984375,
486
+ "loss": 0.6217,
487
+ "rewards/accuracies": 0.643750011920929,
488
+ "rewards/chosen": -0.009734408929944038,
489
+ "rewards/margins": 0.3537730574607849,
490
+ "rewards/rejected": -0.363507479429245,
491
+ "step": 290
492
+ },
493
+ {
494
+ "epoch": 0.6281078251766553,
495
+ "grad_norm": 41.7647459812509,
496
+ "learning_rate": 9.641839665080363e-07,
497
+ "logits/chosen": -0.1018938273191452,
498
+ "logits/rejected": -0.08353041112422943,
499
+ "logps/chosen": -316.49810791015625,
500
+ "logps/rejected": -277.4931640625,
501
+ "loss": 0.6161,
502
+ "rewards/accuracies": 0.6937500238418579,
503
+ "rewards/chosen": 0.03871753811836243,
504
+ "rewards/margins": 0.39340126514434814,
505
+ "rewards/rejected": -0.3546837270259857,
506
+ "step": 300
507
+ },
508
+ {
509
+ "epoch": 0.6281078251766553,
510
+ "eval_logits/chosen": -0.33166003227233887,
511
+ "eval_logits/rejected": -0.3552355170249939,
512
+ "eval_logps/chosen": -310.4118957519531,
513
+ "eval_logps/rejected": -275.1272277832031,
514
+ "eval_loss": 0.6316225528717041,
515
+ "eval_rewards/accuracies": 0.682539701461792,
516
+ "eval_rewards/chosen": -0.06370855122804642,
517
+ "eval_rewards/margins": 0.3581322431564331,
518
+ "eval_rewards/rejected": -0.4218408465385437,
519
+ "eval_runtime": 19.0833,
520
+ "eval_samples_per_second": 104.804,
521
+ "eval_steps_per_second": 3.301,
522
+ "step": 300
523
+ },
524
+ {
525
+ "epoch": 0.6490447526825438,
526
+ "grad_norm": 40.9716290999305,
527
+ "learning_rate": 9.595099527089568e-07,
528
+ "logits/chosen": -0.11042879521846771,
529
+ "logits/rejected": -0.18008281290531158,
530
+ "logps/chosen": -323.7516174316406,
531
+ "logps/rejected": -247.0631561279297,
532
+ "loss": 0.6007,
533
+ "rewards/accuracies": 0.7124999761581421,
534
+ "rewards/chosen": 0.10534234344959259,
535
+ "rewards/margins": 0.5895902514457703,
536
+ "rewards/rejected": -0.48424798250198364,
537
+ "step": 310
538
+ },
539
+ {
540
+ "epoch": 0.6699816801884323,
541
+ "grad_norm": 45.14020951986602,
542
+ "learning_rate": 9.545621499013618e-07,
543
+ "logits/chosen": -0.024479269981384277,
544
+ "logits/rejected": -0.038955364376306534,
545
+ "logps/chosen": -294.4266662597656,
546
+ "logps/rejected": -252.0673828125,
547
+ "loss": 0.5995,
548
+ "rewards/accuracies": 0.668749988079071,
549
+ "rewards/chosen": 0.06134527176618576,
550
+ "rewards/margins": 0.4816582202911377,
551
+ "rewards/rejected": -0.4203129708766937,
552
+ "step": 320
553
+ },
554
+ {
555
+ "epoch": 0.6909186076943209,
556
+ "grad_norm": 38.030813091938484,
557
+ "learning_rate": 9.493435061259129e-07,
558
+ "logits/chosen": -0.012772688642144203,
559
+ "logits/rejected": -0.030413877218961716,
560
+ "logps/chosen": -291.36761474609375,
561
+ "logps/rejected": -265.4248046875,
562
+ "loss": 0.6232,
563
+ "rewards/accuracies": 0.6187499761581421,
564
+ "rewards/chosen": 0.001638062298297882,
565
+ "rewards/margins": 0.40076717734336853,
566
+ "rewards/rejected": -0.39912909269332886,
567
+ "step": 330
568
+ },
569
+ {
570
+ "epoch": 0.7118555352002094,
571
+ "grad_norm": 45.93570520280886,
572
+ "learning_rate": 9.438571307979704e-07,
573
+ "logits/chosen": -0.04617694020271301,
574
+ "logits/rejected": -0.07658630609512329,
575
+ "logps/chosen": -296.7699890136719,
576
+ "logps/rejected": -270.9905090332031,
577
+ "loss": 0.6096,
578
+ "rewards/accuracies": 0.6812499761581421,
579
+ "rewards/chosen": -0.05764853209257126,
580
+ "rewards/margins": 0.28755250573158264,
581
+ "rewards/rejected": -0.34520095586776733,
582
+ "step": 340
583
+ },
584
+ {
585
+ "epoch": 0.7327924627060979,
586
+ "grad_norm": 44.09128545700975,
587
+ "learning_rate": 9.381062928549151e-07,
588
+ "logits/chosen": -0.12188152968883514,
589
+ "logits/rejected": -0.13376249372959137,
590
+ "logps/chosen": -282.15179443359375,
591
+ "logps/rejected": -247.45834350585938,
592
+ "loss": 0.5964,
593
+ "rewards/accuracies": 0.6187499761581421,
594
+ "rewards/chosen": -0.09005177021026611,
595
+ "rewards/margins": 0.41418519616127014,
596
+ "rewards/rejected": -0.5042369961738586,
597
+ "step": 350
598
+ },
599
+ {
600
+ "epoch": 0.7537293902119864,
601
+ "grad_norm": 40.36784058818516,
602
+ "learning_rate": 9.320944188084241e-07,
603
+ "logits/chosen": -0.07130910456180573,
604
+ "logits/rejected": -0.18685731291770935,
605
+ "logps/chosen": -340.55487060546875,
606
+ "logps/rejected": -278.08551025390625,
607
+ "loss": 0.6256,
608
+ "rewards/accuracies": 0.65625,
609
+ "rewards/chosen": 0.1223798543214798,
610
+ "rewards/margins": 0.4791669249534607,
611
+ "rewards/rejected": -0.3567870557308197,
612
+ "step": 360
613
+ },
614
+ {
615
+ "epoch": 0.7746663177178749,
616
+ "grad_norm": 37.74697617657793,
617
+ "learning_rate": 9.258250907028572e-07,
618
+ "logits/chosen": -0.2023000717163086,
619
+ "logits/rejected": -0.25707709789276123,
620
+ "logps/chosen": -306.526611328125,
621
+ "logps/rejected": -248.40023803710938,
622
+ "loss": 0.603,
623
+ "rewards/accuracies": 0.6625000238418579,
624
+ "rewards/chosen": 0.05354640632867813,
625
+ "rewards/margins": 0.42885661125183105,
626
+ "rewards/rejected": -0.3753102421760559,
627
+ "step": 370
628
+ },
629
+ {
630
+ "epoch": 0.7956032452237635,
631
+ "grad_norm": 36.818935480552355,
632
+ "learning_rate": 9.193020439809746e-07,
633
+ "logits/chosen": -0.25006169080734253,
634
+ "logits/rejected": -0.27239733934402466,
635
+ "logps/chosen": -306.9652099609375,
636
+ "logps/rejected": -274.15850830078125,
637
+ "loss": 0.6108,
638
+ "rewards/accuracies": 0.6937500238418579,
639
+ "rewards/chosen": -0.005802587606012821,
640
+ "rewards/margins": 0.5873938798904419,
641
+ "rewards/rejected": -0.5931965112686157,
642
+ "step": 380
643
+ },
644
+ {
645
+ "epoch": 0.816540172729652,
646
+ "grad_norm": 40.218709775991016,
647
+ "learning_rate": 9.125291652582547e-07,
648
+ "logits/chosen": -0.19798080623149872,
649
+ "logits/rejected": -0.16009968519210815,
650
+ "logps/chosen": -308.05975341796875,
651
+ "logps/rejected": -289.6929626464844,
652
+ "loss": 0.6163,
653
+ "rewards/accuracies": 0.6937500238418579,
654
+ "rewards/chosen": 0.09920313209295273,
655
+ "rewards/margins": 0.47261109948158264,
656
+ "rewards/rejected": -0.3734079897403717,
657
+ "step": 390
658
+ },
659
+ {
660
+ "epoch": 0.8374771002355405,
661
+ "grad_norm": 39.13573330990029,
662
+ "learning_rate": 9.055104900071375e-07,
663
+ "logits/chosen": -0.3008892834186554,
664
+ "logits/rejected": -0.1968032568693161,
665
+ "logps/chosen": -253.58798217773438,
666
+ "logps/rejected": -239.539794921875,
667
+ "loss": 0.5964,
668
+ "rewards/accuracies": 0.78125,
669
+ "rewards/chosen": 0.14220505952835083,
670
+ "rewards/margins": 0.6365345120429993,
671
+ "rewards/rejected": -0.49432945251464844,
672
+ "step": 400
673
+ },
674
+ {
675
+ "epoch": 0.8374771002355405,
676
+ "eval_logits/chosen": -0.3291381299495697,
677
+ "eval_logits/rejected": -0.35453417897224426,
678
+ "eval_logps/chosen": -310.317626953125,
679
+ "eval_logps/rejected": -275.15972900390625,
680
+ "eval_loss": 0.6100037693977356,
681
+ "eval_rewards/accuracies": 0.658730149269104,
682
+ "eval_rewards/chosen": -0.016582150012254715,
683
+ "eval_rewards/margins": 0.42150571942329407,
684
+ "eval_rewards/rejected": -0.43808794021606445,
685
+ "eval_runtime": 19.1374,
686
+ "eval_samples_per_second": 104.507,
687
+ "eval_steps_per_second": 3.292,
688
+ "step": 400
689
+ },
690
+ {
691
+ "epoch": 0.8584140277414289,
692
+ "grad_norm": 41.59836395581152,
693
+ "learning_rate": 8.982502001525777e-07,
694
+ "logits/chosen": -0.05211949348449707,
695
+ "logits/rejected": -0.16379515826702118,
696
+ "logps/chosen": -334.3597717285156,
697
+ "logps/rejected": -280.1282043457031,
698
+ "loss": 0.6085,
699
+ "rewards/accuracies": 0.706250011920929,
700
+ "rewards/chosen": 0.10034932941198349,
701
+ "rewards/margins": 0.5413358211517334,
702
+ "rewards/rejected": -0.4409864842891693,
703
+ "step": 410
704
+ },
705
+ {
706
+ "epoch": 0.8793509552473174,
707
+ "grad_norm": 43.48433949710367,
708
+ "learning_rate": 8.90752621580335e-07,
709
+ "logits/chosen": -0.10794836282730103,
710
+ "logits/rejected": -0.10175999253988266,
711
+ "logps/chosen": -289.7253723144531,
712
+ "logps/rejected": -290.4524841308594,
713
+ "loss": 0.5967,
714
+ "rewards/accuracies": 0.737500011920929,
715
+ "rewards/chosen": 0.03317990154027939,
716
+ "rewards/margins": 0.5273382663726807,
717
+ "rewards/rejected": -0.4941583275794983,
718
+ "step": 420
719
+ },
720
+ {
721
+ "epoch": 0.9002878827532059,
722
+ "grad_norm": 38.200809011500716,
723
+ "learning_rate": 8.83022221559489e-07,
724
+ "logits/chosen": -0.08915697038173676,
725
+ "logits/rejected": -0.18852075934410095,
726
+ "logps/chosen": -325.81707763671875,
727
+ "logps/rejected": -257.1981506347656,
728
+ "loss": 0.5916,
729
+ "rewards/accuracies": 0.637499988079071,
730
+ "rewards/chosen": 0.053056418895721436,
731
+ "rewards/margins": 0.5633317232131958,
732
+ "rewards/rejected": -0.5102753043174744,
733
+ "step": 430
734
+ },
735
+ {
736
+ "epoch": 0.9212248102590945,
737
+ "grad_norm": 37.18511688193762,
738
+ "learning_rate": 8.750636060807145e-07,
739
+ "logits/chosen": -0.0535130575299263,
740
+ "logits/rejected": -0.1133667603135109,
741
+ "logps/chosen": -299.4546813964844,
742
+ "logps/rejected": -248.877685546875,
743
+ "loss": 0.6032,
744
+ "rewards/accuracies": 0.6312500238418579,
745
+ "rewards/chosen": 0.14839240908622742,
746
+ "rewards/margins": 0.4819253385066986,
747
+ "rewards/rejected": -0.3335329294204712,
748
+ "step": 440
749
+ },
750
+ {
751
+ "epoch": 0.942161737764983,
752
+ "grad_norm": 35.5219084800687,
753
+ "learning_rate": 8.668815171119019e-07,
754
+ "logits/chosen": -0.16904090344905853,
755
+ "logits/rejected": -0.12047908455133438,
756
+ "logps/chosen": -297.3148498535156,
757
+ "logps/rejected": -285.4595031738281,
758
+ "loss": 0.6148,
759
+ "rewards/accuracies": 0.6937500238418579,
760
+ "rewards/chosen": 0.051097553223371506,
761
+ "rewards/margins": 0.505901038646698,
762
+ "rewards/rejected": -0.4548035264015198,
763
+ "step": 450
764
+ },
765
+ {
766
+ "epoch": 0.9630986652708715,
767
+ "grad_norm": 40.701223424626065,
768
+ "learning_rate": 8.584808297727591e-07,
769
+ "logits/chosen": -0.07098624855279922,
770
+ "logits/rejected": -0.11652670055627823,
771
+ "logps/chosen": -291.76434326171875,
772
+ "logps/rejected": -246.9138641357422,
773
+ "loss": 0.6087,
774
+ "rewards/accuracies": 0.643750011920929,
775
+ "rewards/chosen": 0.10639622062444687,
776
+ "rewards/margins": 0.5133938193321228,
777
+ "rewards/rejected": -0.40699753165245056,
778
+ "step": 460
779
+ },
780
+ {
781
+ "epoch": 0.98403559277676,
782
+ "grad_norm": 41.789096950373676,
783
+ "learning_rate": 8.498665494300771e-07,
784
+ "logits/chosen": -0.19742052257061005,
785
+ "logits/rejected": -0.16556285321712494,
786
+ "logps/chosen": -302.3099060058594,
787
+ "logps/rejected": -277.933837890625,
788
+ "loss": 0.5919,
789
+ "rewards/accuracies": 0.625,
790
+ "rewards/chosen": 0.026506727561354637,
791
+ "rewards/margins": 0.43121033906936646,
792
+ "rewards/rejected": -0.40470361709594727,
793
+ "step": 470
794
+ },
795
+ {
796
+ "epoch": 1.0049725202826485,
797
+ "grad_norm": 34.0036850645019,
798
+ "learning_rate": 8.410438087153911e-07,
799
+ "logits/chosen": -0.05161554738879204,
800
+ "logits/rejected": -0.1036214604973793,
801
+ "logps/chosen": -300.4537658691406,
802
+ "logps/rejected": -268.56048583984375,
803
+ "loss": 0.5867,
804
+ "rewards/accuracies": 0.6625000238418579,
805
+ "rewards/chosen": -0.00218582758679986,
806
+ "rewards/margins": 0.4712887704372406,
807
+ "rewards/rejected": -0.4734745919704437,
808
+ "step": 480
809
+ },
810
+ {
811
+ "epoch": 1.025909447788537,
812
+ "grad_norm": 36.00706898235858,
813
+ "learning_rate": 8.320178644668141e-07,
814
+ "logits/chosen": -0.11621465533971786,
815
+ "logits/rejected": -0.18176773190498352,
816
+ "logps/chosen": -299.8227233886719,
817
+ "logps/rejected": -256.3122253417969,
818
+ "loss": 0.54,
819
+ "rewards/accuracies": 0.7562500238418579,
820
+ "rewards/chosen": 0.12694445252418518,
821
+ "rewards/margins": 0.600742757320404,
822
+ "rewards/rejected": -0.47379836440086365,
823
+ "step": 490
824
+ },
825
+ {
826
+ "epoch": 1.0468463752944255,
827
+ "grad_norm": 36.99527595331141,
828
+ "learning_rate": 8.22794094596864e-07,
829
+ "logits/chosen": -0.1485815942287445,
830
+ "logits/rejected": -0.20506855845451355,
831
+ "logps/chosen": -305.8435363769531,
832
+ "logps/rejected": -270.3700256347656,
833
+ "loss": 0.5394,
834
+ "rewards/accuracies": 0.7124999761581421,
835
+ "rewards/chosen": 0.029846886172890663,
836
+ "rewards/margins": 0.6080519556999207,
837
+ "rewards/rejected": -0.5782050490379333,
838
+ "step": 500
839
+ },
840
+ {
841
+ "epoch": 1.0468463752944255,
842
+ "eval_logits/chosen": -0.33202826976776123,
843
+ "eval_logits/rejected": -0.35762789845466614,
844
+ "eval_logps/chosen": -310.303955078125,
845
+ "eval_logps/rejected": -275.2332458496094,
846
+ "eval_loss": 0.6065964102745056,
847
+ "eval_rewards/accuracies": 0.7103174328804016,
848
+ "eval_rewards/chosen": -0.009755274280905724,
849
+ "eval_rewards/margins": 0.465115487575531,
850
+ "eval_rewards/rejected": -0.4748707413673401,
851
+ "eval_runtime": 18.9724,
852
+ "eval_samples_per_second": 105.416,
853
+ "eval_steps_per_second": 3.321,
854
+ "step": 500
855
+ },
856
+ {
857
+ "epoch": 1.067783302800314,
858
+ "grad_norm": 33.8725028749791,
859
+ "learning_rate": 8.133779948881513e-07,
860
+ "logits/chosen": -0.18220725655555725,
861
+ "logits/rejected": -0.1532759964466095,
862
+ "logps/chosen": -269.54156494140625,
863
+ "logps/rejected": -259.2676086425781,
864
+ "loss": 0.533,
865
+ "rewards/accuracies": 0.78125,
866
+ "rewards/chosen": 0.12358293682336807,
867
+ "rewards/margins": 0.7069646120071411,
868
+ "rewards/rejected": -0.5833816528320312,
869
+ "step": 510
870
+ },
871
+ {
872
+ "epoch": 1.0887202303062025,
873
+ "grad_norm": 32.98315936865049,
874
+ "learning_rate": 8.037751757188367e-07,
875
+ "logits/chosen": -0.2609891891479492,
876
+ "logits/rejected": -0.1464134156703949,
877
+ "logps/chosen": -262.7670593261719,
878
+ "logps/rejected": -260.43121337890625,
879
+ "loss": 0.5216,
880
+ "rewards/accuracies": 0.7437499761581421,
881
+ "rewards/chosen": 0.10809774696826935,
882
+ "rewards/margins": 0.6094505786895752,
883
+ "rewards/rejected": -0.501352846622467,
884
+ "step": 520
885
+ },
886
+ {
887
+ "epoch": 1.109657157812091,
888
+ "grad_norm": 36.61105396566652,
889
+ "learning_rate": 7.939913587198095e-07,
890
+ "logits/chosen": -0.2482234686613083,
891
+ "logits/rejected": -0.3337472081184387,
892
+ "logps/chosen": -288.12847900390625,
893
+ "logps/rejected": -249.4586181640625,
894
+ "loss": 0.5439,
895
+ "rewards/accuracies": 0.71875,
896
+ "rewards/chosen": 0.05145542696118355,
897
+ "rewards/margins": 0.7037261128425598,
898
+ "rewards/rejected": -0.6522706747055054,
899
+ "step": 530
900
+ },
901
+ {
902
+ "epoch": 1.1305940853179797,
903
+ "grad_norm": 38.97199731536176,
904
+ "learning_rate": 7.840323733655778e-07,
905
+ "logits/chosen": -0.19931235909461975,
906
+ "logits/rejected": -0.09966816008090973,
907
+ "logps/chosen": -259.0215759277344,
908
+ "logps/rejected": -237.9501495361328,
909
+ "loss": 0.5534,
910
+ "rewards/accuracies": 0.731249988079071,
911
+ "rewards/chosen": 0.12085232883691788,
912
+ "rewards/margins": 0.6305907964706421,
913
+ "rewards/rejected": -0.5097384452819824,
914
+ "step": 540
915
+ },
916
+ {
917
+ "epoch": 1.151531012823868,
918
+ "grad_norm": 35.71159175636079,
919
+ "learning_rate": 7.739041535009041e-07,
920
+ "logits/chosen": -0.14787928760051727,
921
+ "logits/rejected": -0.24329273402690887,
922
+ "logps/chosen": -305.64697265625,
923
+ "logps/rejected": -244.08755493164062,
924
+ "loss": 0.5301,
925
+ "rewards/accuracies": 0.706250011920929,
926
+ "rewards/chosen": 0.08590878546237946,
927
+ "rewards/margins": 0.6826232075691223,
928
+ "rewards/rejected": -0.596714437007904,
929
+ "step": 550
930
+ },
931
+ {
932
+ "epoch": 1.1724679403297567,
933
+ "grad_norm": 40.90404154922072,
934
+ "learning_rate": 7.636127338052511e-07,
935
+ "logits/chosen": -0.10747908055782318,
936
+ "logits/rejected": -0.11999531090259552,
937
+ "logps/chosen": -294.2823486328125,
938
+ "logps/rejected": -265.80010986328125,
939
+ "loss": 0.5442,
940
+ "rewards/accuracies": 0.7562500238418579,
941
+ "rewards/chosen": 0.1678900569677353,
942
+ "rewards/margins": 0.6707212924957275,
943
+ "rewards/rejected": -0.502831220626831,
944
+ "step": 560
945
+ },
946
+ {
947
+ "epoch": 1.193404867835645,
948
+ "grad_norm": 36.957100723172736,
949
+ "learning_rate": 7.531642461971514e-07,
950
+ "logits/chosen": -0.24748548865318298,
951
+ "logits/rejected": -0.13145090639591217,
952
+ "logps/chosen": -270.5980529785156,
953
+ "logps/rejected": -276.2918395996094,
954
+ "loss": 0.5296,
955
+ "rewards/accuracies": 0.71875,
956
+ "rewards/chosen": 0.1355637013912201,
957
+ "rewards/margins": 0.7545391917228699,
958
+ "rewards/rejected": -0.6189755797386169,
959
+ "step": 570
960
+ },
961
+ {
962
+ "epoch": 1.2143417953415336,
963
+ "grad_norm": 36.431278984426584,
964
+ "learning_rate": 7.425649161806352e-07,
965
+ "logits/chosen": -0.22467997670173645,
+ "logits/rejected": -0.20354416966438293,
+ "logps/chosen": -284.69439697265625,
+ "logps/rejected": -250.2189483642578,
+ "loss": 0.5209,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": 0.03756130859255791,
+ "rewards/margins": 0.7149747610092163,
+ "rewards/rejected": -0.6774134635925293,
+ "step": 580
+ },
+ {
+ "epoch": 1.235278722847422,
+ "grad_norm": 34.3971566508403,
+ "learning_rate": 7.318210591359008e-07,
+ "logits/chosen": -0.02497723139822483,
+ "logits/rejected": -0.0003952041151933372,
+ "logps/chosen": -316.3713684082031,
+ "logps/rejected": -274.1322937011719,
+ "loss": 0.5209,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": 0.07891981303691864,
+ "rewards/margins": 0.7175663113594055,
+ "rewards/rejected": -0.6386464834213257,
+ "step": 590
+ },
+ {
+ "epoch": 1.2562156503533106,
+ "grad_norm": 37.248359587110805,
+ "learning_rate": 7.209390765564318e-07,
+ "logits/chosen": -0.15565916895866394,
+ "logits/rejected": -0.10866830497980118,
+ "logps/chosen": -306.9658203125,
+ "logps/rejected": -306.49078369140625,
+ "loss": 0.5099,
+ "rewards/accuracies": 0.706250011920929,
+ "rewards/chosen": 0.1327342391014099,
+ "rewards/margins": 0.7248380780220032,
+ "rewards/rejected": -0.5921037197113037,
+ "step": 600
+ },
+ {
+ "epoch": 1.2562156503533106,
+ "eval_logits/chosen": -0.33804643154144287,
+ "eval_logits/rejected": -0.3634737432003021,
+ "eval_logps/chosen": -310.3228759765625,
+ "eval_logps/rejected": -275.3493347167969,
+ "eval_loss": 0.6006649136543274,
+ "eval_rewards/accuracies": 0.6785714030265808,
+ "eval_rewards/chosen": -0.019210144877433777,
+ "eval_rewards/margins": 0.5136736631393433,
+ "eval_rewards/rejected": -0.5328837633132935,
+ "eval_runtime": 19.1325,
+ "eval_samples_per_second": 104.534,
+ "eval_steps_per_second": 3.293,
+ "step": 600
+ },
+ {
+ "epoch": 1.2771525778591992,
+ "grad_norm": 39.48209474577774,
+ "learning_rate": 7.099254522348064e-07,
+ "logits/chosen": -0.09819002449512482,
+ "logits/rejected": -0.1692955642938614,
+ "logps/chosen": -290.13140869140625,
+ "logps/rejected": -237.3570098876953,
+ "loss": 0.5202,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": 0.1899166852235794,
+ "rewards/margins": 0.8487440347671509,
+ "rewards/rejected": -0.6588274240493774,
+ "step": 610
+ },
+ {
+ "epoch": 1.2980895053650876,
+ "grad_norm": 35.38064058006504,
+ "learning_rate": 6.987867483994716e-07,
+ "logits/chosen": 0.08088377118110657,
+ "logits/rejected": -0.028685202822089195,
+ "logps/chosen": -286.01324462890625,
+ "logps/rejected": -242.1278076171875,
+ "loss": 0.5113,
+ "rewards/accuracies": 0.731249988079071,
+ "rewards/chosen": 0.08664991706609726,
+ "rewards/margins": 0.6891879439353943,
+ "rewards/rejected": -0.6025381088256836,
+ "step": 620
+ },
+ {
+ "epoch": 1.3190264328709762,
+ "grad_norm": 36.280579625575726,
+ "learning_rate": 6.875296018047809e-07,
+ "logits/chosen": -0.15825395286083221,
+ "logits/rejected": -0.13115188479423523,
+ "logps/chosen": -291.768798828125,
+ "logps/rejected": -271.0327453613281,
+ "loss": 0.5133,
+ "rewards/accuracies": 0.706250011920929,
+ "rewards/chosen": 0.25919127464294434,
+ "rewards/margins": 0.8277707099914551,
+ "rewards/rejected": -0.5685793161392212,
+ "step": 630
+ },
+ {
+ "epoch": 1.3399633603768648,
+ "grad_norm": 32.57724385812019,
+ "learning_rate": 6.761607197766296e-07,
+ "logits/chosen": -0.21067364513874054,
+ "logits/rejected": -0.15081565082073212,
+ "logps/chosen": -272.00518798828125,
+ "logps/rejected": -279.40478515625,
+ "loss": 0.5105,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": 0.14864543080329895,
+ "rewards/margins": 0.7687338590621948,
+ "rewards/rejected": -0.6200884580612183,
+ "step": 640
+ },
+ {
+ "epoch": 1.3609002878827532,
+ "grad_norm": 37.48616481446126,
+ "learning_rate": 6.646868762160398e-07,
+ "logits/chosen": -0.19943957030773163,
+ "logits/rejected": -0.161566823720932,
+ "logps/chosen": -268.23638916015625,
+ "logps/rejected": -252.7967987060547,
+ "loss": 0.5372,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": 0.008694097399711609,
+ "rewards/margins": 0.623866617679596,
+ "rewards/rejected": -0.6151725053787231,
+ "step": 650
+ },
+ {
+ "epoch": 1.3818372153886418,
+ "grad_norm": 39.563737783736045,
+ "learning_rate": 6.531149075630796e-07,
+ "logits/chosen": -0.11065585911273956,
+ "logits/rejected": -0.17729897797107697,
+ "logps/chosen": -263.4093933105469,
+ "logps/rejected": -251.0884246826172,
+ "loss": 0.5323,
+ "rewards/accuracies": 0.6875,
+ "rewards/chosen": -0.026829296723008156,
+ "rewards/margins": 0.4506412148475647,
+ "rewards/rejected": -0.4774704873561859,
+ "step": 660
+ },
+ {
+ "epoch": 1.4027741428945302,
+ "grad_norm": 33.46296192971815,
+ "learning_rate": 6.414517087235185e-07,
+ "logits/chosen": -0.20335140824317932,
+ "logits/rejected": -0.2963979244232178,
+ "logps/chosen": -293.1023864746094,
+ "logps/rejected": -253.64291381835938,
+ "loss": 0.5049,
+ "rewards/accuracies": 0.71875,
+ "rewards/chosen": 0.00017919539823196828,
+ "rewards/margins": 0.5864929556846619,
+ "rewards/rejected": -0.5863137245178223,
+ "step": 670
+ },
+ {
+ "epoch": 1.4237110704004188,
+ "grad_norm": 40.03881951253335,
+ "learning_rate": 6.297042289606479e-07,
+ "logits/chosen": -0.13722732663154602,
+ "logits/rejected": -0.1250244826078415,
+ "logps/chosen": -295.9792785644531,
+ "logps/rejected": -296.83038330078125,
+ "loss": 0.5133,
+ "rewards/accuracies": 0.6812499761581421,
+ "rewards/chosen": -0.014182251878082752,
+ "rewards/margins": 0.5892963409423828,
+ "rewards/rejected": -0.6034785509109497,
+ "step": 680
+ },
+ {
+ "epoch": 1.4446479979063072,
+ "grad_norm": 29.250271504061402,
+ "learning_rate": 6.178794677547137e-07,
+ "logits/chosen": -0.18004044890403748,
+ "logits/rejected": -0.1459943801164627,
+ "logps/chosen": -298.6792297363281,
+ "logps/rejected": -257.5166931152344,
+ "loss": 0.5157,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": 0.07589882612228394,
+ "rewards/margins": 0.7056849598884583,
+ "rewards/rejected": -0.6297860741615295,
+ "step": 690
+ },
+ {
+ "epoch": 1.4655849254121958,
+ "grad_norm": 32.49512416553128,
+ "learning_rate": 6.059844706324286e-07,
+ "logits/chosen": -0.07275749742984772,
+ "logits/rejected": 0.005568481981754303,
+ "logps/chosen": -314.0689697265625,
+ "logps/rejected": -328.4625244140625,
+ "loss": 0.5056,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": 0.13801348209381104,
+ "rewards/margins": 0.876538872718811,
+ "rewards/rejected": -0.738525390625,
+ "step": 700
+ },
+ {
+ "epoch": 1.4655849254121958,
+ "eval_logits/chosen": -0.3406723737716675,
+ "eval_logits/rejected": -0.36724287271499634,
+ "eval_logps/chosen": -310.4104309082031,
+ "eval_logps/rejected": -275.4717102050781,
+ "eval_loss": 0.5875913500785828,
+ "eval_rewards/accuracies": 0.6904761791229248,
+ "eval_rewards/chosen": -0.06298798322677612,
+ "eval_rewards/margins": 0.5310800671577454,
+ "eval_rewards/rejected": -0.5940679311752319,
+ "eval_runtime": 19.0211,
+ "eval_samples_per_second": 105.146,
+ "eval_steps_per_second": 3.312,
+ "step": 700
+ },
+ {
+ "epoch": 1.4865218529180844,
+ "grad_norm": 35.75327607918745,
+ "learning_rate": 5.940263249690477e-07,
+ "logits/chosen": -0.24695155024528503,
+ "logits/rejected": -0.22096967697143555,
+ "logps/chosen": -295.2511901855469,
+ "logps/rejected": -276.71051025390625,
+ "loss": 0.5132,
+ "rewards/accuracies": 0.731249988079071,
+ "rewards/chosen": 0.058757662773132324,
+ "rewards/margins": 0.7594215273857117,
+ "rewards/rejected": -0.7006638646125793,
+ "step": 710
+ },
+ {
+ "epoch": 1.5074587804239727,
+ "grad_norm": 41.16638143070179,
+ "learning_rate": 5.820121557655108e-07,
+ "logits/chosen": -0.1299329400062561,
+ "logits/rejected": -0.10874001681804657,
+ "logps/chosen": -303.56732177734375,
+ "logps/rejected": -280.8491516113281,
+ "loss": 0.5135,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": 0.11438952386379242,
+ "rewards/margins": 0.7343131899833679,
+ "rewards/rejected": -0.6199236512184143,
+ "step": 720
+ },
+ {
+ "epoch": 1.5283957079298613,
+ "grad_norm": 41.49005357475057,
+ "learning_rate": 5.699491214031657e-07,
+ "logits/chosen": -0.14048215746879578,
+ "logits/rejected": -0.18900522589683533,
+ "logps/chosen": -280.69598388671875,
+ "logps/rejected": -255.81298828125,
+ "loss": 0.5166,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": 0.11929772049188614,
+ "rewards/margins": 0.7799339294433594,
+ "rewards/rejected": -0.6606361269950867,
+ "step": 730
+ },
+ {
+ "epoch": 1.54933263543575,
+ "grad_norm": 34.97801212344648,
+ "learning_rate": 5.578444093786008e-07,
+ "logits/chosen": 0.06411169469356537,
+ "logits/rejected": 0.15786513686180115,
+ "logps/chosen": -315.06182861328125,
+ "logps/rejected": -282.93115234375,
+ "loss": 0.5128,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": 0.24365051090717316,
+ "rewards/margins": 0.8540836572647095,
+ "rewards/rejected": -0.6104331612586975,
+ "step": 740
+ },
+ {
+ "epoch": 1.5702695629416383,
+ "grad_norm": 32.53803724377542,
+ "learning_rate": 5.457052320211339e-07,
+ "logits/chosen": -0.22511212527751923,
+ "logits/rejected": -0.22842809557914734,
+ "logps/chosen": -298.0352783203125,
+ "logps/rejected": -268.8421936035156,
+ "loss": 0.4961,
+ "rewards/accuracies": 0.6937500238418579,
+ "rewards/chosen": 0.09702634066343307,
+ "rewards/margins": 0.632623016834259,
+ "rewards/rejected": -0.5355967283248901,
+ "step": 750
+ },
+ {
+ "epoch": 1.5912064904475267,
+ "grad_norm": 36.857273577646474,
+ "learning_rate": 5.335388221955012e-07,
+ "logits/chosen": -0.044903479516506195,
+ "logits/rejected": 0.0046501667238771915,
+ "logps/chosen": -358.9960632324219,
+ "logps/rejected": -336.44354248046875,
+ "loss": 0.4997,
+ "rewards/accuracies": 0.6812499761581421,
+ "rewards/chosen": 0.28686124086380005,
+ "rewards/margins": 0.7968252301216125,
+ "rewards/rejected": -0.5099639296531677,
+ "step": 760
+ },
+ {
+ "epoch": 1.6121434179534153,
+ "grad_norm": 31.737574949447513,
+ "learning_rate": 5.213524289923126e-07,
+ "logits/chosen": -0.03366886079311371,
+ "logits/rejected": -0.18763691186904907,
+ "logps/chosen": -327.0910339355469,
+ "logps/rejected": -272.4969787597656,
+ "loss": 0.5114,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": 0.2613201141357422,
+ "rewards/margins": 0.8974519968032837,
+ "rewards/rejected": -0.6361318826675415,
+ "step": 770
+ },
+ {
+ "epoch": 1.633080345459304,
+ "grad_norm": 34.36622996446216,
+ "learning_rate": 5.091533134088387e-07,
+ "logits/chosen": -0.12894697487354279,
+ "logits/rejected": -0.142494797706604,
+ "logps/chosen": -320.5164794921875,
+ "logps/rejected": -280.9849548339844,
+ "loss": 0.5146,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": 0.27436262369155884,
+ "rewards/margins": 0.8554395437240601,
+ "rewards/rejected": -0.581076979637146,
+ "step": 780
+ },
+ {
+ "epoch": 1.6540172729651923,
+ "grad_norm": 38.709599843434866,
+ "learning_rate": 4.969487440227038e-07,
+ "logits/chosen": -0.21234026551246643,
+ "logits/rejected": -0.22163410484790802,
+ "logps/chosen": -300.9823303222656,
+ "logps/rejected": -276.39508056640625,
+ "loss": 0.4852,
+ "rewards/accuracies": 0.78125,
+ "rewards/chosen": 0.24367956817150116,
+ "rewards/margins": 0.9389246106147766,
+ "rewards/rejected": -0.6952449679374695,
+ "step": 790
+ },
+ {
+ "epoch": 1.674954200471081,
+ "grad_norm": 31.812270366410278,
+ "learning_rate": 4.847459926610619e-07,
+ "logits/chosen": -0.018023919314146042,
+ "logits/rejected": -0.12209578603506088,
+ "logps/chosen": -350.00439453125,
+ "logps/rejected": -305.96954345703125,
+ "loss": 0.4936,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": 0.1853758990764618,
+ "rewards/margins": 0.8932290077209473,
+ "rewards/rejected": -0.7078530788421631,
+ "step": 800
+ },
+ {
+ "epoch": 1.674954200471081,
+ "eval_logits/chosen": -0.3383854031562805,
+ "eval_logits/rejected": -0.36581355333328247,
+ "eval_logps/chosen": -310.3436584472656,
+ "eval_logps/rejected": -275.40155029296875,
+ "eval_loss": 0.5994271636009216,
+ "eval_rewards/accuracies": 0.6746031641960144,
+ "eval_rewards/chosen": -0.029591551050543785,
+ "eval_rewards/margins": 0.5294089317321777,
+ "eval_rewards/rejected": -0.559000551700592,
+ "eval_runtime": 19.1971,
+ "eval_samples_per_second": 104.182,
+ "eval_steps_per_second": 3.282,
+ "step": 800
+ },
+ {
+ "epoch": 1.6958911279769695,
+ "grad_norm": 35.049859279627235,
+ "learning_rate": 4.7255233006783624e-07,
+ "logits/chosen": -0.18587855994701385,
+ "logits/rejected": -0.16283151507377625,
+ "logps/chosen": -307.56439208984375,
+ "logps/rejected": -278.2900390625,
+ "loss": 0.5055,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": 0.10620725154876709,
+ "rewards/margins": 0.892121434211731,
+ "rewards/rejected": -0.7859140634536743,
+ "step": 810
+ },
+ {
+ "epoch": 1.7168280554828579,
+ "grad_norm": 34.134111938828184,
+ "learning_rate": 4.6037502157160567e-07,
+ "logits/chosen": -0.2755785584449768,
+ "logits/rejected": -0.31054019927978516,
+ "logps/chosen": -320.56695556640625,
+ "logps/rejected": -273.4154968261719,
+ "loss": 0.5048,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": 0.20151178538799286,
+ "rewards/margins": 0.9227310419082642,
+ "rewards/rejected": -0.7212191820144653,
+ "step": 820
+ },
+ {
+ "epoch": 1.7377649829887463,
+ "grad_norm": 38.68230830225304,
+ "learning_rate": 4.482213227567161e-07,
+ "logits/chosen": -0.2091369926929474,
+ "logits/rejected": -0.14194951951503754,
+ "logps/chosen": -277.35693359375,
+ "logps/rejected": -281.7249450683594,
+ "loss": 0.484,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": 0.10125979036092758,
+ "rewards/margins": 0.8731620907783508,
+ "rewards/rejected": -0.7719023823738098,
+ "step": 830
+ },
+ {
+ "epoch": 1.7587019104946349,
+ "grad_norm": 33.945256383172804,
+ "learning_rate": 4.3609847514019763e-07,
+ "logits/chosen": -0.30677181482315063,
+ "logits/rejected": -0.1776028573513031,
+ "logps/chosen": -261.38873291015625,
+ "logps/rejected": -267.8268127441406,
+ "loss": 0.5094,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": 0.18778590857982635,
+ "rewards/margins": 0.8258863687515259,
+ "rewards/rejected": -0.6381004452705383,
+ "step": 840
+ },
+ {
+ "epoch": 1.7796388380005235,
+ "grad_norm": 34.20224125304341,
+ "learning_rate": 4.240137018570661e-07,
+ "logits/chosen": -0.11399509757757187,
+ "logits/rejected": -0.1185418963432312,
+ "logps/chosen": -275.3580017089844,
+ "logps/rejected": -282.1163024902344,
+ "loss": 0.5074,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.03538894280791283,
+ "rewards/margins": 0.7777508497238159,
+ "rewards/rejected": -0.813139796257019,
+ "step": 850
+ },
+ {
+ "epoch": 1.8005757655064119,
+ "grad_norm": 31.946083788257877,
+ "learning_rate": 4.1197420335657366e-07,
+ "logits/chosen": -0.09024197608232498,
+ "logits/rejected": -0.20059815049171448,
+ "logps/chosen": -298.12762451171875,
+ "logps/rejected": -242.2689666748047,
+ "loss": 0.5089,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": 0.15691380202770233,
+ "rewards/margins": 0.8090406656265259,
+ "rewards/rejected": -0.6521269083023071,
+ "step": 860
+ },
+ {
+ "epoch": 1.8215126930123005,
+ "grad_norm": 28.583891688236477,
+ "learning_rate": 3.9998715311197783e-07,
+ "logits/chosen": -0.1159423366189003,
+ "logits/rejected": -0.16631600260734558,
+ "logps/chosen": -337.1079406738281,
+ "logps/rejected": -285.5828857421875,
+ "loss": 0.4894,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": 0.11767788976430893,
+ "rewards/margins": 0.6672422289848328,
+ "rewards/rejected": -0.5495643615722656,
+ "step": 870
+ },
+ {
+ "epoch": 1.842449620518189,
+ "grad_norm": 33.76166639382152,
+ "learning_rate": 3.880596933463843e-07,
+ "logits/chosen": -0.07985590398311615,
+ "logits/rejected": -0.03050703927874565,
+ "logps/chosen": -274.9603576660156,
+ "logps/rejected": -266.6165771484375,
+ "loss": 0.5075,
+ "rewards/accuracies": 0.731249988079071,
+ "rewards/chosen": 0.09320048242807388,
+ "rewards/margins": 0.7023938298225403,
+ "rewards/rejected": -0.6091933250427246,
+ "step": 880
+ },
+ {
+ "epoch": 1.8633865480240774,
+ "grad_norm": 40.29693979290027,
+ "learning_rate": 3.761989307772085e-07,
+ "logits/chosen": -0.15825888514518738,
+ "logits/rejected": -0.1759234219789505,
+ "logps/chosen": -306.653076171875,
+ "logps/rejected": -279.01177978515625,
+ "loss": 0.5055,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": 0.16093602776527405,
+ "rewards/margins": 0.937556266784668,
+ "rewards/rejected": -0.7766203284263611,
+ "step": 890
+ },
+ {
+ "epoch": 1.8843234755299658,
+ "grad_norm": 36.002728955160904,
+ "learning_rate": 3.6441193238179146e-07,
+ "logits/chosen": -0.05288013815879822,
+ "logits/rejected": -0.13031360507011414,
+ "logps/chosen": -334.34417724609375,
+ "logps/rejected": -269.53558349609375,
+ "loss": 0.4904,
+ "rewards/accuracies": 0.731249988079071,
+ "rewards/chosen": 0.14437799155712128,
+ "rewards/margins": 0.8378474116325378,
+ "rewards/rejected": -0.6934694051742554,
+ "step": 900
+ },
+ {
+ "epoch": 1.8843234755299658,
+ "eval_logits/chosen": -0.3443281650543213,
+ "eval_logits/rejected": -0.3704880475997925,
+ "eval_logps/chosen": -310.4006042480469,
+ "eval_logps/rejected": -275.5133972167969,
+ "eval_loss": 0.5989052653312683,
+ "eval_rewards/accuracies": 0.6944444179534912,
+ "eval_rewards/chosen": -0.05807310715317726,
+ "eval_rewards/margins": 0.55684494972229,
+ "eval_rewards/rejected": -0.614918053150177,
+ "eval_runtime": 18.9036,
+ "eval_samples_per_second": 105.8,
+ "eval_steps_per_second": 3.333,
+ "step": 900
+ },
+ {
+ "epoch": 1.9052604030358546,
+ "grad_norm": 31.165480893885753,
+ "learning_rate": 3.5270572118669715e-07,
+ "logits/chosen": -0.11026148498058319,
+ "logits/rejected": -0.16978046298027039,
+ "logps/chosen": -278.1309509277344,
+ "logps/rejected": -235.88150024414062,
+ "loss": 0.5272,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": 0.02903774380683899,
+ "rewards/margins": 0.8374470472335815,
+ "rewards/rejected": -0.8084093332290649,
+ "step": 910
+ },
+ {
+ "epoch": 1.926197330541743,
+ "grad_norm": 40.996379136014625,
+ "learning_rate": 3.4108727208319314e-07,
+ "logits/chosen": -0.17259056866168976,
+ "logits/rejected": -0.20306821167469025,
+ "logps/chosen": -312.1188049316406,
+ "logps/rejected": -266.6794128417969,
+ "loss": 0.5039,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": 0.16847112774848938,
+ "rewards/margins": 0.8200035095214844,
+ "rewards/rejected": -0.6515323519706726,
+ "step": 920
+ },
+ {
+ "epoch": 1.9471342580476314,
+ "grad_norm": 31.825399091982387,
+ "learning_rate": 3.295635076714144e-07,
+ "logits/chosen": -0.1496247947216034,
+ "logits/rejected": -0.17214402556419373,
+ "logps/chosen": -301.103515625,
+ "logps/rejected": -270.370361328125,
+ "loss": 0.4903,
+ "rewards/accuracies": 0.78125,
+ "rewards/chosen": 0.16043229401111603,
+ "rewards/margins": 0.7891928553581238,
+ "rewards/rejected": -0.628760576248169,
+ "step": 930
+ },
+ {
+ "epoch": 1.96807118555352,
+ "grad_norm": 39.834235726892466,
+ "learning_rate": 3.181412941356816e-07,
+ "logits/chosen": -0.13259033858776093,
+ "logits/rejected": -0.11601094901561737,
+ "logps/chosen": -266.5997314453125,
+ "logps/rejected": -231.60848999023438,
+ "loss": 0.485,
+ "rewards/accuracies": 0.78125,
+ "rewards/chosen": 0.19742903113365173,
+ "rewards/margins": 0.8145695924758911,
+ "rewards/rejected": -0.6171405911445618,
+ "step": 940
+ },
+ {
+ "epoch": 1.9890081130594086,
+ "grad_norm": 35.65864774290716,
+ "learning_rate": 3.068274371534356e-07,
+ "logits/chosen": -0.11888673156499863,
+ "logits/rejected": -0.2420310229063034,
+ "logps/chosen": -301.62213134765625,
+ "logps/rejected": -250.2771453857422,
+ "loss": 0.4887,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": 0.002846676157787442,
+ "rewards/margins": 0.7688849568367004,
+ "rewards/rejected": -0.7660382986068726,
+ "step": 950
+ },
+ {
+ "epoch": 2.009945040565297,
+ "grad_norm": 28.814451971980255,
+ "learning_rate": 2.956286778402226e-07,
+ "logits/chosen": -0.10088515281677246,
+ "logits/rejected": -0.12641310691833496,
+ "logps/chosen": -301.3664855957031,
+ "logps/rejected": -283.16461181640625,
+ "loss": 0.4567,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": 0.20988380908966064,
+ "rewards/margins": 0.9322754740715027,
+ "rewards/rejected": -0.7223917245864868,
+ "step": 960
+ },
+ {
+ "epoch": 2.0308819680711854,
+ "grad_norm": 28.79413061525845,
+ "learning_rate": 2.84551688733146e-07,
+ "logits/chosen": -0.07364392280578613,
+ "logits/rejected": -0.1452295333147049,
+ "logps/chosen": -299.0059509277344,
+ "logps/rejected": -238.53231811523438,
+ "loss": 0.4557,
+ "rewards/accuracies": 0.84375,
+ "rewards/chosen": 0.166302889585495,
+ "rewards/margins": 0.9308949708938599,
+ "rewards/rejected": -0.7645919919013977,
+ "step": 970
+ },
+ {
+ "epoch": 2.051818895577074,
+ "grad_norm": 31.23059423273923,
+ "learning_rate": 2.7360306981518147e-07,
+ "logits/chosen": -0.09392131119966507,
+ "logits/rejected": -0.047506559640169144,
+ "logps/chosen": -262.2680969238281,
+ "logps/rejected": -253.03271484375,
+ "loss": 0.4445,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": 0.040274180471897125,
+ "rewards/margins": 0.7928990125656128,
+ "rewards/rejected": -0.7526248097419739,
+ "step": 980
+ },
+ {
+ "epoch": 2.0727558230829626,
+ "grad_norm": 27.855961833782285,
+ "learning_rate": 2.6278934458271996e-07,
+ "logits/chosen": -0.1288485825061798,
+ "logits/rejected": -0.1765173375606537,
+ "logps/chosen": -314.7231750488281,
+ "logps/rejected": -302.44964599609375,
+ "loss": 0.4327,
+ "rewards/accuracies": 0.800000011920929,
+ "rewards/chosen": 0.18189235031604767,
+ "rewards/margins": 1.0036437511444092,
+ "rewards/rejected": -0.8217514157295227,
+ "step": 990
+ },
+ {
+ "epoch": 2.093692750588851,
+ "grad_norm": 35.285113595842645,
+ "learning_rate": 2.5211695615868456e-07,
+ "logits/chosen": -0.1857554316520691,
+ "logits/rejected": -0.3039974272251129,
+ "logps/chosen": -281.1855163574219,
+ "logps/rejected": -240.62710571289062,
+ "loss": 0.4622,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": 0.15526309609413147,
+ "rewards/margins": 0.9393559694290161,
+ "rewards/rejected": -0.7840928435325623,
+ "step": 1000
+ },
+ {
+ "epoch": 2.093692750588851,
+ "eval_logits/chosen": -0.3450426459312439,
+ "eval_logits/rejected": -0.37242141366004944,
+ "eval_logps/chosen": -310.4169006347656,
+ "eval_logps/rejected": -275.49713134765625,
+ "eval_loss": 0.59389328956604,
+ "eval_rewards/accuracies": 0.6944444179534912,
+ "eval_rewards/chosen": -0.06623569875955582,
+ "eval_rewards/margins": 0.5405489802360535,
+ "eval_rewards/rejected": -0.6067846417427063,
+ "eval_runtime": 19.108,
+ "eval_samples_per_second": 104.668,
+ "eval_steps_per_second": 3.297,
+ "step": 1000
+ },
+ {
+ "epoch": 2.11462967809474,
+ "grad_norm": 32.59720699053956,
+ "learning_rate": 2.4159226345353647e-07,
+ "logits/chosen": -0.30413442850112915,
+ "logits/rejected": -0.24426865577697754,
+ "logps/chosen": -270.49920654296875,
+ "logps/rejected": -255.44723510742188,
+ "loss": 0.4508,
+ "rewards/accuracies": 0.7562500238418579,
+ "rewards/chosen": 0.03533598408102989,
+ "rewards/margins": 0.7992294430732727,
+ "rewards/rejected": -0.7638934254646301,
+ "step": 1010
+ },
+ {
+ "epoch": 2.135566605600628,
+ "grad_norm": 30.8340208692361,
+ "learning_rate": 2.312215373764551e-07,
+ "logits/chosen": -0.07855603843927383,
+ "logits/rejected": -0.094606414437294,
+ "logps/chosen": -289.9374084472656,
+ "logps/rejected": -277.69586181640625,
+ "loss": 0.4353,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": 0.19073505699634552,
+ "rewards/margins": 0.8868581652641296,
+ "rewards/rejected": -0.6961231231689453,
+ "step": 1020
+ },
+ {
+ "epoch": 2.1565035331065165,
+ "grad_norm": 29.34387598294554,
+ "learning_rate": 2.2101095709895512e-07,
+ "logits/chosen": 0.009572875685989857,
+ "logits/rejected": -0.09117701649665833,
+ "logps/chosen": -282.7410583496094,
+ "logps/rejected": -239.8011932373047,
+ "loss": 0.438,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": 0.12627363204956055,
+ "rewards/margins": 0.8894168138504028,
+ "rewards/rejected": -0.7631431221961975,
+ "step": 1030
+ },
+ {
+ "epoch": 2.177440460612405,
+ "grad_norm": 29.466746389871066,
+ "learning_rate": 2.1096660637315932e-07,
+ "logits/chosen": -0.0853545218706131,
+ "logits/rejected": -0.1613493412733078,
+ "logps/chosen": -309.2115783691406,
+ "logps/rejected": -279.7373352050781,
+ "loss": 0.4374,
+ "rewards/accuracies": 0.856249988079071,
+ "rewards/chosen": 0.2535983920097351,
+ "rewards/margins": 1.087397575378418,
+ "rewards/rejected": -0.8337991833686829,
+ "step": 1040
+ },
+ {
+ "epoch": 2.1983773881182938,
+ "grad_norm": 31.3324256474218,
+ "learning_rate": 2.0109446990692963e-07,
+ "logits/chosen": -0.17987683415412903,
+ "logits/rejected": -0.24151673913002014,
+ "logps/chosen": -323.58477783203125,
+ "logps/rejected": -268.08111572265625,
+ "loss": 0.4365,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": 0.09657944738864899,
+ "rewards/margins": 0.8588522672653198,
+ "rewards/rejected": -0.7622728943824768,
+ "step": 1050
+ },
+ {
+ "epoch": 2.219314315624182,
+ "grad_norm": 33.25666630785023,
+ "learning_rate": 1.9140042979800737e-07,
+ "logits/chosen": -0.13123273849487305,
+ "logits/rejected": -0.12296972423791885,
+ "logps/chosen": -260.5819396972656,
+ "logps/rejected": -232.4111328125,
+ "loss": 0.4355,
+ "rewards/accuracies": 0.7437499761581421,
+ "rewards/chosen": 0.05866159126162529,
+ "rewards/margins": 0.7927514314651489,
+ "rewards/rejected": -0.7340899109840393,
+ "step": 1060
+ },
+ {
+ "epoch": 2.2402512431300705,
+ "grad_norm": 27.925843344985296,
+ "learning_rate": 1.8189026202929391e-07,
+ "logits/chosen": -0.1511206328868866,
+ "logits/rejected": -0.17038318514823914,
+ "logps/chosen": -290.31951904296875,
+ "logps/rejected": -260.115966796875,
+ "loss": 0.4534,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": 0.11966502666473389,
+ "rewards/margins": 0.9545691609382629,
+ "rewards/rejected": -0.8349041938781738,
+ "step": 1070
+ },
+ {
+ "epoch": 2.2611881706359593,
+ "grad_norm": 26.475722050618995,
+ "learning_rate": 1.725696330273575e-07,
+ "logits/chosen": -0.08606094121932983,
+ "logits/rejected": -0.1871510148048401,
+ "logps/chosen": -288.9709777832031,
+ "logps/rejected": -249.9591064453125,
+ "loss": 0.4441,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": 0.19028839468955994,
+ "rewards/margins": 1.0648086071014404,
+ "rewards/rejected": -0.8745201826095581,
+ "step": 1080
+ },
+ {
+ "epoch": 2.2821250981418477,
+ "grad_norm": 28.63389247775027,
+ "learning_rate": 1.634440962862148e-07,
+ "logits/chosen": -0.05144480615854263,
+ "logits/rejected": -0.04060221463441849,
+ "logps/chosen": -295.44122314453125,
+ "logps/rejected": -262.346435546875,
+ "loss": 0.4318,
+ "rewards/accuracies": 0.8062499761581421,
+ "rewards/chosen": 0.20048430562019348,
+ "rewards/margins": 0.9130541086196899,
+ "rewards/rejected": -0.7125697135925293,
+ "step": 1090
+ },
+ {
+ "epoch": 2.303062025647736,
+ "grad_norm": 28.577071054830697,
+ "learning_rate": 1.545190890584042e-07,
+ "logits/chosen": -0.23821565508842468,
+ "logits/rejected": -0.23792441189289093,
+ "logps/chosen": -287.6651306152344,
+ "logps/rejected": -261.96875,
+ "loss": 0.4458,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": 0.24947874248027802,
1832
+ "rewards/margins": 0.945774257183075,
1833
+ "rewards/rejected": -0.696295440196991,
1834
+ "step": 1100
1835
+ },
1836
+ {
1837
+ "epoch": 2.303062025647736,
1838
+ "eval_logits/chosen": -0.3449550271034241,
1839
+ "eval_logits/rejected": -0.3728056252002716,
1840
+ "eval_logps/chosen": -310.3917541503906,
1841
+ "eval_logps/rejected": -275.5622253417969,
1842
+ "eval_loss": 0.5923458933830261,
1843
+ "eval_rewards/accuracies": 0.6944444179534912,
1844
+ "eval_rewards/chosen": -0.05364745110273361,
1845
+ "eval_rewards/margins": 0.5857000946998596,
1846
+ "eval_rewards/rejected": -0.639347493648529,
1847
+ "eval_runtime": 19.0051,
1848
+ "eval_samples_per_second": 105.235,
1849
+ "eval_steps_per_second": 3.315,
1850
+ "step": 1100
1851
+ },
1852
+ {
+ "epoch": 2.323998953153625,
+ "grad_norm": 30.512738174296523,
+ "learning_rate": 1.4579992911531496e-07,
+ "logits/chosen": -0.2525126338005066,
+ "logits/rejected": -0.3065794110298157,
+ "logps/chosen": -325.3319091796875,
+ "logps/rejected": -288.85089111328125,
+ "loss": 0.4406,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": 0.28674450516700745,
+ "rewards/margins": 1.1190677881240845,
+ "rewards/rejected": -0.8323231935501099,
+ "step": 1110
+ },
+ {
+ "epoch": 2.3449358806595133,
+ "grad_norm": 28.019655950524385,
+ "learning_rate": 1.372918115787112e-07,
+ "logits/chosen": -0.19494856894016266,
+ "logits/rejected": -0.16709737479686737,
+ "logps/chosen": -269.15802001953125,
+ "logps/rejected": -281.6461486816406,
+ "loss": 0.4203,
+ "rewards/accuracies": 0.862500011920929,
+ "rewards/chosen": 0.2475571632385254,
+ "rewards/margins": 1.078798770904541,
+ "rewards/rejected": -0.8312416076660156,
+ "step": 1120
+ },
+ {
+ "epoch": 2.3658728081654017,
+ "grad_norm": 30.412163422048742,
+ "learning_rate": 1.289998058253297e-07,
+ "logits/chosen": -0.30280619859695435,
+ "logits/rejected": -0.23964910209178925,
+ "logps/chosen": -300.67291259765625,
+ "logps/rejected": -286.5229187011719,
+ "loss": 0.4429,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": 0.10387909412384033,
+ "rewards/margins": 0.9766238331794739,
+ "rewards/rejected": -0.8727447390556335,
+ "step": 1130
+ },
+ {
+ "epoch": 2.38680973567129,
+ "grad_norm": 26.079069616215104,
+ "learning_rate": 1.209288524664029e-07,
+ "logits/chosen": -0.09895231574773788,
+ "logits/rejected": -0.04007222130894661,
+ "logps/chosen": -295.33795166015625,
+ "logps/rejected": -295.7159118652344,
+ "loss": 0.4524,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": 0.3174125850200653,
+ "rewards/margins": 1.068551778793335,
+ "rewards/rejected": -0.7511391639709473,
+ "step": 1140
+ },
+ {
+ "epoch": 2.407746663177179,
+ "grad_norm": 35.04317303338736,
+ "learning_rate": 1.1308376040390344e-07,
+ "logits/chosen": -0.2176963984966278,
+ "logits/rejected": -0.2429191619157791,
+ "logps/chosen": -268.51568603515625,
+ "logps/rejected": -243.83602905273438,
+ "loss": 0.4648,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": 0.1225685253739357,
+ "rewards/margins": 0.8287593722343445,
+ "rewards/rejected": -0.7061909437179565,
+ "step": 1150
+ },
+ {
+ "epoch": 2.4286835906830673,
+ "grad_norm": 27.071258876733747,
+ "learning_rate": 1.0546920396526382e-07,
+ "logits/chosen": -0.05287874490022659,
+ "logits/rejected": -0.08571253716945648,
+ "logps/chosen": -345.3739013671875,
+ "logps/rejected": -296.2100524902344,
+ "loss": 0.4248,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": 0.31707775592803955,
+ "rewards/margins": 1.1236298084259033,
+ "rewards/rejected": -0.8065520524978638,
+ "step": 1160
+ },
+ {
+ "epoch": 2.4496205181889557,
+ "grad_norm": 30.813022693471254,
+ "learning_rate": 9.808972011828054e-08,
+ "logits/chosen": -0.07085756957530975,
+ "logits/rejected": -0.1990344077348709,
+ "logps/chosen": -342.7474060058594,
+ "logps/rejected": -266.599609375,
+ "loss": 0.4366,
+ "rewards/accuracies": 0.831250011920929,
+ "rewards/chosen": 0.2785649299621582,
+ "rewards/margins": 1.1454359292984009,
+ "rewards/rejected": -0.8668708801269531,
+ "step": 1170
+ },
+ {
+ "epoch": 2.470557445694844,
+ "grad_norm": 28.23769321193845,
+ "learning_rate": 9.094970576786032e-08,
+ "logits/chosen": 0.02408856526017189,
+ "logits/rejected": -0.0395381897687912,
+ "logps/chosen": -267.3003234863281,
+ "logps/rejected": -249.65158081054688,
+ "loss": 0.4323,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": 0.17695440351963043,
+ "rewards/margins": 1.0320888757705688,
+ "rewards/rejected": -0.8551343679428101,
+ "step": 1180
+ },
+ {
+ "epoch": 2.491494373200733,
+ "grad_norm": 30.361270294186127,
+ "learning_rate": 8.405341513622055e-08,
+ "logits/chosen": -0.16236011683940887,
+ "logits/rejected": -0.20971974730491638,
+ "logps/chosen": -324.1960754394531,
+ "logps/rejected": -260.82647705078125,
+ "loss": 0.4368,
+ "rewards/accuracies": 0.831250011920929,
+ "rewards/chosen": 0.3019522428512573,
+ "rewards/margins": 0.9704391360282898,
+ "rewards/rejected": -0.6684868931770325,
+ "step": 1190
+ },
+ {
+ "epoch": 2.5124313007066212,
+ "grad_norm": 29.87892753836197,
+ "learning_rate": 7.740495722810269e-08,
+ "logits/chosen": -0.1393619179725647,
+ "logits/rejected": -0.17410680651664734,
+ "logps/chosen": -275.013427734375,
+ "logps/rejected": -218.1080322265625,
+ "loss": 0.4462,
+ "rewards/accuracies": 0.768750011920929,
+ "rewards/chosen": 0.1310921609401703,
+ "rewards/margins": 0.8609299659729004,
+ "rewards/rejected": -0.7298377156257629,
+ "step": 1200
+ },
+ {
+ "epoch": 2.5124313007066212,
+ "eval_logits/chosen": -0.34315019845962524,
+ "eval_logits/rejected": -0.37096092104911804,
+ "eval_logps/chosen": -310.3815612792969,
+ "eval_logps/rejected": -275.54351806640625,
+ "eval_loss": 0.589375913143158,
+ "eval_rewards/accuracies": 0.7023809552192688,
+ "eval_rewards/chosen": -0.04855651780962944,
+ "eval_rewards/margins": 0.5814378261566162,
+ "eval_rewards/rejected": -0.6299943923950195,
+ "eval_runtime": 19.1944,
+ "eval_samples_per_second": 104.197,
+ "eval_steps_per_second": 3.282,
+ "step": 1200
+ },
+ {
+ "epoch": 2.5333682282125096,
+ "grad_norm": 30.713812818544117,
+ "learning_rate": 7.100829338251146e-08,
+ "logits/chosen": -0.10057993978261948,
+ "logits/rejected": -0.16287049651145935,
+ "logps/chosen": -328.2908935546875,
+ "logps/rejected": -289.701904296875,
+ "loss": 0.4416,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": 0.3149394094944,
+ "rewards/margins": 1.1269749402999878,
+ "rewards/rejected": -0.8120356798171997,
+ "step": 1210
+ },
+ {
+ "epoch": 2.5543051557183984,
+ "grad_norm": 32.721460368023834,
+ "learning_rate": 6.486723491243778e-08,
+ "logits/chosen": -0.14739738404750824,
+ "logits/rejected": -0.1756991297006607,
+ "logps/chosen": -287.8624572753906,
+ "logps/rejected": -264.6856994628906,
+ "loss": 0.4523,
+ "rewards/accuracies": 0.862500011920929,
+ "rewards/chosen": 0.20502778887748718,
+ "rewards/margins": 0.9769600033760071,
+ "rewards/rejected": -0.7719322443008423,
+ "step": 1220
+ },
+ {
+ "epoch": 2.575242083224287,
+ "grad_norm": 32.50553549993069,
+ "learning_rate": 5.898544083397e-08,
+ "logits/chosen": -0.12704332172870636,
+ "logits/rejected": -0.1398163139820099,
+ "logps/chosen": -319.8119201660156,
+ "logps/rejected": -287.4822082519531,
+ "loss": 0.4467,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": 0.2838557958602905,
+ "rewards/margins": 1.0057270526885986,
+ "rewards/rejected": -0.7218713164329529,
+ "step": 1230
+ },
+ {
+ "epoch": 2.596179010730175,
+ "grad_norm": 30.599102169827287,
+ "learning_rate": 5.3366415686149137e-08,
+ "logits/chosen": -0.1681247055530548,
+ "logits/rejected": -0.15492555499076843,
+ "logps/chosen": -248.78054809570312,
+ "logps/rejected": -238.68106079101562,
+ "loss": 0.4368,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": 0.10712571442127228,
+ "rewards/margins": 0.8596469759941101,
+ "rewards/rejected": -0.7525211572647095,
+ "step": 1240
+ },
+ {
+ "epoch": 2.617115938236064,
+ "grad_norm": 31.156979603329955,
+ "learning_rate": 4.8013507442865585e-08,
+ "logits/chosen": -0.19022394716739655,
+ "logits/rejected": -0.10595899820327759,
+ "logps/chosen": -271.9454345703125,
+ "logps/rejected": -274.1131286621094,
+ "loss": 0.4319,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": 0.17164357006549835,
+ "rewards/margins": 0.7940610647201538,
+ "rewards/rejected": -0.6224175095558167,
+ "step": 1250
+ },
+ {
+ "epoch": 2.6380528657419524,
+ "grad_norm": 28.881054799580607,
+ "learning_rate": 4.292990551804171e-08,
+ "logits/chosen": -0.08757440745830536,
+ "logits/rejected": -0.08349265158176422,
+ "logps/chosen": -287.8446350097656,
+ "logps/rejected": -251.55636596679688,
+ "loss": 0.4229,
+ "rewards/accuracies": 0.793749988079071,
+ "rewards/chosen": 0.27148956060409546,
+ "rewards/margins": 1.0949037075042725,
+ "rewards/rejected": -0.8234142065048218,
+ "step": 1260
+ },
+ {
+ "epoch": 2.658989793247841,
+ "grad_norm": 27.816379754479705,
+ "learning_rate": 3.811863886528882e-08,
+ "logits/chosen": -0.05129946395754814,
+ "logits/rejected": -0.07520854473114014,
+ "logps/chosen": -320.88092041015625,
+ "logps/rejected": -259.69488525390625,
+ "loss": 0.4403,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": 0.1268620789051056,
+ "rewards/margins": 0.9729422330856323,
+ "rewards/rejected": -0.8460801839828491,
+ "step": 1270
+ },
+ {
+ "epoch": 2.6799267207537296,
+ "grad_norm": 31.45496593116063,
+ "learning_rate": 3.358257417317095e-08,
+ "logits/chosen": -0.08449853211641312,
+ "logits/rejected": -0.1029936671257019,
+ "logps/chosen": -288.6929016113281,
+ "logps/rejected": -251.7803192138672,
+ "loss": 0.4427,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": 0.18886741995811462,
+ "rewards/margins": 0.9686278104782104,
+ "rewards/rejected": -0.779760479927063,
+ "step": 1280
+ },
+ {
+ "epoch": 2.700863648259618,
+ "grad_norm": 29.618711306226224,
+ "learning_rate": 2.9324414157151367e-08,
+ "logits/chosen": -0.1431390345096588,
+ "logits/rejected": -0.10319207608699799,
+ "logps/chosen": -265.52508544921875,
+ "logps/rejected": -260.0169677734375,
+ "loss": 0.4661,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": 0.10996869951486588,
+ "rewards/margins": 0.7998045682907104,
+ "rewards/rejected": -0.6898358464241028,
+ "step": 1290
+ },
+ {
+ "epoch": 2.7218005757655064,
+ "grad_norm": 30.57983222127276,
+ "learning_rate": 2.5346695949237717e-08,
+ "logits/chosen": -0.1215812936425209,
+ "logits/rejected": -0.1575727015733719,
+ "logps/chosen": -285.28839111328125,
+ "logps/rejected": -258.9991455078125,
+ "loss": 0.4312,
+ "rewards/accuracies": 0.8062499761581421,
+ "rewards/chosen": 0.16850288212299347,
+ "rewards/margins": 0.923494815826416,
+ "rewards/rejected": -0.7549919486045837,
+ "step": 1300
+ },
+ {
+ "epoch": 2.7218005757655064,
+ "eval_logits/chosen": -0.3442156910896301,
+ "eval_logits/rejected": -0.37236616015434265,
+ "eval_logps/chosen": -310.4347229003906,
+ "eval_logps/rejected": -275.5621337890625,
+ "eval_loss": 0.5860841870307922,
+ "eval_rewards/accuracies": 0.6666666865348816,
+ "eval_rewards/chosen": -0.07512283325195312,
+ "eval_rewards/margins": 0.5641648769378662,
+ "eval_rewards/rejected": -0.6392877101898193,
+ "eval_runtime": 18.9961,
+ "eval_samples_per_second": 105.285,
+ "eval_steps_per_second": 3.316,
+ "step": 1300
+ },
+ {
+ "epoch": 2.7427375032713948,
+ "grad_norm": 33.466317771235126,
+ "learning_rate": 2.165178958628744e-08,
+ "logits/chosen": -0.2976987659931183,
+ "logits/rejected": -0.2995850145816803,
+ "logps/chosen": -273.274658203125,
+ "logps/rejected": -265.376953125,
+ "loss": 0.4557,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": 0.248153418302536,
+ "rewards/margins": 0.9688236117362976,
+ "rewards/rejected": -0.720670223236084,
+ "step": 1310
+ },
+ {
+ "epoch": 2.7636744307772836,
+ "grad_norm": 34.66631251738303,
+ "learning_rate": 1.824189659787284e-08,
+ "logits/chosen": -0.10840437561273575,
+ "logits/rejected": -0.2128714621067047,
+ "logps/chosen": -305.4066162109375,
+ "logps/rejected": -265.4433288574219,
+ "loss": 0.4569,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": 0.2427886724472046,
+ "rewards/margins": 0.9924157857894897,
+ "rewards/rejected": -0.7496271133422852,
+ "step": 1320
+ },
+ {
+ "epoch": 2.784611358283172,
+ "grad_norm": 32.12579125137935,
+ "learning_rate": 1.511904869454772e-08,
+ "logits/chosen": -0.2719363570213318,
+ "logits/rejected": -0.282664954662323,
+ "logps/chosen": -258.68927001953125,
+ "logps/rejected": -247.0824737548828,
+ "loss": 0.4442,
+ "rewards/accuracies": 0.800000011920929,
+ "rewards/chosen": 0.14507904648780823,
+ "rewards/margins": 0.9792217016220093,
+ "rewards/rejected": -0.8341425657272339,
+ "step": 1330
+ },
+ {
+ "epoch": 2.8055482857890603,
+ "grad_norm": 30.777443115008786,
+ "learning_rate": 1.2285106557296476e-08,
+ "logits/chosen": -0.005404007621109486,
+ "logits/rejected": -0.11788828670978546,
+ "logps/chosen": -291.2290954589844,
+ "logps/rejected": -259.48468017578125,
+ "loss": 0.4343,
+ "rewards/accuracies": 0.856249988079071,
+ "rewards/chosen": 0.14346763491630554,
+ "rewards/margins": 0.9186245203018188,
+ "rewards/rejected": -0.7751568555831909,
+ "step": 1340
+ },
+ {
+ "epoch": 2.8264852132949487,
+ "grad_norm": 30.795896785948266,
+ "learning_rate": 9.741758728888217e-09,
+ "logits/chosen": -0.21444204449653625,
+ "logits/rejected": -0.22242781519889832,
+ "logps/chosen": -273.7785949707031,
+ "logps/rejected": -255.1972198486328,
+ "loss": 0.4474,
+ "rewards/accuracies": 0.800000011920929,
+ "rewards/chosen": 0.1873357743024826,
+ "rewards/margins": 0.925756573677063,
+ "rewards/rejected": -0.7384207844734192,
+ "step": 1350
+ },
+ {
+ "epoch": 2.8474221408008376,
+ "grad_norm": 32.2751606467674,
+ "learning_rate": 7.490520607794981e-09,
+ "logits/chosen": -0.23268666863441467,
+ "logits/rejected": -0.24138864874839783,
+ "logps/chosen": -321.114990234375,
+ "logps/rejected": -286.67596435546875,
+ "loss": 0.4185,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": 0.2826289236545563,
+ "rewards/margins": 1.1657564640045166,
+ "rewards/rejected": -0.8831275701522827,
+ "step": 1360
+ },
+ {
+ "epoch": 2.868359068306726,
+ "grad_norm": 35.16124310393445,
+ "learning_rate": 5.532733545274781e-09,
+ "logits/chosen": -0.15783634781837463,
+ "logits/rejected": -0.04948469251394272,
+ "logps/chosen": -257.70880126953125,
+ "logps/rejected": -257.0361633300781,
+ "loss": 0.4388,
+ "rewards/accuracies": 0.78125,
+ "rewards/chosen": 0.09304230660200119,
+ "rewards/margins": 0.8344671130180359,
+ "rewards/rejected": -0.7414248585700989,
+ "step": 1370
+ },
+ {
+ "epoch": 2.8892959958126143,
+ "grad_norm": 28.606615842220055,
+ "learning_rate": 3.869564046156459e-09,
+ "logits/chosen": -0.187531977891922,
+ "logits/rejected": -0.3003791868686676,
+ "logps/chosen": -326.945068359375,
+ "logps/rejected": -275.7688903808594,
+ "loss": 0.4474,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": 0.3150181770324707,
+ "rewards/margins": 0.9270998239517212,
+ "rewards/rejected": -0.6120817065238953,
+ "step": 1380
+ },
+ {
+ "epoch": 2.910232923318503,
+ "grad_norm": 29.689707753715854,
+ "learning_rate": 2.5020030738031052e-09,
+ "logits/chosen": -0.06661088764667511,
+ "logits/rejected": -0.03964465484023094,
+ "logps/chosen": -326.2409973144531,
+ "logps/rejected": -333.43048095703125,
+ "loss": 0.4429,
+ "rewards/accuracies": 0.8374999761581421,
+ "rewards/chosen": 0.26736724376678467,
+ "rewards/margins": 1.0932358503341675,
+ "rewards/rejected": -0.8258686065673828,
+ "step": 1390
+ },
+ {
+ "epoch": 2.9311698508243915,
+ "grad_norm": 28.977571013449776,
+ "learning_rate": 1.4308654596684177e-09,
+ "logits/chosen": -0.11069446802139282,
+ "logits/rejected": -0.20247487723827362,
+ "logps/chosen": -306.44610595703125,
+ "logps/rejected": -266.9939270019531,
+ "loss": 0.4454,
+ "rewards/accuracies": 0.8187500238418579,
+ "rewards/chosen": 0.33073872327804565,
+ "rewards/margins": 1.0625050067901611,
+ "rewards/rejected": -0.7317661046981812,
+ "step": 1400
+ },
+ {
+ "epoch": 2.9311698508243915,
+ "eval_logits/chosen": -0.34011754393577576,
+ "eval_logits/rejected": -0.36805158853530884,
+ "eval_logps/chosen": -310.2955627441406,
+ "eval_logps/rejected": -275.4775085449219,
+ "eval_loss": 0.5941523313522339,
+ "eval_rewards/accuracies": 0.6944444179534912,
+ "eval_rewards/chosen": -0.005565390922129154,
+ "eval_rewards/margins": 0.591437041759491,
+ "eval_rewards/rejected": -0.5970024466514587,
+ "eval_runtime": 19.2489,
+ "eval_samples_per_second": 103.902,
+ "eval_steps_per_second": 3.273,
+ "step": 1400
+ },
+ {
+ "epoch": 2.95210677833028,
+ "grad_norm": 29.74874723420356,
+ "learning_rate": 6.567894177967325e-10,
+ "logits/chosen": -0.046352408826351166,
+ "logits/rejected": -0.054122112691402435,
+ "logps/chosen": -287.88287353515625,
+ "logps/rejected": -262.4837646484375,
+ "loss": 0.4161,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": 0.250956654548645,
+ "rewards/margins": 1.026626706123352,
+ "rewards/rejected": -0.775670051574707,
+ "step": 1410
+ },
+ {
+ "epoch": 2.9730437058361687,
+ "grad_norm": 29.04099364011046,
+ "learning_rate": 1.802361645573125e-10,
+ "logits/chosen": -0.15342023968696594,
+ "logits/rejected": -0.1559501588344574,
+ "logps/chosen": -272.7152404785156,
+ "logps/rejected": -269.2229919433594,
+ "loss": 0.4192,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": 0.05591464787721634,
+ "rewards/margins": 1.0562889575958252,
+ "rewards/rejected": -1.000374436378479,
+ "step": 1420
+ },
+ {
+ "epoch": 2.993980633342057,
+ "grad_norm": 30.18988447303673,
+ "learning_rate": 1.4896438384481846e-12,
+ "logits/chosen": -0.20836491882801056,
+ "logits/rejected": -0.19784407317638397,
+ "logps/chosen": -317.96209716796875,
+ "logps/rejected": -302.5794982910156,
+ "loss": 0.4315,
+ "rewards/accuracies": 0.78125,
+ "rewards/chosen": 0.22425825893878937,
+ "rewards/margins": 0.8794215321540833,
+ "rewards/rejected": -0.6551632285118103,
+ "step": 1430
+ },
+ {
+ "epoch": 2.996074326092646,
+ "step": 1431,
+ "total_flos": 0.0,
+ "train_loss": 0.5334697115221363,
+ "train_runtime": 7355.3343,
+ "train_samples_per_second": 24.935,
+ "train_steps_per_second": 0.195
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1431,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": false,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7649586c424c337f6c403fdb617ac9d954daf9a7192f3afe5b6318f37e9bb19e
+ size 6520
vocab.json ADDED
The diff for this file is too large to render. See raw diff