bimabk committed on
Commit 37821fe · verified · 1 parent: b5f5e06

Upload task output 1

README.md ADDED
@@ -0,0 +1,209 @@
+ ---
+ base_model: None
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:/cache/models/Qwen--Qwen2-7B-Instruct
+ - grpo
+ - lora
+ - transformers
+ - trl
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": null,
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 256,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.1",
+ "qalora_group_size": 16,
+ "r": 128,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "down_proj",
+ "k_proj",
+ "v_proj",
+ "o_proj",
+ "gate_proj",
+ "up_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
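Because `use_rslora` is `false`, PEFT multiplies each LoRA update by `lora_alpha / r`; with `lora_alpha: 256` and `r: 128` in the config above, that factor is 2, so each targeted projection computes W x + 2 · B A x. A minimal sketch of that rule, using only values copied from the config; `lora_scaling` is a hypothetical helper for illustration, not a PEFT function:

```python
import math

# Values copied from adapter_config.json in this commit.
adapter_config = {
    "r": 128,
    "lora_alpha": 256,
    "use_rslora": False,
    "target_modules": ["q_proj", "down_proj", "k_proj", "v_proj",
                       "o_proj", "gate_proj", "up_proj"],
}


def lora_scaling(cfg):
    """Hypothetical helper: factor multiplying the low-rank update B @ A."""
    if cfg["use_rslora"]:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    # Standard LoRA: alpha / r.
    return cfg["lora_alpha"] / cfg["r"]


print(lora_scaling(adapter_config))  # 2.0
```

Flipping `use_rslora` to `true` with the same `lora_alpha` and `r` would raise the factor to 256 / sqrt(128), roughly 22.6, which is why the two settings are not interchangeable without retuning `lora_alpha`.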
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34124c2c4e5f99842557e20d65c0cf949932c5d4898a6d6474ff7ea2e30f3031
+ size 1291899160
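The LFS pointer's `size` can be sanity-checked against `adapter_config.json`. LoRA adds an A matrix (r × d_in) and a B matrix (d_out × r) to every targeted linear layer, i.e. r · (d_in + d_out) parameters. The sketch below totals them, assuming the published Qwen2-7B-Instruct dimensions (hidden size 3584, MLP size 18944, 28 decoder layers, 4 key/value heads of dimension 128). Those dimensions are not stated anywhere in this commit, and the 4-bytes-per-parameter (fp32) storage is likewise an assumption:

```python
R = 128  # "r" from adapter_config.json

# Assumed Qwen2-7B-Instruct dimensions (not part of this commit).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128
NUM_LAYERS = 28

# (d_in, d_out) for each module listed in "target_modules".
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# A is r x d_in and B is d_out x r, so each module adds r * (d_in + d_out).
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total_params = per_layer * NUM_LAYERS
fp32_bytes = total_params * 4  # assumed fp32 storage

lfs_size = 1291899160  # "size" from the adapter_model.safetensors pointer
print(total_params)           # 322961408
print(lfs_size - fp32_bytes)  # 53528
```

Under these assumptions the adapter holds about 323M parameters, and fp32 tensors account for all but about 53 KB of the file. That remainder would be consistent with the safetensors JSON header for 392 tensors (28 layers × 7 modules × A and B), but this is an inference rather than something the commit states.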
loss.txt ADDED
@@ -0,0 +1 @@
+ 150,-1.1500000556310017
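The `trainer_state.json` log that follows records GRPO metrics at every fifth step. One of them, `frac_reward_zero_std`, climbs from 0.4 at step 5 to 0.675 at step 75, while the mean `env_goofspiel_reward` rises from about 0.69 to about 1.04. In GRPO, each prompt gets a group of sampled completions, and a completion's advantage is its reward minus the group mean, divided by the group's standard deviation. A group in which every completion earns the same reward therefore gives every member zero advantage and contributes no reward signal. The sketch below illustrates that computation. It assumes a population standard deviation, an additive epsilon of 1e-4, and a hypothetical `group_advantages` helper. It is not TRL's implementation, whose estimator and epsilon may differ:

```python
from statistics import mean, pstdev


def group_advantages(rewards, group_size, eps=1e-4):
    """Hypothetical helper: GRPO-style group-relative advantages.

    `rewards` is a flat list in which each consecutive block of
    `group_size` entries holds the rewards of completions for one prompt.
    Returns per-completion advantages and the fraction of groups whose
    rewards have zero spread.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("rewards must split evenly into groups")
    n_groups = len(rewards) // group_size
    advantages, zero_std_groups = [], 0
    for g in range(n_groups):
        group = rewards[g * group_size:(g + 1) * group_size]
        mu, sigma = mean(group), pstdev(group)  # population std (assumption)
        if sigma == 0.0:
            zero_std_groups += 1  # every completion in this group gets advantage 0
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages, zero_std_groups / n_groups


# Two prompts, four completions each (made-up rewards for illustration).
adv, frac_zero = group_advantages([1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0], 4)
print(frac_zero)  # 0.5
print(adv[:4])    # [0.0, 0.0, 0.0, 0.0]
```

Read this way, a rising `frac_reward_zero_std` means a growing share of each batch produces no reward-driven gradient. That fits the shrinking `reward_std` in the log, although the log alone does not establish the cause.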
trainer_state.json ADDED
@@ -0,0 +1,1090 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.024,
+ "eval_steps": 500,
+ "global_step": 150,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "clip_ratio/high_max": 0.0375,
+ "clip_ratio/high_mean": 0.01215277761220932,
+ "clip_ratio/low_mean": 0.008333333488553762,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.020486111007630824,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 373.4,
+ "completions/max_terminated_length": 373.4,
+ "completions/mean_length": 297.375,
+ "completions/mean_terminated_length": 297.375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.46410001516342164,
+ "epoch": 0.0008,
+ "frac_reward_zero_std": 0.4,
+ "grad_norm": 0.03925486281514168,
+ "kl": 0.0062119101290591065,
+ "learning_rate": 8.529119999999999e-07,
+ "loss": 2.4803768610581757e-06,
+ "num_tokens": 136410.0,
+ "reward": 0.6898125052452088,
+ "reward_std": 0.297073221206665,
+ "rewards/env_goofspiel_reward/mean": 0.6898125052452088,
+ "rewards/env_goofspiel_reward/std": 0.44231168627738954,
+ "sampling/importance_sampling_ratio/max": 2.004665732383728,
+ "sampling/importance_sampling_ratio/mean": 0.3851233392953873,
+ "sampling/importance_sampling_ratio/min": 2.0644982578232883e-05,
+ "sampling/sampling_logp_difference/max": 7.081162071228027,
+ "sampling/sampling_logp_difference/mean": 0.6444696307182312,
+ "step": 5,
+ "step_time": 4.716039226800239
+ },
+ {
+ "clip_ratio/high_max": 0.07736111134290695,
+ "clip_ratio/high_mean": 0.023428030125796796,
+ "clip_ratio/low_mean": 0.01346275256946683,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.03689078260213137,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.4,
+ "completions/max_terminated_length": 374.4,
+ "completions/mean_length": 291.68125,
+ "completions/mean_terminated_length": 291.68125,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.4338525801897049,
+ "epoch": 0.0016,
+ "frac_reward_zero_std": 0.35,
+ "grad_norm": 0.10885529965162277,
+ "kl": 0.012050760450074449,
+ "learning_rate": 1.919052e-06,
+ "loss": 0.00018108433578163386,
+ "num_tokens": 270498.0,
+ "reward": 0.7646875262260437,
+ "reward_std": 0.32924659848213195,
+ "rewards/env_goofspiel_reward/mean": 0.7646875262260437,
+ "rewards/env_goofspiel_reward/std": 0.44543521404266356,
+ "sampling/importance_sampling_ratio/max": 2.0047454595565797,
+ "sampling/importance_sampling_ratio/mean": 0.39372612833976744,
+ "sampling/importance_sampling_ratio/min": 1.5255385369528085e-05,
+ "sampling/sampling_logp_difference/max": 7.504581451416016,
+ "sampling/sampling_logp_difference/mean": 0.7591853618621827,
+ "step": 10,
+ "step_time": 4.234808461799912
+ },
+ {
+ "clip_ratio/high_max": 0.06954545490443706,
+ "clip_ratio/high_mean": 0.023011363483965395,
+ "clip_ratio/low_mean": 0.022291666734963654,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.04530302993953228,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 373.6,
+ "completions/max_terminated_length": 373.6,
+ "completions/mean_length": 283.5,
+ "completions/mean_terminated_length": 283.5,
+ "completions/min_length": 202.0,
+ "completions/min_terminated_length": 202.0,
+ "entropy": 0.4219451487064362,
+ "epoch": 0.0024,
+ "frac_reward_zero_std": 0.425,
+ "grad_norm": 0.06103011220693588,
+ "kl": 0.04142252554884181,
+ "learning_rate": 2.985192e-06,
+ "loss": 0.0005043631885200739,
+ "num_tokens": 403004.0,
+ "reward": 0.7495625257492066,
+ "reward_std": 0.286996965110302,
+ "rewards/env_goofspiel_reward/mean": 0.7495625257492066,
+ "rewards/env_goofspiel_reward/std": 0.43196319937705996,
+ "sampling/importance_sampling_ratio/max": 2.247171688079834,
+ "sampling/importance_sampling_ratio/mean": 0.4383248031139374,
+ "sampling/importance_sampling_ratio/min": 2.4881603894755245e-05,
+ "sampling/sampling_logp_difference/max": 7.91492919921875,
+ "sampling/sampling_logp_difference/mean": 0.7775012850761414,
+ "step": 15,
+ "step_time": 4.656600760800211
+ },
+ {
+ "clip_ratio/high_max": 0.05416666679084301,
+ "clip_ratio/high_mean": 0.013541666697710753,
+ "clip_ratio/low_mean": 0.014444444514811038,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.027986111119389534,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 373.8,
+ "completions/max_terminated_length": 373.8,
+ "completions/mean_length": 281.39375,
+ "completions/mean_terminated_length": 281.39375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.38834676444530486,
+ "epoch": 0.0032,
+ "frac_reward_zero_std": 0.425,
+ "grad_norm": 0.06636141240596771,
+ "kl": 0.24065511422231795,
+ "learning_rate": 4.051332e-06,
+ "loss": -0.00033730233553797007,
+ "num_tokens": 535785.0,
+ "reward": 0.8736250162124634,
+ "reward_std": 0.29185832142829893,
+ "rewards/env_goofspiel_reward/mean": 0.8736250162124634,
+ "rewards/env_goofspiel_reward/std": 0.4162306308746338,
+ "sampling/importance_sampling_ratio/max": 2.5457701206207277,
+ "sampling/importance_sampling_ratio/mean": 0.43321084380149844,
+ "sampling/importance_sampling_ratio/min": 3.2646653380652424e-07,
+ "sampling/sampling_logp_difference/max": 8.767316246032715,
+ "sampling/sampling_logp_difference/mean": 0.715557587146759,
+ "step": 20,
+ "step_time": 4.1054079593999635
+ },
+ {
+ "clip_ratio/high_max": 0.05347222238779068,
+ "clip_ratio/high_mean": 0.01631944440305233,
+ "clip_ratio/low_mean": 0.027108585741370917,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.043428029771894215,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 373.8,
+ "completions/max_terminated_length": 373.8,
+ "completions/mean_length": 300.24375,
+ "completions/mean_terminated_length": 300.24375,
+ "completions/min_length": 212.6,
+ "completions/min_terminated_length": 212.6,
+ "entropy": 0.3262263901531696,
+ "epoch": 0.004,
+ "frac_reward_zero_std": 0.4125,
+ "grad_norm": 0.04734628647565842,
+ "kl": 0.5914028726518155,
+ "learning_rate": 5.117472e-06,
+ "loss": -0.0011067342013120652,
+ "num_tokens": 674417.0,
+ "reward": 0.7871875405311585,
+ "reward_std": 0.31828643679618834,
+ "rewards/env_goofspiel_reward/mean": 0.7871875405311585,
+ "rewards/env_goofspiel_reward/std": 0.42372027039527893,
+ "sampling/importance_sampling_ratio/max": 1.7681864023208618,
+ "sampling/importance_sampling_ratio/mean": 0.33697319626808164,
+ "sampling/importance_sampling_ratio/min": 5.1189550868002696e-05,
+ "sampling/sampling_logp_difference/max": 8.77573595046997,
+ "sampling/sampling_logp_difference/mean": 0.7286012172698975,
+ "step": 25,
+ "step_time": 4.131656074401144
+ },
+ {
+ "clip_ratio/high_max": 0.03625000007450581,
+ "clip_ratio/high_mean": 0.009062500018626452,
+ "clip_ratio/low_mean": 0.02396990731358528,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.03303240723907948,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.0,
+ "completions/max_terminated_length": 374.0,
+ "completions/mean_length": 283.6,
+ "completions/mean_terminated_length": 283.6,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.2808014929294586,
+ "epoch": 0.0048,
+ "frac_reward_zero_std": 0.425,
+ "grad_norm": 0.10528500378131866,
+ "kl": 1.774285883270204,
+ "learning_rate": 6.183612e-06,
+ "loss": -0.0006869177334010601,
+ "num_tokens": 806438.0,
+ "reward": 0.8661875247955322,
+ "reward_std": 0.3341963529586792,
+ "rewards/env_goofspiel_reward/mean": 0.8661875247955322,
+ "rewards/env_goofspiel_reward/std": 0.4470097303390503,
+ "sampling/importance_sampling_ratio/max": 1.7913244009017943,
+ "sampling/importance_sampling_ratio/mean": 0.46525874733924866,
+ "sampling/importance_sampling_ratio/min": 5.821734957862645e-05,
+ "sampling/sampling_logp_difference/max": 7.922852897644043,
+ "sampling/sampling_logp_difference/mean": 0.6293025732040405,
+ "step": 30,
+ "step_time": 4.5387513689995105
+ },
+ {
+ "clip_ratio/high_max": 0.06666666679084302,
+ "clip_ratio/high_mean": 0.02274305550381541,
+ "clip_ratio/low_mean": 0.02211805563420057,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.04486111104488373,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 365.6,
+ "completions/max_terminated_length": 365.6,
+ "completions/mean_length": 289.15625,
+ "completions/mean_terminated_length": 289.15625,
+ "completions/min_length": 202.0,
+ "completions/min_terminated_length": 202.0,
+ "entropy": 0.37011782824993134,
+ "epoch": 0.0056,
+ "frac_reward_zero_std": 0.375,
+ "grad_norm": 0.1105622872710228,
+ "kl": 0.8004692852497101,
+ "learning_rate": 7.249752e-06,
+ "loss": -0.0004383428022265434,
+ "num_tokens": 941248.0,
+ "reward": 0.8023125410079956,
+ "reward_std": 0.350283020734787,
+ "rewards/env_goofspiel_reward/mean": 0.8023125410079956,
+ "rewards/env_goofspiel_reward/std": 0.4369929492473602,
+ "sampling/importance_sampling_ratio/max": 1.9201649188995362,
+ "sampling/importance_sampling_ratio/mean": 0.4281450629234314,
+ "sampling/importance_sampling_ratio/min": 0.0001282373646972701,
+ "sampling/sampling_logp_difference/max": 7.672937965393066,
+ "sampling/sampling_logp_difference/mean": 0.6862899184226989,
+ "step": 35,
+ "step_time": 3.9215637991999754
+ },
+ {
+ "clip_ratio/high_max": 0.052916666865348815,
+ "clip_ratio/high_mean": 0.013229166716337204,
+ "clip_ratio/low_mean": 0.01652777772396803,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.029756944440305234,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.8,
+ "completions/max_terminated_length": 374.8,
+ "completions/mean_length": 290.475,
+ "completions/mean_terminated_length": 290.475,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.45398128926754,
+ "epoch": 0.0064,
+ "frac_reward_zero_std": 0.2625,
+ "grad_norm": 0.05981823429465294,
+ "kl": 0.35742565132677556,
+ "learning_rate": 7.4629793691100655e-06,
+ "loss": -0.00020290752872824668,
+ "num_tokens": 1075889.0,
+ "reward": 0.7535625219345092,
+ "reward_std": 0.4086193323135376,
+ "rewards/env_goofspiel_reward/mean": 0.7535625219345092,
+ "rewards/env_goofspiel_reward/std": 0.48414809107780454,
+ "sampling/importance_sampling_ratio/max": 2.2908178567886353,
+ "sampling/importance_sampling_ratio/mean": 0.4616763234138489,
+ "sampling/importance_sampling_ratio/min": 0.00017085522413253784,
+ "sampling/sampling_logp_difference/max": 6.5542881965637205,
+ "sampling/sampling_logp_difference/mean": 0.6057502806186676,
+ "step": 40,
+ "step_time": 3.9750062072005674
+ },
+ {
+ "clip_ratio/high_max": 0.03541666679084301,
+ "clip_ratio/high_mean": 0.010416666697710752,
+ "clip_ratio/low_mean": 0.021631944458931684,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.03204861097037792,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 375.4,
+ "completions/max_terminated_length": 375.4,
+ "completions/mean_length": 298.0375,
+ "completions/mean_terminated_length": 298.0375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.4585177510976791,
+ "epoch": 0.0072,
+ "frac_reward_zero_std": 0.4125,
+ "grad_norm": 0.023925846442580223,
+ "kl": 0.2792994946241379,
+ "learning_rate": 7.462976806120193e-06,
+ "loss": -0.0006944169290363789,
+ "num_tokens": 1212465.0,
+ "reward": 0.8323750257492065,
+ "reward_std": 0.30776822566986084,
+ "rewards/env_goofspiel_reward/mean": 0.8323750257492065,
+ "rewards/env_goofspiel_reward/std": 0.45529221296310424,
+ "sampling/importance_sampling_ratio/max": 2.1405240535736083,
+ "sampling/importance_sampling_ratio/mean": 0.47331286072731016,
+ "sampling/importance_sampling_ratio/min": 0.00027849085163325074,
+ "sampling/sampling_logp_difference/max": 6.536230754852295,
+ "sampling/sampling_logp_difference/mean": 0.6047793924808502,
+ "step": 45,
+ "step_time": 4.178844353599925
+ },
+ {
+ "clip_ratio/high_max": 0.03611111119389534,
+ "clip_ratio/high_mean": 0.01152777774259448,
+ "clip_ratio/low_mean": 0.014851641468703746,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.026379418931901454,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.6,
+ "completions/max_terminated_length": 374.6,
+ "completions/mean_length": 299.5375,
+ "completions/mean_terminated_length": 299.5375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.32481125816702844,
+ "epoch": 0.008,
+ "frac_reward_zero_std": 0.45,
+ "grad_norm": 0.04731212928891182,
+ "kl": 0.9624257631599903,
+ "learning_rate": 7.4629722716015665e-06,
+ "loss": -0.000700222747400403,
+ "num_tokens": 1350400.0,
+ "reward": 0.9035625338554383,
+ "reward_std": 0.2705567300319672,
+ "rewards/env_goofspiel_reward/mean": 0.9035625338554383,
+ "rewards/env_goofspiel_reward/std": 0.3961623251438141,
+ "sampling/importance_sampling_ratio/max": 2.3192288398742678,
+ "sampling/importance_sampling_ratio/mean": 0.5576886355876922,
+ "sampling/importance_sampling_ratio/min": 0.0003088584111537784,
+ "sampling/sampling_logp_difference/max": 5.365625381469727,
+ "sampling/sampling_logp_difference/mean": 0.5096056282520294,
+ "step": 50,
+ "step_time": 4.218161174600027
+ },
+ {
+ "clip_ratio/high_max": 0.0125,
+ "clip_ratio/high_mean": 0.003125,
+ "clip_ratio/low_mean": 0.012881944794207812,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.016006944794207813,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.6,
+ "completions/max_terminated_length": 374.6,
+ "completions/mean_length": 284.55625,
+ "completions/mean_terminated_length": 284.55625,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.25873188972473143,
+ "epoch": 0.0088,
+ "frac_reward_zero_std": 0.575,
+ "grad_norm": 0.03882903605699539,
+ "kl": 1.3922856956720353,
+ "learning_rate": 7.4629657655573805e-06,
+ "loss": -0.00034891823306679723,
+ "num_tokens": 1482932.0,
+ "reward": 0.9674375414848327,
+ "reward_std": 0.22282702028751372,
+ "rewards/env_goofspiel_reward/mean": 0.9674375414848327,
+ "rewards/env_goofspiel_reward/std": 0.3725932538509369,
+ "sampling/importance_sampling_ratio/max": 2.143031930923462,
+ "sampling/importance_sampling_ratio/mean": 0.6658945798873901,
+ "sampling/importance_sampling_ratio/min": 0.0010862916285987012,
+ "sampling/sampling_logp_difference/max": 7.023790454864502,
+ "sampling/sampling_logp_difference/mean": 0.4928452789783478,
+ "step": 55,
+ "step_time": 3.999122467400048
+ },
+ {
+ "clip_ratio/high_max": 0.00625,
+ "clip_ratio/high_mean": 0.0015625,
+ "clip_ratio/low_mean": 0.006613005138933659,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.00817550513893366,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.4,
+ "completions/max_terminated_length": 374.4,
+ "completions/mean_length": 294.025,
+ "completions/mean_terminated_length": 294.025,
+ "completions/min_length": 218.8,
+ "completions/min_terminated_length": 218.8,
+ "entropy": 0.25592469796538353,
+ "epoch": 0.0096,
+ "frac_reward_zero_std": 0.6875,
+ "grad_norm": 0.011030570603907108,
+ "kl": 0.5781544581055641,
+ "learning_rate": 7.462957287992218e-06,
+ "loss": -0.0010543103329837323,
+ "num_tokens": 1618242.0,
+ "reward": 1.0612500429153442,
+ "reward_std": 0.15379572063684463,
+ "rewards/env_goofspiel_reward/mean": 1.0612500429153442,
+ "rewards/env_goofspiel_reward/std": 0.3027446687221527,
+ "sampling/importance_sampling_ratio/max": 2.451231026649475,
+ "sampling/importance_sampling_ratio/mean": 0.6636472702026367,
+ "sampling/importance_sampling_ratio/min": 0.0002636277698911726,
+ "sampling/sampling_logp_difference/max": 5.692580604553223,
+ "sampling/sampling_logp_difference/mean": 0.41186076402664185,
+ "step": 60,
+ "step_time": 4.0995516278006106
+ },
+ {
+ "clip_ratio/high_max": 0.02361111119389534,
+ "clip_ratio/high_mean": 0.005902777798473835,
+ "clip_ratio/low_mean": 0.0078125,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.013715277798473835,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 373.8,
+ "completions/max_terminated_length": 373.8,
+ "completions/mean_length": 273.5375,
+ "completions/mean_terminated_length": 273.5375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.2160419549793005,
+ "epoch": 0.0104,
+ "frac_reward_zero_std": 0.6375,
+ "grad_norm": 0.03689345344901085,
+ "kl": 0.7355392906814814,
+ "learning_rate": 7.462946838912051e-06,
+ "loss": -0.0004133625887334347,
+ "num_tokens": 1747425.0,
+ "reward": 1.0386249899864197,
+ "reward_std": 0.207005512714386,
+ "rewards/env_goofspiel_reward/mean": 1.0386249899864197,
+ "rewards/env_goofspiel_reward/std": 0.34287082552909853,
+ "sampling/importance_sampling_ratio/max": 1.7854061126708984,
+ "sampling/importance_sampling_ratio/mean": 0.7041799902915955,
+ "sampling/importance_sampling_ratio/min": 0.002913042064756155,
+ "sampling/sampling_logp_difference/max": 5.417265224456787,
+ "sampling/sampling_logp_difference/mean": 0.3267530858516693,
+ "step": 65,
+ "step_time": 4.27056002979989
+ },
441
+ {
442
+ "clip_ratio/high_max": 0.00625,
443
+ "clip_ratio/high_mean": 0.0015625,
444
+ "clip_ratio/low_mean": 0.010138888843357563,
445
+ "clip_ratio/low_min": 0.0,
446
+ "clip_ratio/region_mean": 0.011701388843357563,
447
+ "completions/clipped_ratio": 0.0,
448
+ "completions/max_length": 374.8,
449
+ "completions/max_terminated_length": 374.8,
450
+ "completions/mean_length": 281.0875,
451
+ "completions/mean_terminated_length": 281.0875,
452
+ "completions/min_length": 212.0,
453
+ "completions/min_terminated_length": 212.0,
454
+ "entropy": 0.20821231752634048,
455
+ "epoch": 0.0112,
456
+ "frac_reward_zero_std": 0.7125,
457
+ "grad_norm": 0.09448564797639847,
458
+ "kl": 1.5728149417787791,
459
+ "learning_rate": 7.462934418324241e-06,
460
+ "loss": -0.0004269219934940338,
461
+ "num_tokens": 1879313.0,
462
+ "reward": 1.0425000429153441,
463
+ "reward_std": 0.1484924241900444,
464
+ "rewards/env_goofspiel_reward/mean": 1.0425000429153441,
465
+ "rewards/env_goofspiel_reward/std": 0.3160957217216492,
466
+ "sampling/importance_sampling_ratio/max": 2.062333583831787,
467
+ "sampling/importance_sampling_ratio/mean": 0.6789644956588745,
468
+ "sampling/importance_sampling_ratio/min": 0.000793453273945488,
469
+ "sampling/sampling_logp_difference/max": 5.028042125701904,
470
+ "sampling/sampling_logp_difference/mean": 0.3625839054584503,
471
+ "step": 70,
472
+ "step_time": 4.002103836400238
473
+ },
474
+ {
475
+ "clip_ratio/high_max": 0.00625,
476
+ "clip_ratio/high_mean": 0.0015625,
477
+ "clip_ratio/low_mean": 0.008715277817100287,
478
+ "clip_ratio/low_min": 0.0,
479
+ "clip_ratio/region_mean": 0.010277777817100287,
480
+ "completions/clipped_ratio": 0.0,
481
+ "completions/max_length": 374.6,
482
+ "completions/max_terminated_length": 374.6,
483
+ "completions/mean_length": 292.48125,
484
+ "completions/mean_terminated_length": 292.48125,
485
+ "completions/min_length": 212.0,
486
+ "completions/min_terminated_length": 212.0,
487
+ "entropy": 0.23503879755735396,
488
+ "epoch": 0.012,
489
+ "frac_reward_zero_std": 0.675,
490
+ "grad_norm": 0.008777249604463577,
491
+ "kl": 1.0266067795455456,
492
+ "learning_rate": 7.4629200262375374e-06,
493
+ "loss": -0.000634206272661686,
494
+ "num_tokens": 2014943.0,
495
+ "reward": 1.038687562942505,
496
+ "reward_std": 0.16449071615934371,
497
+ "rewards/env_goofspiel_reward/mean": 1.038687562942505,
498
+ "rewards/env_goofspiel_reward/std": 0.3346868008375168,
+ "sampling/importance_sampling_ratio/max": 1.5970592021942138,
+ "sampling/importance_sampling_ratio/mean": 0.6036460041999817,
+ "sampling/importance_sampling_ratio/min": 0.00030632600537501277,
+ "sampling/sampling_logp_difference/max": 5.986368083953858,
+ "sampling/sampling_logp_difference/mean": 0.4311356723308563,
+ "step": 75,
+ "step_time": 4.108657225800198
+ },
+ {
+ "epoch": 0.012,
+ "eval_clip_ratio/high_max": 0.0,
+ "eval_clip_ratio/high_mean": 0.0,
+ "eval_clip_ratio/low_mean": 0.0,
+ "eval_clip_ratio/low_min": 0.0,
+ "eval_clip_ratio/region_mean": 0.0,
+ "eval_completions/clipped_ratio": 0.0,
+ "eval_completions/max_length": 373.0,
+ "eval_completions/max_terminated_length": 373.0,
+ "eval_completions/mean_length": 314.2916666666667,
+ "eval_completions/mean_terminated_length": 314.2916666666667,
+ "eval_completions/min_length": 263.3333333333333,
+ "eval_completions/min_terminated_length": 263.3333333333333,
+ "eval_entropy": 0.2657604416211446,
+ "eval_frac_reward_zero_std": 0.4166666666666667,
+ "eval_kl": 0.5168450077374777,
+ "eval_loss": -0.00045055957161821425,
+ "eval_num_tokens": 2014943.0,
+ "eval_reward": 0.99958336353302,
+ "eval_reward_std": 0.21272129813830057,
+ "eval_rewards/env_goofspiel_reward/mean": 0.99958336353302,
+ "eval_rewards/env_goofspiel_reward/std": 0.2815760125716527,
+ "eval_runtime": 2.0876,
+ "eval_samples_per_second": 4.79,
+ "eval_sampling/importance_sampling_ratio/max": 1.69784019390742,
+ "eval_sampling/importance_sampling_ratio/mean": 0.624309907356898,
+ "eval_sampling/importance_sampling_ratio/min": 0.014207058896621069,
+ "eval_sampling/sampling_logp_difference/max": 4.317745526631673,
+ "eval_sampling/sampling_logp_difference/mean": 0.3868154088656108,
+ "eval_steps_per_second": 0.958,
+ "step": 75
+ },
+ {
+ "clip_ratio/high_max": 0.0,
+ "clip_ratio/high_mean": 0.0,
+ "clip_ratio/low_mean": 0.012099116202443838,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.012099116202443838,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.0,
+ "completions/max_terminated_length": 374.0,
+ "completions/mean_length": 284.125,
+ "completions/mean_terminated_length": 284.125,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.270469605922699,
+ "epoch": 0.0128,
+ "frac_reward_zero_std": 0.6625,
+ "grad_norm": 0.05691204220056534,
+ "kl": 0.7552504394203424,
+ "learning_rate": 7.462903662662079e-06,
+ "loss": -0.0006706514395773411,
+ "num_tokens": 2148759.0,
+ "reward": 1.0424375176429748,
+ "reward_std": 0.18040061742067337,
+ "rewards/env_goofspiel_reward/mean": 1.0424375176429748,
+ "rewards/env_goofspiel_reward/std": 0.3348836898803711,
+ "sampling/importance_sampling_ratio/max": 1.9503651142120362,
+ "sampling/importance_sampling_ratio/mean": 0.6592345595359802,
+ "sampling/importance_sampling_ratio/min": 0.0007974941050633788,
+ "sampling/sampling_logp_difference/max": 5.421667098999023,
+ "sampling/sampling_logp_difference/mean": 0.37761002480983735,
+ "step": 80,
+ "step_time": 4.0779971672005555
+ },
+ {
+ "clip_ratio/high_max": 0.01704545468091965,
+ "clip_ratio/high_mean": 0.004261363670229912,
+ "clip_ratio/low_mean": 0.01110164150595665,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.015363005176186561,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.0,
+ "completions/max_terminated_length": 374.0,
+ "completions/mean_length": 289.10625,
+ "completions/mean_terminated_length": 289.10625,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.3144980549812317,
+ "epoch": 0.0136,
+ "frac_reward_zero_std": 0.55,
+ "grad_norm": 0.02907728962600231,
+ "kl": 0.603056262433529,
+ "learning_rate": 7.462885327609394e-06,
+ "loss": -0.00035388271789997815,
+ "num_tokens": 2282680.0,
+ "reward": 0.9262500286102295,
+ "reward_std": 0.25986174046993255,
+ "rewards/env_goofspiel_reward/mean": 0.9262500286102295,
+ "rewards/env_goofspiel_reward/std": 0.40943323373794555,
+ "sampling/importance_sampling_ratio/max": 2.144289803504944,
+ "sampling/importance_sampling_ratio/mean": 0.5303596138954163,
+ "sampling/importance_sampling_ratio/min": 0.0002621762232593028,
+ "sampling/sampling_logp_difference/max": 6.042155361175537,
+ "sampling/sampling_logp_difference/mean": 0.5484204053878784,
+ "step": 85,
+ "step_time": 4.207774694199543
+ },
+ {
+ "clip_ratio/high_max": 0.00555555559694767,
+ "clip_ratio/high_mean": 0.0013888888992369176,
+ "clip_ratio/low_mean": 0.011493055615574121,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.012881944514811039,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.4,
+ "completions/max_terminated_length": 374.4,
+ "completions/mean_length": 298.5875,
+ "completions/mean_terminated_length": 298.5875,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.25913119316101074,
+ "epoch": 0.0144,
+ "frac_reward_zero_std": 0.475,
+ "grad_norm": 0.021151067689061165,
+ "kl": 0.6603276126086712,
+ "learning_rate": 7.462865021092397e-06,
+ "loss": -0.0006772585213184357,
+ "num_tokens": 2420580.0,
+ "reward": 0.9559375405311584,
+ "reward_std": 0.28134011626243594,
+ "rewards/env_goofspiel_reward/mean": 0.9559375405311584,
+ "rewards/env_goofspiel_reward/std": 0.3986021220684052,
+ "sampling/importance_sampling_ratio/max": 2.0312726736068725,
+ "sampling/importance_sampling_ratio/mean": 0.5839850544929505,
+ "sampling/importance_sampling_ratio/min": 0.0004560710280202329,
+ "sampling/sampling_logp_difference/max": 6.067973613739014,
+ "sampling/sampling_logp_difference/mean": 0.4282756567001343,
+ "step": 90,
+ "step_time": 4.005906563800636
+ },
+ {
+ "clip_ratio/high_max": 0.01111111119389534,
+ "clip_ratio/high_mean": 0.002777777798473835,
+ "clip_ratio/low_mean": 0.013939394056797028,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.016717171855270864,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.2,
+ "completions/max_terminated_length": 374.2,
+ "completions/mean_length": 283.1375,
+ "completions/mean_terminated_length": 283.1375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.2632964253425598,
+ "epoch": 0.0152,
+ "frac_reward_zero_std": 0.6125,
+ "grad_norm": 0.0911855697631836,
+ "kl": 1.6807550355792045,
+ "learning_rate": 7.462842743125395e-06,
+ "loss": -0.0009113414213061333,
+ "num_tokens": 2552657.0,
+ "reward": 1.0198750019073486,
+ "reward_std": 0.2229154199361801,
+ "rewards/env_goofspiel_reward/mean": 1.0198750019073486,
+ "rewards/env_goofspiel_reward/std": 0.3695839524269104,
+ "sampling/importance_sampling_ratio/max": 1.79537193775177,
+ "sampling/importance_sampling_ratio/mean": 0.5864414393901825,
+ "sampling/importance_sampling_ratio/min": 0.0028222970955539494,
+ "sampling/sampling_logp_difference/max": 6.276762676239014,
+ "sampling/sampling_logp_difference/mean": 0.4131439089775085,
+ "step": 95,
+ "step_time": 4.100510867999764
+ },
+ {
+ "clip_ratio/high_max": 0.00625,
+ "clip_ratio/high_mean": 0.0015625,
+ "clip_ratio/low_mean": 0.0072916666977107525,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.008854166697710752,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.0,
+ "completions/max_terminated_length": 374.0,
+ "completions/mean_length": 285.8375,
+ "completions/mean_terminated_length": 285.8375,
+ "completions/min_length": 218.6,
+ "completions/min_terminated_length": 218.6,
+ "entropy": 0.289502215385437,
+ "epoch": 0.016,
+ "frac_reward_zero_std": 0.6,
+ "grad_norm": 0.004398128483444452,
+ "kl": 0.6524220421910286,
+ "learning_rate": 7.4628184937240836e-06,
+ "loss": -0.000950614083558321,
+ "num_tokens": 2685325.0,
+ "reward": 0.9823125600814819,
+ "reward_std": 0.22282702326774598,
+ "rewards/env_goofspiel_reward/mean": 0.9823125600814819,
+ "rewards/env_goofspiel_reward/std": 0.37007221579551697,
+ "sampling/importance_sampling_ratio/max": 1.9957563161849976,
+ "sampling/importance_sampling_ratio/mean": 0.5997037708759307,
+ "sampling/importance_sampling_ratio/min": 0.0030981259667896667,
+ "sampling/sampling_logp_difference/max": 5.579097270965576,
+ "sampling/sampling_logp_difference/mean": 0.4109388738870621,
+ "step": 100,
+ "step_time": 4.055971824800144
+ },
+ {
+ "clip_ratio/high_max": 0.022361111268401145,
+ "clip_ratio/high_mean": 0.006979166716337204,
+ "clip_ratio/low_mean": 0.005763888917863369,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.012743055541068315,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.4,
+ "completions/max_terminated_length": 374.4,
+ "completions/mean_length": 288.5625,
+ "completions/mean_terminated_length": 288.5625,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.3037287026643753,
+ "epoch": 0.0168,
+ "frac_reward_zero_std": 0.6375,
+ "grad_norm": 0.017863936722278595,
+ "kl": 0.7445758618414402,
+ "learning_rate": 7.4627922729055425e-06,
+ "loss": -0.0005706328898668289,
+ "num_tokens": 2819431.0,
+ "reward": 1.046250009536743,
+ "reward_std": 0.18561552762985228,
+ "rewards/env_goofspiel_reward/mean": 1.046250009536743,
+ "rewards/env_goofspiel_reward/std": 0.31610003411769866,
+ "sampling/importance_sampling_ratio/max": 1.9027514457702637,
+ "sampling/importance_sampling_ratio/mean": 0.6652593612670898,
+ "sampling/importance_sampling_ratio/min": 0.0008113190764561295,
+ "sampling/sampling_logp_difference/max": 6.060176849365234,
+ "sampling/sampling_logp_difference/mean": 0.3919778883457184,
+ "step": 105,
+ "step_time": 4.1537415953993335
+ },
+ {
+ "clip_ratio/high_max": 0.02986111119389534,
+ "clip_ratio/high_mean": 0.007465277798473835,
+ "clip_ratio/low_mean": 0.010277777817100287,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.01774305561557412,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.8,
+ "completions/max_terminated_length": 374.8,
+ "completions/mean_length": 287.5875,
+ "completions/mean_terminated_length": 287.5875,
+ "completions/min_length": 207.0,
+ "completions/min_terminated_length": 207.0,
+ "entropy": 0.31742958873510363,
+ "epoch": 0.0176,
+ "frac_reward_zero_std": 0.5375,
+ "grad_norm": 0.027483796700835228,
+ "kl": 0.7258106715977192,
+ "learning_rate": 7.462764080688243e-06,
+ "loss": -0.0009170899167656899,
+ "num_tokens": 2953131.0,
+ "reward": 0.982312548160553,
+ "reward_std": 0.24404022991657257,
+ "rewards/env_goofspiel_reward/mean": 0.982312548160553,
+ "rewards/env_goofspiel_reward/std": 0.36874093413352965,
+ "sampling/importance_sampling_ratio/max": 1.8940089225769043,
+ "sampling/importance_sampling_ratio/mean": 0.6627432525157928,
+ "sampling/importance_sampling_ratio/min": 0.0007015861105173826,
+ "sampling/sampling_logp_difference/max": 6.205275201797486,
+ "sampling/sampling_logp_difference/mean": 0.37100034952163696,
+ "step": 110,
+ "step_time": 4.001762911599871
+ },
+ {
+ "clip_ratio/high_max": 0.023055555671453475,
+ "clip_ratio/high_mean": 0.005763888917863369,
+ "clip_ratio/low_mean": 0.0029513888992369177,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.008715277723968028,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.2,
+ "completions/max_terminated_length": 374.2,
+ "completions/mean_length": 293.1875,
+ "completions/mean_terminated_length": 293.1875,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.30907001420855523,
+ "epoch": 0.0184,
+ "frac_reward_zero_std": 0.5625,
+ "grad_norm": 0.015677573159337044,
+ "kl": 0.9665988653898239,
+ "learning_rate": 7.4627339170920494e-06,
+ "loss": -0.0009083808399736881,
+ "num_tokens": 3088531.0,
+ "reward": 0.9374375104904175,
+ "reward_std": 0.2228270262479782,
+ "rewards/env_goofspiel_reward/mean": 0.9374375104904175,
+ "rewards/env_goofspiel_reward/std": 0.4014770984649658,
+ "sampling/importance_sampling_ratio/max": 1.731867289543152,
+ "sampling/importance_sampling_ratio/mean": 0.6140377283096313,
+ "sampling/importance_sampling_ratio/min": 0.00030801825923845174,
+ "sampling/sampling_logp_difference/max": 5.473852968215942,
+ "sampling/sampling_logp_difference/mean": 0.3702615320682526,
+ "step": 115,
+ "step_time": 4.100767185799486
+ },
+ {
+ "clip_ratio/high_max": 0.0,
+ "clip_ratio/high_mean": 0.0,
+ "clip_ratio/low_mean": 0.004340277798473835,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.004340277798473835,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.0,
+ "completions/max_terminated_length": 374.0,
+ "completions/mean_length": 285.58125,
+ "completions/mean_terminated_length": 285.58125,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.2561140716075897,
+ "epoch": 0.0192,
+ "frac_reward_zero_std": 0.725,
+ "grad_norm": 0.0040421271696686745,
+ "kl": 0.5731004536151886,
+ "learning_rate": 7.462701782138208e-06,
+ "loss": -0.0007791116368025541,
+ "num_tokens": 3221502.0,
+ "reward": 1.0274375200271606,
+ "reward_std": 0.1485808130353689,
+ "rewards/env_goofspiel_reward/mean": 1.0274375200271606,
+ "rewards/env_goofspiel_reward/std": 0.3382692337036133,
+ "sampling/importance_sampling_ratio/max": 1.8715962648391724,
+ "sampling/importance_sampling_ratio/mean": 0.7390327334403992,
+ "sampling/importance_sampling_ratio/min": 0.0010936856037005783,
+ "sampling/sampling_logp_difference/max": 5.8203360080719,
+ "sampling/sampling_logp_difference/mean": 0.29723324775695803,
+ "step": 120,
+ "step_time": 4.116316759600522
+ },
+ {
+ "clip_ratio/high_max": 0.020656565949320794,
+ "clip_ratio/high_mean": 0.0051641414873301985,
+ "clip_ratio/low_mean": 0.009687500074505806,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.01485164137557149,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.2,
+ "completions/max_terminated_length": 374.2,
+ "completions/mean_length": 292.74375,
+ "completions/mean_terminated_length": 292.74375,
+ "completions/min_length": 199.6,
+ "completions/min_terminated_length": 199.6,
+ "entropy": 0.2419313907623291,
+ "epoch": 0.02,
+ "frac_reward_zero_std": 0.725,
+ "grad_norm": 0.007091797888278961,
+ "kl": 0.6101688414812088,
+ "learning_rate": 7.462667675849357e-06,
+ "loss": -0.0007256286218762398,
+ "num_tokens": 3355893.0,
+ "reward": 1.0649374961853026,
+ "reward_std": 0.13797421008348465,
+ "rewards/env_goofspiel_reward/mean": 1.0649374961853026,
+ "rewards/env_goofspiel_reward/std": 0.29806646406650544,
+ "sampling/importance_sampling_ratio/max": 1.4559056520462037,
+ "sampling/importance_sampling_ratio/mean": 0.7336877107620239,
+ "sampling/importance_sampling_ratio/min": 0.001045782444998622,
+ "sampling/sampling_logp_difference/max": 5.669601249694824,
+ "sampling/sampling_logp_difference/mean": 0.32060971260070803,
+ "step": 125,
+ "step_time": 4.16715560520006
+ },
+ {
+ "clip_ratio/high_max": 0.0,
+ "clip_ratio/high_mean": 0.0,
+ "clip_ratio/low_mean": 0.007297979947179556,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.007297979947179556,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.6,
+ "completions/max_terminated_length": 374.6,
+ "completions/mean_length": 293.225,
+ "completions/mean_terminated_length": 293.225,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.21281768046319485,
+ "epoch": 0.0208,
+ "frac_reward_zero_std": 0.725,
+ "grad_norm": 0.012143277563154697,
+ "kl": 0.9402982890605927,
+ "learning_rate": 7.462631598249523e-06,
+ "loss": -0.0005932614207267761,
+ "num_tokens": 3491703.0,
+ "reward": 1.0874375343322753,
+ "reward_std": 0.15918741673231124,
+ "rewards/env_goofspiel_reward/mean": 1.0874375343322753,
+ "rewards/env_goofspiel_reward/std": 0.2946491539478302,
+ "sampling/importance_sampling_ratio/max": 1.9441105842590332,
+ "sampling/importance_sampling_ratio/mean": 0.7611136674880982,
+ "sampling/importance_sampling_ratio/min": 0.0005575922084972262,
+ "sampling/sampling_logp_difference/max": 6.110091686248779,
+ "sampling/sampling_logp_difference/mean": 0.27749234437942505,
+ "step": 130,
+ "step_time": 4.052483657999801
+ },
+ {
+ "clip_ratio/high_max": 0.00625,
+ "clip_ratio/high_mean": 0.0015625,
+ "clip_ratio/low_mean": 0.01180555559694767,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.01336805559694767,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.6,
+ "completions/max_terminated_length": 374.6,
+ "completions/mean_length": 282.3375,
+ "completions/mean_terminated_length": 282.3375,
+ "completions/min_length": 194.6,
+ "completions/min_terminated_length": 194.6,
+ "entropy": 0.21125009432435035,
+ "epoch": 0.0216,
+ "frac_reward_zero_std": 0.65,
+ "grad_norm": 0.017155468463897705,
+ "kl": 0.7436703704297543,
+ "learning_rate": 7.462593549364123e-06,
+ "loss": -0.000642262538895011,
+ "num_tokens": 3624164.0,
+ "reward": 1.0160000562667846,
+ "reward_std": 0.1965756893157959,
+ "rewards/env_goofspiel_reward/mean": 1.0160000562667846,
+ "rewards/env_goofspiel_reward/std": 0.35902883410453795,
+ "sampling/importance_sampling_ratio/max": 1.8083943367004394,
+ "sampling/importance_sampling_ratio/mean": 0.6931412100791932,
+ "sampling/importance_sampling_ratio/min": 0.001236545197753003,
+ "sampling/sampling_logp_difference/max": 6.242915344238281,
+ "sampling/sampling_logp_difference/mean": 0.30908069014549255,
+ "step": 135,
+ "step_time": 4.004675274600231
+ },
+ {
+ "clip_ratio/high_max": 0.0,
+ "clip_ratio/high_mean": 0.0,
+ "clip_ratio/low_mean": 0.010798611212521791,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.010798611212521791,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.4,
+ "completions/max_terminated_length": 374.4,
+ "completions/mean_length": 300.39375,
+ "completions/mean_terminated_length": 300.39375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.19219726845622062,
+ "epoch": 0.0224,
+ "frac_reward_zero_std": 0.6,
+ "grad_norm": 0.010142410174012184,
+ "kl": 1.7957286298274995,
+ "learning_rate": 7.462553529219961e-06,
+ "loss": -0.0008514182642102242,
+ "num_tokens": 3762831.0,
+ "reward": 1.0236875414848328,
+ "reward_std": 0.2069171190261841,
+ "rewards/env_goofspiel_reward/mean": 1.0236875414848328,
+ "rewards/env_goofspiel_reward/std": 0.3530817925930023,
+ "sampling/importance_sampling_ratio/max": 1.7995745182037353,
+ "sampling/importance_sampling_ratio/mean": 0.7829151630401612,
+ "sampling/importance_sampling_ratio/min": 0.005923272194922902,
+ "sampling/sampling_logp_difference/max": 5.419459342956543,
+ "sampling/sampling_logp_difference/mean": 0.23471600711345672,
+ "step": 140,
+ "step_time": 4.171908260800047
+ },
+ {
+ "clip_ratio/high_max": 0.01041666679084301,
+ "clip_ratio/high_mean": 0.0026041666977107527,
+ "clip_ratio/low_mean": 0.005590277723968029,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.00819444442167878,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.6,
+ "completions/max_terminated_length": 374.6,
+ "completions/mean_length": 289.4625,
+ "completions/mean_terminated_length": 289.4625,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.22108654975891112,
+ "epoch": 0.0232,
+ "frac_reward_zero_std": 0.7375,
+ "grad_norm": 0.04653310030698776,
+ "kl": 0.5376328125596046,
+ "learning_rate": 7.462511537845228e-06,
+ "loss": -0.00045080664567649366,
+ "num_tokens": 3896833.0,
+ "reward": 1.0724375009536744,
+ "reward_std": 0.13797421008348465,
+ "rewards/env_goofspiel_reward/mean": 1.0724375009536744,
+ "rewards/env_goofspiel_reward/std": 0.30171733498573305,
+ "sampling/importance_sampling_ratio/max": 1.8748072385787964,
+ "sampling/importance_sampling_ratio/mean": 0.7784299254417419,
+ "sampling/importance_sampling_ratio/min": 0.0016431780066341161,
+ "sampling/sampling_logp_difference/max": 6.234890079498291,
+ "sampling/sampling_logp_difference/mean": 0.2888496518135071,
+ "step": 145,
+ "step_time": 4.1071294096005655
+ },
+ {
+ "clip_ratio/high_max": 0.00555555559694767,
+ "clip_ratio/high_mean": 0.0013888888992369176,
+ "clip_ratio/low_mean": 0.0015625,
+ "clip_ratio/low_min": 0.0,
+ "clip_ratio/region_mean": 0.0029513888992369177,
+ "completions/clipped_ratio": 0.0,
+ "completions/max_length": 374.0,
+ "completions/max_terminated_length": 374.0,
+ "completions/mean_length": 283.29375,
+ "completions/mean_terminated_length": 283.29375,
+ "completions/min_length": 212.0,
+ "completions/min_terminated_length": 212.0,
+ "entropy": 0.2212754048407078,
+ "epoch": 0.024,
+ "frac_reward_zero_std": 0.7125,
+ "grad_norm": 0.026271872222423553,
+ "kl": 0.6011221908032894,
+ "learning_rate": 7.4624675752695055e-06,
+ "loss": -0.0005275519099086523,
+ "num_tokens": 4029909.0,
+ "reward": 1.0724999904632568,
+ "reward_std": 0.14849241971969604,
+ "rewards/env_goofspiel_reward/mean": 1.0724999904632568,
+ "rewards/env_goofspiel_reward/std": 0.29245385825634,
+ "sampling/importance_sampling_ratio/max": 1.610938024520874,
+ "sampling/importance_sampling_ratio/mean": 0.7898086547851563,
+ "sampling/importance_sampling_ratio/min": 0.0010933216894045473,
+ "sampling/sampling_logp_difference/max": 4.982583332061767,
+ "sampling/sampling_logp_difference/mean": 0.22510133385658265,
+ "step": 150,
+ "step_time": 4.046049839799889
+ },
+ {
+ "epoch": 0.024,
+ "eval_clip_ratio/high_max": 0.0,
+ "eval_clip_ratio/high_mean": 0.0,
+ "eval_clip_ratio/low_mean": 0.0,
+ "eval_clip_ratio/low_min": 0.0,
+ "eval_clip_ratio/region_mean": 0.0,
+ "eval_completions/clipped_ratio": 0.0,
+ "eval_completions/max_length": 373.3333333333333,
+ "eval_completions/max_terminated_length": 373.3333333333333,
+ "eval_completions/mean_length": 314.2083333333333,
+ "eval_completions/mean_terminated_length": 314.2083333333333,
+ "eval_completions/min_length": 263.3333333333333,
+ "eval_completions/min_terminated_length": 263.3333333333333,
+ "eval_entropy": 0.2316575994094213,
+ "eval_frac_reward_zero_std": 0.9166666666666666,
+ "eval_kl": 0.5032360255718231,
+ "eval_loss": -0.00011960561823798344,
+ "eval_num_tokens": 4029909.0,
+ "eval_reward": 1.1500000556310017,
+ "eval_reward_std": 0.07071067889531453,
+ "eval_rewards/env_goofspiel_reward/mean": 1.1500000556310017,
+ "eval_rewards/env_goofspiel_reward/std": 0.14142136772473654,
+ "eval_runtime": 2.0552,
+ "eval_samples_per_second": 4.866,
+ "eval_sampling/importance_sampling_ratio/max": 1.122262716293335,
+ "eval_sampling/importance_sampling_ratio/mean": 0.7364152868588766,
+ "eval_sampling/importance_sampling_ratio/min": 0.09143149045606454,
+ "eval_sampling/sampling_logp_difference/max": 3.0512802600860596,
+ "eval_sampling/sampling_logp_difference/mean": 0.20749726643164954,
+ "eval_steps_per_second": 0.973,
+ "step": 150
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 18750,
+ "num_input_tokens_seen": 4029909,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
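A minimal Python sketch for reading a `trainer_state.json` like the one above. The excerpt embedded below is abridged to a few keys copied from the step 145 and step 150 entries; `log_history` is the standard Transformers `TrainerState` field that holds this list (the key itself falls outside the lines shown). Training entries carry `reward` while evaluation entries carry `eval_reward`, so the two kinds can be split on that key. The loop also checks two relationships visible in this log: the logged `epoch` is consistent with `step / (max_steps / num_train_epochs)` (18750 / 3 = 6250 steps per epoch, so step 150 gives 0.024), and `clip_ratio/region_mean` agrees with `clip_ratio/low_mean + clip_ratio/high_mean` up to float32 rounding.

```python
import json
import math

# Abridged excerpt of trainer_state.json: values copied from the log above,
# most metric keys omitted. For the real file use json.load(open(path)).
STATE_JSON = """
{
  "log_history": [
    {"step": 145, "epoch": 0.0232, "reward": 1.0724375009536744,
     "clip_ratio/low_mean": 0.005590277723968029,
     "clip_ratio/high_mean": 0.0026041666977107527,
     "clip_ratio/region_mean": 0.00819444442167878},
    {"step": 150, "epoch": 0.024, "reward": 1.0724999904632568,
     "clip_ratio/low_mean": 0.0015625,
     "clip_ratio/high_mean": 0.0013888888992369176,
     "clip_ratio/region_mean": 0.0029513888992369177},
    {"step": 150, "epoch": 0.024, "eval_reward": 1.1500000556310017}
  ],
  "logging_steps": 5,
  "max_steps": 18750,
  "num_train_epochs": 3
}
"""

state = json.loads(STATE_JSON)
steps_per_epoch = state["max_steps"] // state["num_train_epochs"]  # 6250

# Training entries log "reward"; evaluation entries log "eval_reward".
train_logs = [e for e in state["log_history"] if "reward" in e]
eval_logs = [e for e in state["log_history"] if "eval_reward" in e]

for entry in train_logs:
    # Fractional epoch is consistent with step / steps_per_epoch.
    assert math.isclose(entry["epoch"], entry["step"] / steps_per_epoch)
    # Region clip fraction vs. low + high fractions; loose tolerance because
    # the logged values are float32 averages.
    low_plus_high = entry["clip_ratio/low_mean"] + entry["clip_ratio/high_mean"]
    assert math.isclose(entry["clip_ratio/region_mean"], low_plus_high, rel_tol=1e-6)

for entry in eval_logs:
    print(f"step {entry['step']}: eval_reward={entry['eval_reward']:.4f}")
```

The tolerance on the clip-ratio check is deliberately loose: in some other entries of this log (for example step 105) the sum differs from `region_mean` around the tenth significant digit, which a default `math.isclose` would reject.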
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fde2ca074bfecc8005685f33b856d8bbc485defe06d87884a3fba35c79bf0349
+ size 7185
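`training_args.bin` is committed as a Git LFS pointer rather than the binary itself: per the Git LFS pointer format, `oid` is the SHA-256 digest of the real file's bytes and `size` is its length in bytes. A minimal sketch for parsing the pointer above and checking a downloaded copy against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the file contents are assumed to be already read into memory.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:fde2ca074bfecc8005685f33b856d8bbc485defe06d87884a3fba35c79bf0349
size 7185
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split each 'key value' line of an LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Hypothetical helper: True if data has the pointer's size and SHA-256."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])  # sha256 7185

# Usage, assuming the real file has been downloaded to the working directory:
# with open("training_args.bin", "rb") as f:
#     print(matches_pointer(f.read(), pointer))
```

Checking `size` first is cheap and rejects most mismatches (such as a pointer file downloaded in place of the binary) before hashing.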