skyyyyks committed on
Commit 3512cb4 · verified · 1 Parent(s): e5bb342

Upload folder using huggingface_hub

multievent_forecaster_polymarket_4b_1400/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen3-4B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen3-4B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
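Since the card leaves this section blank, here is a hedged sketch of loading this LoRA adapter with `peft` and `transformers`, based only on what the repo files state (base model `Qwen/Qwen3-4B`, PEFT LoRA adapter in this folder). The adapter path is an assumption taken from the folder name, and the separate `regression_head.bin` used by the `llm_regressor` wrapper in `config.json` is not handled here:

```python
# Hedged sketch: attach the LoRA adapter in this folder to its base model.
# BASE_MODEL comes from the card metadata; ADAPTER_DIR is assumed to be
# this repository folder. The custom regression head is NOT loaded here.
BASE_MODEL = "Qwen/Qwen3-4B"
ADAPTER_DIR = "multievent_forecaster_polymarket_4b_1400"

def load_forecaster(base_model: str = BASE_MODEL, adapter_dir: str = ADAPTER_DIR):
    """Load the base causal LM and wrap it with the PEFT adapter.

    Imports are deferred because this downloads ~4B parameters of weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    model = PeftModel.from_pretrained(model, adapter_dir)  # applies the LoRA weights
    return tokenizer, model
```

Calling `load_forecaster()` returns a `(tokenizer, model)` pair ready for generation; merging the adapter (`model.merge_and_unload()`) is optional for faster inference.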
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.18.1
multievent_forecaster_polymarket_4b_1400/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "/mnt/tidal-alsh-share2/usr/wangshanyong/models/Qwen/Qwen3-4B",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.1",
+ "qalora_group_size": 16,
+ "r": 32,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "q_proj",
+ "o_proj",
+ "k_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
multievent_forecaster_polymarket_4b_1400/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8ab26a0841ce536f808de34cb897f8f8b95bdeb19817e70bcbcb88d9f7e6ff1
+ size 94410616
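The large weight files in this commit are stored via Git LFS, so the repository itself holds only small three-line pointer files (version, oid, size). A minimal sketch of parsing one such pointer; the helper name is illustrative, and the sample text mirrors the `adapter_model.safetensors` pointer above:

```python
# Sketch: split a Git LFS pointer file into its key/value fields.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b8ab26a0841ce536f808de34cb897f8f8b95bdeb19817e70bcbcb88d9f7e6ff1
size 94410616
"""

def parse_lfs_pointer(text: str) -> dict:
    """Each line is 'key value'; 'size' becomes an int, 'oid' keeps its algo prefix."""
    fields = {}
    for line in text.strip().splitlines():
        key, value = line.split(" ", 1)
        fields[key] = int(value) if key == "size" else value
    return fields

info = parse_lfs_pointer(POINTER)  # e.g. info["size"] == 94410616
```

This is why cloning without `git lfs pull` yields ~130-byte files instead of the ~94 MB adapter weights.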
multievent_forecaster_polymarket_4b_1400/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "base_model_name_or_path": "/mnt/tidal-alsh-share2/usr/wangshanyong/models/Qwen/Qwen3-4B",
+ "max_length": 1024,
+ "model_type": "llm_regressor",
+ "transformers_version": "4.57.6"
+ }
multievent_forecaster_polymarket_4b_1400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2a63457c0d46e29093605788c7807975d1b288d4010b61a55c9c118dc804087
+ size 194415371
multievent_forecaster_polymarket_4b_1400/regression_head.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c1beffe2992f80d143ce8d00d9a65f16db46451e1720c47cab48beccc8fe6d6
+ size 2712765
multievent_forecaster_polymarket_4b_1400/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3602c4484aa2e31b75ae44f81cb14e20a1e5a12ba4cecce773b914788c5443fd
+ size 14645
multievent_forecaster_polymarket_4b_1400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47421e914d049f5c91268cd9fd97a415163eb53e5723f091410c6571bae03b4b
+ size 1465
multievent_forecaster_polymarket_4b_1400/trainer_state.json ADDED
@@ -0,0 +1,1034 @@
+ {
+ "best_global_step": 1000,
+ "best_metric": 0.1582760363817215,
+ "best_model_checkpoint": "../saves/multievent_forecaster_polymarket_4b/checkpoint-1000",
+ "epoch": 0.5712536978475977,
+ "eval_steps": 500,
+ "global_step": 1400,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.004080383556054269,
+ "grad_norm": 2.0757477283477783,
+ "learning_rate": 4.9981640146878825e-05,
+ "loss": 2.0378,
+ "step": 10
+ },
+ {
+ "epoch": 0.008160767112108539,
+ "grad_norm": 26.03803253173828,
+ "learning_rate": 4.996124031007752e-05,
+ "loss": 0.3803,
+ "step": 20
+ },
+ {
+ "epoch": 0.012241150668162807,
+ "grad_norm": 134.36888122558594,
+ "learning_rate": 4.994084047327622e-05,
+ "loss": 1.291,
+ "step": 30
+ },
+ {
+ "epoch": 0.016321534224217078,
+ "grad_norm": 71.54600524902344,
+ "learning_rate": 4.9920440636474916e-05,
+ "loss": 1.182,
+ "step": 40
+ },
+ {
+ "epoch": 0.020401917780271346,
+ "grad_norm": 72.89893341064453,
+ "learning_rate": 4.99000407996736e-05,
+ "loss": 1.6147,
+ "step": 50
+ },
+ {
+ "epoch": 0.024482301336325615,
+ "grad_norm": 44.821685791015625,
+ "learning_rate": 4.98796409628723e-05,
+ "loss": 0.7805,
+ "step": 60
+ },
+ {
+ "epoch": 0.028562684892379883,
+ "grad_norm": 7.488345146179199,
+ "learning_rate": 4.985924112607099e-05,
+ "loss": 0.8426,
+ "step": 70
+ },
+ {
+ "epoch": 0.032643068448434155,
+ "grad_norm": 150.4651641845703,
+ "learning_rate": 4.9838841289269687e-05,
+ "loss": 1.7852,
+ "step": 80
+ },
+ {
+ "epoch": 0.03672345200448842,
+ "grad_norm": 14.307666778564453,
+ "learning_rate": 4.981844145246838e-05,
+ "loss": 2.4076,
+ "step": 90
+ },
+ {
+ "epoch": 0.04080383556054269,
+ "grad_norm": 3.1318511962890625,
+ "learning_rate": 4.979804161566708e-05,
+ "loss": 0.8476,
+ "step": 100
+ },
+ {
+ "epoch": 0.04488421911659696,
+ "grad_norm": 44.676937103271484,
+ "learning_rate": 4.977764177886577e-05,
+ "loss": 1.7672,
+ "step": 110
+ },
+ {
+ "epoch": 0.04896460267265123,
+ "grad_norm": 35.75183868408203,
+ "learning_rate": 4.9757241942064464e-05,
+ "loss": 1.8243,
+ "step": 120
+ },
+ {
+ "epoch": 0.0530449862287055,
+ "grad_norm": 70.882080078125,
+ "learning_rate": 4.973684210526316e-05,
+ "loss": 1.5172,
+ "step": 130
+ },
+ {
+ "epoch": 0.057125369784759766,
+ "grad_norm": 64.43199157714844,
+ "learning_rate": 4.971644226846185e-05,
+ "loss": 2.2016,
+ "step": 140
+ },
+ {
+ "epoch": 0.06120575334081404,
+ "grad_norm": 13.279516220092773,
+ "learning_rate": 4.969604243166055e-05,
+ "loss": 1.2317,
+ "step": 150
+ },
+ {
+ "epoch": 0.06528613689686831,
+ "grad_norm": 138.01319885253906,
+ "learning_rate": 4.967564259485924e-05,
+ "loss": 1.4188,
+ "step": 160
+ },
+ {
+ "epoch": 0.06936652045292258,
+ "grad_norm": 93.34532165527344,
+ "learning_rate": 4.9655242758057944e-05,
+ "loss": 0.8121,
+ "step": 170
+ },
+ {
+ "epoch": 0.07344690400897684,
+ "grad_norm": 93.5193862915039,
+ "learning_rate": 4.963484292125663e-05,
+ "loss": 2.0329,
+ "step": 180
+ },
+ {
+ "epoch": 0.07752728756503112,
+ "grad_norm": 63.74538803100586,
+ "learning_rate": 4.9614443084455326e-05,
+ "loss": 0.4579,
+ "step": 190
+ },
+ {
+ "epoch": 0.08160767112108538,
+ "grad_norm": 84.2245864868164,
+ "learning_rate": 4.959404324765402e-05,
+ "loss": 0.5814,
+ "step": 200
+ },
+ {
+ "epoch": 0.08568805467713965,
+ "grad_norm": 213.67283630371094,
+ "learning_rate": 4.9573643410852715e-05,
+ "loss": 3.0867,
+ "step": 210
+ },
+ {
+ "epoch": 0.08976843823319391,
+ "grad_norm": 41.77745056152344,
+ "learning_rate": 4.955324357405141e-05,
+ "loss": 0.4549,
+ "step": 220
+ },
+ {
+ "epoch": 0.0938488217892482,
+ "grad_norm": 4.588130474090576,
+ "learning_rate": 4.9532843737250103e-05,
+ "loss": 1.9535,
+ "step": 230
+ },
+ {
+ "epoch": 0.09792920534530246,
+ "grad_norm": 1.0185309648513794,
+ "learning_rate": 4.95124439004488e-05,
+ "loss": 1.54,
+ "step": 240
+ },
+ {
+ "epoch": 0.10200958890135672,
+ "grad_norm": 0.5728484988212585,
+ "learning_rate": 4.949204406364749e-05,
+ "loss": 1.3238,
+ "step": 250
+ },
+ {
+ "epoch": 0.106089972457411,
+ "grad_norm": 42.32962417602539,
+ "learning_rate": 4.947164422684619e-05,
+ "loss": 0.5126,
+ "step": 260
+ },
+ {
+ "epoch": 0.11017035601346527,
+ "grad_norm": 8.426935195922852,
+ "learning_rate": 4.945124439004488e-05,
+ "loss": 0.5544,
+ "step": 270
+ },
+ {
+ "epoch": 0.11425073956951953,
+ "grad_norm": 87.63069915771484,
+ "learning_rate": 4.9430844553243576e-05,
+ "loss": 1.9061,
+ "step": 280
+ },
+ {
+ "epoch": 0.1183311231255738,
+ "grad_norm": 51.695743560791016,
+ "learning_rate": 4.941044471644227e-05,
+ "loss": 1.5502,
+ "step": 290
+ },
+ {
+ "epoch": 0.12241150668162808,
+ "grad_norm": 11.145570755004883,
+ "learning_rate": 4.9390044879640965e-05,
+ "loss": 1.6132,
+ "step": 300
+ },
+ {
+ "epoch": 0.12649189023768234,
+ "grad_norm": 45.673824310302734,
+ "learning_rate": 4.936964504283966e-05,
+ "loss": 0.3869,
+ "step": 310
+ },
+ {
+ "epoch": 0.13057227379373662,
+ "grad_norm": 10.354819297790527,
+ "learning_rate": 4.9349245206038354e-05,
+ "loss": 0.8387,
+ "step": 320
+ },
+ {
+ "epoch": 0.13465265734979087,
+ "grad_norm": 6.868192672729492,
+ "learning_rate": 4.932884536923705e-05,
+ "loss": 0.8009,
+ "step": 330
+ },
+ {
+ "epoch": 0.13873304090584515,
+ "grad_norm": 8.307397842407227,
+ "learning_rate": 4.930844553243574e-05,
+ "loss": 0.9572,
+ "step": 340
+ },
+ {
+ "epoch": 0.14281342446189943,
+ "grad_norm": 6.341692924499512,
+ "learning_rate": 4.928804569563444e-05,
+ "loss": 1.5027,
+ "step": 350
+ },
+ {
+ "epoch": 0.14689380801795368,
+ "grad_norm": 22.819496154785156,
+ "learning_rate": 4.926764585883313e-05,
+ "loss": 1.0288,
+ "step": 360
+ },
+ {
+ "epoch": 0.15097419157400796,
+ "grad_norm": 0.9149819612503052,
+ "learning_rate": 4.9247246022031826e-05,
+ "loss": 2.5555,
+ "step": 370
+ },
+ {
+ "epoch": 0.15505457513006224,
+ "grad_norm": 12.473997116088867,
+ "learning_rate": 4.922684618523052e-05,
+ "loss": 0.4131,
+ "step": 380
+ },
+ {
+ "epoch": 0.1591349586861165,
+ "grad_norm": 0.9700067639350891,
+ "learning_rate": 4.9206446348429215e-05,
+ "loss": 3.3748,
+ "step": 390
+ },
+ {
+ "epoch": 0.16321534224217077,
+ "grad_norm": 24.929426193237305,
+ "learning_rate": 4.918604651162791e-05,
+ "loss": 1.1368,
+ "step": 400
+ },
+ {
+ "epoch": 0.16729572579822502,
+ "grad_norm": 61.25931167602539,
+ "learning_rate": 4.9165646674826604e-05,
+ "loss": 0.8989,
+ "step": 410
+ },
+ {
+ "epoch": 0.1713761093542793,
+ "grad_norm": 57.02674102783203,
+ "learning_rate": 4.91452468380253e-05,
+ "loss": 1.6214,
+ "step": 420
+ },
+ {
+ "epoch": 0.17545649291033358,
+ "grad_norm": 2.28898024559021,
+ "learning_rate": 4.912484700122399e-05,
+ "loss": 1.7401,
+ "step": 430
+ },
+ {
+ "epoch": 0.17953687646638783,
+ "grad_norm": 33.844932556152344,
+ "learning_rate": 4.910444716442269e-05,
+ "loss": 2.4001,
+ "step": 440
+ },
+ {
+ "epoch": 0.1836172600224421,
+ "grad_norm": 65.07670593261719,
+ "learning_rate": 4.908404732762138e-05,
+ "loss": 1.331,
+ "step": 450
+ },
+ {
+ "epoch": 0.1876976435784964,
+ "grad_norm": 0.6607738733291626,
+ "learning_rate": 4.9063647490820076e-05,
+ "loss": 1.2078,
+ "step": 460
+ },
+ {
+ "epoch": 0.19177802713455064,
+ "grad_norm": 155.83596801757812,
+ "learning_rate": 4.904324765401877e-05,
+ "loss": 1.7682,
+ "step": 470
+ },
+ {
+ "epoch": 0.19585841069060492,
+ "grad_norm": 2.5435492992401123,
+ "learning_rate": 4.9022847817217465e-05,
+ "loss": 0.8433,
+ "step": 480
+ },
+ {
+ "epoch": 0.1999387942466592,
+ "grad_norm": 102.03804779052734,
+ "learning_rate": 4.900244798041616e-05,
+ "loss": 1.2029,
+ "step": 490
+ },
+ {
+ "epoch": 0.20401917780271345,
+ "grad_norm": 71.43460845947266,
+ "learning_rate": 4.8982048143614854e-05,
+ "loss": 1.5539,
+ "step": 500
+ },
+ {
+ "epoch": 0.20401917780271345,
+ "eval_delta_mse": 0.008629978634417057,
+ "eval_loss": 0.16608330607414246,
+ "eval_price_mse": 0.008629978634417057,
+ "eval_runtime": 20397.2294,
+ "eval_samples_per_second": 0.181,
+ "eval_steps_per_second": 0.091,
+ "step": 500
+ },
+ {
+ "epoch": 0.20809956135876773,
+ "grad_norm": 1.581972360610962,
+ "learning_rate": 4.896164830681355e-05,
+ "loss": 4.0735,
+ "step": 510
+ },
+ {
+ "epoch": 0.212179944914822,
+ "grad_norm": 111.77674102783203,
+ "learning_rate": 4.894124847001224e-05,
+ "loss": 1.3277,
+ "step": 520
+ },
+ {
+ "epoch": 0.21626032847087626,
+ "grad_norm": 37.864768981933594,
+ "learning_rate": 4.892084863321094e-05,
+ "loss": 2.5044,
+ "step": 530
+ },
+ {
+ "epoch": 0.22034071202693054,
+ "grad_norm": 101.5436782836914,
+ "learning_rate": 4.890044879640963e-05,
+ "loss": 1.0035,
+ "step": 540
+ },
+ {
+ "epoch": 0.22442109558298481,
+ "grad_norm": 18.511301040649414,
+ "learning_rate": 4.8880048959608326e-05,
+ "loss": 0.4329,
+ "step": 550
+ },
+ {
+ "epoch": 0.22850147913903907,
+ "grad_norm": 32.60016632080078,
+ "learning_rate": 4.885964912280702e-05,
+ "loss": 2.0507,
+ "step": 560
+ },
+ {
+ "epoch": 0.23258186269509334,
+ "grad_norm": 59.13274002075195,
+ "learning_rate": 4.883924928600571e-05,
+ "loss": 1.2127,
+ "step": 570
+ },
+ {
+ "epoch": 0.2366622462511476,
+ "grad_norm": 108.24981689453125,
+ "learning_rate": 4.881884944920441e-05,
+ "loss": 1.374,
+ "step": 580
+ },
+ {
+ "epoch": 0.24074262980720187,
+ "grad_norm": 34.11159133911133,
+ "learning_rate": 4.8798449612403104e-05,
+ "loss": 0.92,
+ "step": 590
+ },
+ {
+ "epoch": 0.24482301336325615,
+ "grad_norm": 50.873722076416016,
+ "learning_rate": 4.87780497756018e-05,
+ "loss": 0.5168,
+ "step": 600
+ },
+ {
+ "epoch": 0.2489033969193104,
+ "grad_norm": 1.7769521474838257,
+ "learning_rate": 4.875764993880049e-05,
+ "loss": 0.7541,
+ "step": 610
+ },
+ {
+ "epoch": 0.2529837804753647,
+ "grad_norm": 48.230323791503906,
+ "learning_rate": 4.873725010199919e-05,
+ "loss": 0.6089,
+ "step": 620
+ },
+ {
+ "epoch": 0.25706416403141896,
+ "grad_norm": 57.97608184814453,
+ "learning_rate": 4.871685026519788e-05,
+ "loss": 1.351,
+ "step": 630
+ },
+ {
+ "epoch": 0.26114454758747324,
+ "grad_norm": 74.06781768798828,
+ "learning_rate": 4.869645042839657e-05,
+ "loss": 1.3261,
+ "step": 640
+ },
+ {
+ "epoch": 0.26522493114352746,
+ "grad_norm": 4.713451385498047,
+ "learning_rate": 4.867605059159527e-05,
+ "loss": 2.044,
+ "step": 650
+ },
+ {
+ "epoch": 0.26930531469958174,
+ "grad_norm": 81.6591567993164,
+ "learning_rate": 4.8655650754793965e-05,
+ "loss": 3.0752,
+ "step": 660
+ },
+ {
+ "epoch": 0.273385698255636,
+ "grad_norm": 57.05862808227539,
+ "learning_rate": 4.863525091799266e-05,
+ "loss": 1.6468,
+ "step": 670
+ },
+ {
+ "epoch": 0.2774660818116903,
+ "grad_norm": 30.273658752441406,
+ "learning_rate": 4.8614851081191354e-05,
+ "loss": 0.7242,
+ "step": 680
+ },
+ {
+ "epoch": 0.2815464653677446,
+ "grad_norm": 26.366323471069336,
+ "learning_rate": 4.859445124439005e-05,
+ "loss": 2.0522,
+ "step": 690
+ },
+ {
+ "epoch": 0.28562684892379886,
+ "grad_norm": 4.619323253631592,
+ "learning_rate": 4.857405140758874e-05,
+ "loss": 0.7087,
+ "step": 700
+ },
+ {
+ "epoch": 0.2897072324798531,
+ "grad_norm": 26.68976593017578,
+ "learning_rate": 4.855365157078743e-05,
+ "loss": 1.4464,
+ "step": 710
+ },
+ {
+ "epoch": 0.29378761603590736,
+ "grad_norm": 8.125544548034668,
+ "learning_rate": 4.853325173398613e-05,
+ "loss": 1.8895,
+ "step": 720
+ },
+ {
+ "epoch": 0.29786799959196164,
+ "grad_norm": 15.307260513305664,
+ "learning_rate": 4.8512851897184826e-05,
+ "loss": 2.119,
+ "step": 730
+ },
+ {
+ "epoch": 0.3019483831480159,
+ "grad_norm": 63.065834045410156,
+ "learning_rate": 4.849245206038352e-05,
+ "loss": 0.6371,
+ "step": 740
+ },
+ {
+ "epoch": 0.3060287667040702,
+ "grad_norm": 37.66510772705078,
+ "learning_rate": 4.8472052223582215e-05,
+ "loss": 0.7812,
+ "step": 750
+ },
+ {
+ "epoch": 0.3101091502601245,
+ "grad_norm": 178.31991577148438,
+ "learning_rate": 4.845165238678091e-05,
+ "loss": 2.1031,
+ "step": 760
+ },
+ {
+ "epoch": 0.3141895338161787,
+ "grad_norm": 4.701211452484131,
+ "learning_rate": 4.84312525499796e-05,
+ "loss": 1.481,
+ "step": 770
+ },
+ {
+ "epoch": 0.318269917372233,
+ "grad_norm": 5.170464992523193,
+ "learning_rate": 4.841085271317829e-05,
+ "loss": 0.6031,
+ "step": 780
+ },
+ {
+ "epoch": 0.32235030092828726,
+ "grad_norm": 22.698759078979492,
+ "learning_rate": 4.839045287637699e-05,
+ "loss": 0.2871,
+ "step": 790
+ },
+ {
+ "epoch": 0.32643068448434154,
+ "grad_norm": 4.7304463386535645,
+ "learning_rate": 4.837005303957569e-05,
+ "loss": 1.2046,
+ "step": 800
+ },
+ {
+ "epoch": 0.3305110680403958,
+ "grad_norm": 0.30286353826522827,
+ "learning_rate": 4.834965320277438e-05,
+ "loss": 0.3637,
+ "step": 810
+ },
+ {
+ "epoch": 0.33459145159645004,
+ "grad_norm": 74.5572509765625,
+ "learning_rate": 4.8329253365973077e-05,
+ "loss": 1.1692,
+ "step": 820
+ },
+ {
+ "epoch": 0.3386718351525043,
+ "grad_norm": 4.788327693939209,
+ "learning_rate": 4.830885352917177e-05,
+ "loss": 0.31,
+ "step": 830
+ },
+ {
+ "epoch": 0.3427522187085586,
+ "grad_norm": 26.012136459350586,
+ "learning_rate": 4.828845369237046e-05,
+ "loss": 1.0463,
+ "step": 840
+ },
+ {
+ "epoch": 0.3468326022646129,
+ "grad_norm": 78.63890075683594,
+ "learning_rate": 4.826805385556915e-05,
+ "loss": 1.0896,
+ "step": 850
+ },
+ {
+ "epoch": 0.35091298582066716,
+ "grad_norm": 41.01737594604492,
+ "learning_rate": 4.8247654018767854e-05,
+ "loss": 1.0675,
+ "step": 860
+ },
+ {
+ "epoch": 0.35499336937672143,
+ "grad_norm": 53.6055793762207,
+ "learning_rate": 4.822725418196655e-05,
+ "loss": 3.5859,
+ "step": 870
+ },
+ {
+ "epoch": 0.35907375293277566,
+ "grad_norm": 17.812931060791016,
+ "learning_rate": 4.820685434516524e-05,
+ "loss": 3.3374,
+ "step": 880
+ },
+ {
+ "epoch": 0.36315413648882994,
+ "grad_norm": 3.9189977645874023,
+ "learning_rate": 4.818645450836394e-05,
+ "loss": 1.544,
+ "step": 890
+ },
+ {
+ "epoch": 0.3672345200448842,
+ "grad_norm": 9.485867500305176,
+ "learning_rate": 4.8166054671562625e-05,
+ "loss": 2.4021,
+ "step": 900
+ },
+ {
+ "epoch": 0.3713149036009385,
+ "grad_norm": 15.945775032043457,
+ "learning_rate": 4.814565483476132e-05,
+ "loss": 2.3368,
+ "step": 910
+ },
+ {
+ "epoch": 0.3753952871569928,
+ "grad_norm": 0.8831557631492615,
+ "learning_rate": 4.8125254997960014e-05,
+ "loss": 0.5882,
+ "step": 920
+ },
+ {
+ "epoch": 0.37947567071304705,
+ "grad_norm": 0.7511992454528809,
+ "learning_rate": 4.8104855161158716e-05,
+ "loss": 1.3834,
+ "step": 930
+ },
+ {
+ "epoch": 0.3835560542691013,
+ "grad_norm": 25.953502655029297,
+ "learning_rate": 4.808445532435741e-05,
+ "loss": 0.8605,
+ "step": 940
+ },
+ {
+ "epoch": 0.38763643782515556,
+ "grad_norm": 0.09692514687776566,
+ "learning_rate": 4.8064055487556105e-05,
+ "loss": 0.7689,
+ "step": 950
+ },
+ {
+ "epoch": 0.39171682138120983,
+ "grad_norm": 21.603172302246094,
+ "learning_rate": 4.80436556507548e-05,
+ "loss": 1.8927,
+ "step": 960
+ },
+ {
+ "epoch": 0.3957972049372641,
+ "grad_norm": 6.030911445617676,
+ "learning_rate": 4.802325581395349e-05,
+ "loss": 0.562,
+ "step": 970
+ },
+ {
+ "epoch": 0.3998775884933184,
+ "grad_norm": 64.26757049560547,
+ "learning_rate": 4.800285597715218e-05,
+ "loss": 0.7878,
+ "step": 980
+ },
+ {
+ "epoch": 0.4039579720493726,
+ "grad_norm": 26.142595291137695,
+ "learning_rate": 4.7982456140350876e-05,
+ "loss": 1.2921,
+ "step": 990
+ },
+ {
+ "epoch": 0.4080383556054269,
+ "grad_norm": 3.4586594104766846,
+ "learning_rate": 4.796205630354958e-05,
+ "loss": 1.7767,
+ "step": 1000
+ },
+ {
+ "epoch": 0.4080383556054269,
+ "eval_delta_mse": 0.00726704578846693,
+ "eval_loss": 0.1582760363817215,
+ "eval_price_mse": 0.00726704578846693,
+ "eval_runtime": 20405.9114,
+ "eval_samples_per_second": 0.181,
+ "eval_steps_per_second": 0.09,
+ "step": 1000
+ },
+ {
+ "epoch": 0.4121187391614812,
+ "grad_norm": 37.83722686767578,
+ "learning_rate": 4.794165646674827e-05,
+ "loss": 1.6681,
+ "step": 1010
+ },
+ {
+ "epoch": 0.41619912271753545,
+ "grad_norm": 41.78764343261719,
+ "learning_rate": 4.7921256629946966e-05,
+ "loss": 1.5695,
+ "step": 1020
+ },
+ {
+ "epoch": 0.42027950627358973,
+ "grad_norm": 9.436123847961426,
+ "learning_rate": 4.790085679314566e-05,
+ "loss": 0.9996,
+ "step": 1030
+ },
+ {
+ "epoch": 0.424359889829644,
+ "grad_norm": 1.5197569131851196,
+ "learning_rate": 4.788045695634435e-05,
+ "loss": 1.5881,
+ "step": 1040
+ },
+ {
+ "epoch": 0.42844027338569823,
+ "grad_norm": 42.05815124511719,
+ "learning_rate": 4.786005711954304e-05,
+ "loss": 0.9537,
+ "step": 1050
+ },
+ {
+ "epoch": 0.4325206569417525,
+ "grad_norm": 9.018413543701172,
+ "learning_rate": 4.783965728274174e-05,
+ "loss": 0.9691,
+ "step": 1060
+ },
+ {
+ "epoch": 0.4366010404978068,
+ "grad_norm": 1.6392107009887695,
+ "learning_rate": 4.781925744594044e-05,
+ "loss": 0.9397,
+ "step": 1070
+ },
+ {
+ "epoch": 0.44068142405386107,
+ "grad_norm": 3.5962109565734863,
+ "learning_rate": 4.779885760913913e-05,
+ "loss": 0.5019,
+ "step": 1080
+ },
+ {
+ "epoch": 0.44476180760991535,
+ "grad_norm": 1.014216423034668,
+ "learning_rate": 4.777845777233783e-05,
+ "loss": 0.3138,
+ "step": 1090
+ },
+ {
+ "epoch": 0.44884219116596963,
+ "grad_norm": 1.4875215291976929,
+ "learning_rate": 4.7758057935536515e-05,
+ "loss": 1.7817,
+ "step": 1100
+ },
+ {
+ "epoch": 0.45292257472202385,
+ "grad_norm": 1.9012846946716309,
+ "learning_rate": 4.773765809873521e-05,
+ "loss": 0.3366,
+ "step": 1110
+ },
+ {
+ "epoch": 0.45700295827807813,
+ "grad_norm": 102.30817413330078,
+ "learning_rate": 4.7717258261933904e-05,
+ "loss": 1.1379,
+ "step": 1120
+ },
+ {
+ "epoch": 0.4610833418341324,
+ "grad_norm": 6.154768466949463,
+ "learning_rate": 4.76968584251326e-05,
+ "loss": 1.9142,
+ "step": 1130
+ },
+ {
+ "epoch": 0.4651637253901867,
+ "grad_norm": 3.3108744621276855,
+ "learning_rate": 4.76764585883313e-05,
+ "loss": 3.3609,
+ "step": 1140
+ },
+ {
+ "epoch": 0.46924410894624097,
+ "grad_norm": 57.68730926513672,
+ "learning_rate": 4.7656058751529994e-05,
+ "loss": 2.9367,
+ "step": 1150
+ },
+ {
+ "epoch": 0.4733244925022952,
+ "grad_norm": 7.233608245849609,
+ "learning_rate": 4.763565891472869e-05,
+ "loss": 1.7578,
+ "step": 1160
+ },
+ {
+ "epoch": 0.47740487605834947,
+ "grad_norm": 21.241331100463867,
+ "learning_rate": 4.7615259077927376e-05,
+ "loss": 1.1464,
+ "step": 1170
+ },
+ {
+ "epoch": 0.48148525961440375,
+ "grad_norm": 1.1809065341949463,
+ "learning_rate": 4.759485924112607e-05,
+ "loss": 0.4299,
+ "step": 1180
+ },
+ {
+ "epoch": 0.48556564317045803,
+ "grad_norm": 9.076310157775879,
+ "learning_rate": 4.7574459404324765e-05,
+ "loss": 3.7821,
+ "step": 1190
+ },
+ {
+ "epoch": 0.4896460267265123,
+ "grad_norm": 1.5124175548553467,
+ "learning_rate": 4.755405956752346e-05,
+ "loss": 0.4608,
+ "step": 1200
+ },
+ {
+ "epoch": 0.4937264102825666,
874
+ "grad_norm": 3.8463289737701416,
875
+ "learning_rate": 4.753365973072216e-05,
876
+ "loss": 3.0101,
877
+ "step": 1210
878
+ },
879
+ {
880
+ "epoch": 0.4978067938386208,
881
+ "grad_norm": 17.738548278808594,
882
+ "learning_rate": 4.7513259893920855e-05,
883
+ "loss": 1.1747,
884
+ "step": 1220
885
+ },
886
+ {
887
+ "epoch": 0.5018871773946751,
888
+ "grad_norm": 41.57831573486328,
889
+ "learning_rate": 4.749286005711954e-05,
890
+ "loss": 1.5605,
891
+ "step": 1230
892
+ },
893
+ {
894
+ "epoch": 0.5059675609507294,
895
+ "grad_norm": 8.6355562210083,
896
+ "learning_rate": 4.747246022031824e-05,
897
+ "loss": 2.7391,
898
+ "step": 1240
899
+ },
900
+ {
901
+ "epoch": 0.5100479445067836,
902
+ "grad_norm": 0.7144743800163269,
903
+ "learning_rate": 4.745206038351693e-05,
904
+ "loss": 1.1142,
905
+ "step": 1250
906
+ },
907
+ {
908
+ "epoch": 0.5141283280628379,
909
+ "grad_norm": 58.17163848876953,
910
+ "learning_rate": 4.7431660546715626e-05,
911
+ "loss": 2.1911,
912
+ "step": 1260
913
+ },
914
+ {
915
+ "epoch": 0.5182087116188921,
916
+ "grad_norm": 0.7392058372497559,
917
+ "learning_rate": 4.741126070991432e-05,
918
+ "loss": 0.1781,
919
+ "step": 1270
920
+ },
921
+ {
922
+ "epoch": 0.5222890951749465,
923
+ "grad_norm": 10.575345993041992,
924
+ "learning_rate": 4.739086087311302e-05,
925
+ "loss": 0.8711,
926
+ "step": 1280
927
+ },
928
+ {
929
+ "epoch": 0.5263694787310007,
930
+ "grad_norm": 11.3096923828125,
931
+ "learning_rate": 4.7370461036311716e-05,
932
+ "loss": 0.5449,
933
+ "step": 1290
934
+ },
935
+ {
936
+ "epoch": 0.5304498622870549,
937
+ "grad_norm": 34.84663772583008,
938
+ "learning_rate": 4.7350061199510404e-05,
939
+ "loss": 1.5937,
940
+ "step": 1300
941
+ },
942
+ {
943
+ "epoch": 0.5345302458431093,
944
+ "grad_norm": 49.56230926513672,
945
+ "learning_rate": 4.73296613627091e-05,
946
+ "loss": 1.5281,
947
+ "step": 1310
948
+ },
949
+ {
950
+ "epoch": 0.5386106293991635,
951
+ "grad_norm": 21.019792556762695,
952
+ "learning_rate": 4.730926152590779e-05,
953
+ "loss": 1.2034,
954
+ "step": 1320
955
+ },
956
+ {
957
+ "epoch": 0.5426910129552178,
958
+ "grad_norm": 13.464981079101562,
959
+ "learning_rate": 4.728886168910649e-05,
960
+ "loss": 1.0776,
961
+ "step": 1330
962
+ },
963
+ {
964
+ "epoch": 0.546771396511272,
965
+ "grad_norm": 112.07856750488281,
966
+ "learning_rate": 4.726846185230518e-05,
967
+ "loss": 1.1755,
968
+ "step": 1340
969
+ },
970
+ {
971
+ "epoch": 0.5508517800673264,
972
+ "grad_norm": 38.1587028503418,
973
+ "learning_rate": 4.724806201550388e-05,
974
+ "loss": 0.325,
975
+ "step": 1350
976
+ },
977
+ {
978
+ "epoch": 0.5549321636233806,
979
+ "grad_norm": 38.14012145996094,
980
+ "learning_rate": 4.722766217870258e-05,
981
+ "loss": 0.4036,
982
+ "step": 1360
983
+ },
984
+ {
985
+ "epoch": 0.5590125471794348,
986
+ "grad_norm": 3.4571340084075928,
987
+ "learning_rate": 4.7207262341901265e-05,
988
+ "loss": 1.6188,
989
+ "step": 1370
990
+ },
991
+ {
992
+ "epoch": 0.5630929307354892,
993
+ "grad_norm": 40.20029830932617,
994
+ "learning_rate": 4.718686250509996e-05,
995
+ "loss": 0.5831,
996
+ "step": 1380
997
+ },
998
+ {
999
+ "epoch": 0.5671733142915434,
1000
+ "grad_norm": 0.15889263153076172,
1001
+ "learning_rate": 4.7166462668298654e-05,
1002
+ "loss": 1.14,
1003
+ "step": 1390
1004
+ },
1005
+ {
1006
+ "epoch": 0.5712536978475977,
1007
+ "grad_norm": 2.5688552856445312,
1008
+ "learning_rate": 4.714606283149735e-05,
1009
+ "loss": 2.3429,
1010
+ "step": 1400
1011
+ }
1012
+ ],
1013
+ "logging_steps": 10,
1014
+ "max_steps": 24510,
1015
+ "num_input_tokens_seen": 0,
1016
+ "num_train_epochs": 10,
1017
+ "save_steps": 50,
1018
+ "stateful_callbacks": {
1019
+ "TrainerControl": {
1020
+ "args": {
1021
+ "should_epoch_stop": false,
1022
+ "should_evaluate": false,
1023
+ "should_log": false,
1024
+ "should_save": true,
1025
+ "should_training_stop": false
1026
+ },
1027
+ "attributes": {}
1028
+ }
1029
+ },
1030
+ "total_flos": 1.0806565615071437e+18,
1031
+ "train_batch_size": 1,
1032
+ "trial_name": null,
1033
+ "trial_params": null
1034
+ }
multievent_forecaster_polymarket_4b_1400/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8563fb393abba4ae9835472d50fd53de922abce9b5fb333a7f462c130b4011ff
+ size 5969