skyyyyks committed
Commit b50e3e3 (verified) · Parent(s): d43c9d4

Upload folder using huggingface_hub
multievent_forecaster_polymarket_8b_1700/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen3-8B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen3-8B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
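The card leaves this section as a placeholder, so here is a minimal loading sketch (an editor's addition, not from the model authors). It relies only on what this commit itself shows: adapter_config.json identifies a LoRA adapter (r=64, alpha=32) over Qwen/Qwen3-8B saved with PEFT 0.18.1. The folder name used as the adapter path is the one from this commit; adjust it if you load from the Hub instead.

```python
# Minimal sketch for attaching the LoRA adapter to its base model.
# Assumptions (not confirmed by the card): the adapter directory below is
# the folder uploaded in this commit, and the base is the public Qwen/Qwen3-8B.
BASE_MODEL = "Qwen/Qwen3-8B"
ADAPTER_DIR = "multievent_forecaster_polymarket_8b_1700"

def load_forecaster():
    """Load the base causal LM and attach the LoRA adapter."""
    # Imports deferred so this module can be imported without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)
    return tokenizer, model
```

Note that config.json in this folder declares `model_type: "llm_regressor"` and a separate `regression_head.bin` ships alongside the adapter; this generic causal-LM sketch does not load that head, which would require the authors' own wrapper code.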
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
multievent_forecaster_polymarket_8b_1700/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "/mnt/tidal-alsh-share2/usr/wangshanyong/models/Qwen/Qwen3-8B",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.1",
+ "qalora_group_size": 16,
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj",
+ "k_proj",
+ "o_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
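Editor's note on the config above: with `use_rslora` set to false, PEFT applies the standard LoRA scaling of `lora_alpha / r` to the adapter update, so this adapter's updates are scaled by 32 / 64 = 0.5. A self-contained sketch with the key values copied from the file:

```python
# Key settings copied verbatim from adapter_config.json above (illustration only).
adapter_cfg = {
    "peft_type": "LORA",
    "r": 64,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
}

# With use_rslora=false, the effective LoRA scaling factor is alpha / r.
scaling = adapter_cfg["lora_alpha"] / adapter_cfg["r"]
print(scaling)  # 0.5
```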
multievent_forecaster_polymarket_8b_1700/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d1517fe37c8e7a7a8102691f87b36af5f6634be4ef0a3ce3e7b4fd6bcf56165
+ size 245405936
multievent_forecaster_polymarket_8b_1700/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "base_model_name_or_path": "/mnt/tidal-alsh-share2/usr/wangshanyong/models/Qwen/Qwen3-8B",
+ "max_length": 1024,
+ "model_type": "llm_regressor",
+ "transformers_version": "4.57.6"
+ }
multievent_forecaster_polymarket_8b_1700/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2eddb53a640ea85be2697985a62ccfa701722d8d94c04f65aa1e0e23f2513144
+ size 499575883
multievent_forecaster_polymarket_8b_1700/regression_head.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb082b55dc861e0caeac1c5b34c9a5f66a5cc401bf3a1073e40682e96ffd7548
+ size 4297917
multievent_forecaster_polymarket_8b_1700/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:530059dc8208c518f64a244028c5410dddbcf9da5e15c6a6bc668b37392d6d1d
+ size 14645
multievent_forecaster_polymarket_8b_1700/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9b401f43f4cd2c303a501e653844ab022f061f7c4784e096428f0ecc1cdfdfb
+ size 1465
multievent_forecaster_polymarket_8b_1700/trainer_state.json ADDED
@@ -0,0 +1,1254 @@
+ {
+ "best_global_step": 500,
+ "best_metric": 0.15539318323135376,
+ "best_model_checkpoint": "../saves/multievent_forecaster_polymarket_8b/checkpoint-500",
+ "epoch": 0.6936652045292258,
+ "eval_steps": 500,
+ "global_step": 1700,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.004080383556054269,
+ "grad_norm": 2.310525417327881,
+ "learning_rate": 4.9981640146878825e-05,
+ "loss": 1.534,
+ "step": 10
+ },
+ {
+ "epoch": 0.008160767112108539,
+ "grad_norm": 28.321460723876953,
+ "learning_rate": 4.996124031007752e-05,
+ "loss": 0.427,
+ "step": 20
+ },
+ {
+ "epoch": 0.012241150668162807,
+ "grad_norm": 80.2139892578125,
+ "learning_rate": 4.994084047327622e-05,
+ "loss": 1.2319,
+ "step": 30
+ },
+ {
+ "epoch": 0.016321534224217078,
+ "grad_norm": 69.97799682617188,
+ "learning_rate": 4.9920440636474916e-05,
+ "loss": 1.215,
+ "step": 40
+ },
+ {
+ "epoch": 0.020401917780271346,
+ "grad_norm": 58.33653259277344,
+ "learning_rate": 4.99000407996736e-05,
+ "loss": 1.3289,
+ "step": 50
+ },
+ {
+ "epoch": 0.024482301336325615,
+ "grad_norm": 47.93499755859375,
+ "learning_rate": 4.98796409628723e-05,
+ "loss": 0.8388,
+ "step": 60
+ },
+ {
+ "epoch": 0.028562684892379883,
+ "grad_norm": 4.571518898010254,
+ "learning_rate": 4.985924112607099e-05,
+ "loss": 0.775,
+ "step": 70
+ },
+ {
+ "epoch": 0.032643068448434155,
+ "grad_norm": 111.51830291748047,
+ "learning_rate": 4.9838841289269687e-05,
+ "loss": 1.7161,
+ "step": 80
+ },
+ {
+ "epoch": 0.03672345200448842,
+ "grad_norm": 10.470147132873535,
+ "learning_rate": 4.981844145246838e-05,
+ "loss": 2.3538,
+ "step": 90
+ },
+ {
+ "epoch": 0.04080383556054269,
+ "grad_norm": 4.704714298248291,
+ "learning_rate": 4.979804161566708e-05,
+ "loss": 1.1272,
+ "step": 100
+ },
+ {
+ "epoch": 0.04488421911659696,
+ "grad_norm": 42.333065032958984,
+ "learning_rate": 4.977764177886577e-05,
+ "loss": 1.3603,
+ "step": 110
+ },
+ {
+ "epoch": 0.04896460267265123,
+ "grad_norm": 29.616594314575195,
+ "learning_rate": 4.9757241942064464e-05,
+ "loss": 1.4551,
+ "step": 120
+ },
+ {
+ "epoch": 0.0530449862287055,
+ "grad_norm": 33.91548156738281,
+ "learning_rate": 4.973684210526316e-05,
+ "loss": 1.7047,
+ "step": 130
+ },
+ {
+ "epoch": 0.057125369784759766,
+ "grad_norm": 42.891300201416016,
+ "learning_rate": 4.971644226846185e-05,
+ "loss": 1.5121,
+ "step": 140
+ },
+ {
+ "epoch": 0.06120575334081404,
+ "grad_norm": 19.2835693359375,
+ "learning_rate": 4.969604243166055e-05,
+ "loss": 1.0465,
+ "step": 150
+ },
+ {
+ "epoch": 0.06528613689686831,
+ "grad_norm": 88.87532043457031,
+ "learning_rate": 4.967564259485924e-05,
+ "loss": 1.2853,
+ "step": 160
+ },
+ {
+ "epoch": 0.06936652045292258,
+ "grad_norm": 54.34832763671875,
+ "learning_rate": 4.9655242758057944e-05,
+ "loss": 0.7364,
+ "step": 170
+ },
+ {
+ "epoch": 0.07344690400897684,
+ "grad_norm": 41.521263122558594,
+ "learning_rate": 4.963484292125663e-05,
+ "loss": 2.0201,
+ "step": 180
+ },
+ {
+ "epoch": 0.07752728756503112,
+ "grad_norm": 35.32502746582031,
+ "learning_rate": 4.9614443084455326e-05,
+ "loss": 0.3534,
+ "step": 190
+ },
+ {
+ "epoch": 0.08160767112108538,
+ "grad_norm": 46.29753494262695,
+ "learning_rate": 4.959404324765402e-05,
+ "loss": 0.45,
+ "step": 200
+ },
+ {
+ "epoch": 0.08568805467713965,
+ "grad_norm": 123.40519714355469,
+ "learning_rate": 4.9573643410852715e-05,
+ "loss": 2.4673,
+ "step": 210
+ },
+ {
+ "epoch": 0.08976843823319391,
+ "grad_norm": 53.588077545166016,
+ "learning_rate": 4.955324357405141e-05,
+ "loss": 0.5119,
+ "step": 220
+ },
+ {
+ "epoch": 0.0938488217892482,
+ "grad_norm": 3.450457811355591,
+ "learning_rate": 4.9532843737250103e-05,
+ "loss": 1.5715,
+ "step": 230
+ },
+ {
+ "epoch": 0.09792920534530246,
+ "grad_norm": 0.7993666529655457,
+ "learning_rate": 4.95124439004488e-05,
+ "loss": 1.6316,
+ "step": 240
+ },
+ {
+ "epoch": 0.10200958890135672,
+ "grad_norm": 0.9596285820007324,
+ "learning_rate": 4.949204406364749e-05,
+ "loss": 1.0762,
+ "step": 250
+ },
+ {
+ "epoch": 0.106089972457411,
+ "grad_norm": 34.78837203979492,
+ "learning_rate": 4.947164422684619e-05,
+ "loss": 0.3556,
+ "step": 260
+ },
+ {
+ "epoch": 0.11017035601346527,
+ "grad_norm": 6.915394306182861,
+ "learning_rate": 4.945124439004488e-05,
+ "loss": 0.5495,
+ "step": 270
+ },
+ {
+ "epoch": 0.11425073956951953,
+ "grad_norm": 31.66779899597168,
+ "learning_rate": 4.9430844553243576e-05,
+ "loss": 1.8311,
+ "step": 280
+ },
+ {
+ "epoch": 0.1183311231255738,
+ "grad_norm": 42.642547607421875,
+ "learning_rate": 4.941044471644227e-05,
+ "loss": 1.2792,
+ "step": 290
+ },
+ {
+ "epoch": 0.12241150668162808,
+ "grad_norm": 25.581951141357422,
+ "learning_rate": 4.9390044879640965e-05,
+ "loss": 1.7074,
+ "step": 300
+ },
+ {
+ "epoch": 0.12649189023768234,
+ "grad_norm": 6.600286483764648,
+ "learning_rate": 4.936964504283966e-05,
+ "loss": 0.4821,
+ "step": 310
+ },
+ {
+ "epoch": 0.13057227379373662,
+ "grad_norm": 7.8896870613098145,
+ "learning_rate": 4.9349245206038354e-05,
+ "loss": 0.5994,
+ "step": 320
+ },
+ {
+ "epoch": 0.13465265734979087,
+ "grad_norm": 6.225551128387451,
+ "learning_rate": 4.932884536923705e-05,
+ "loss": 0.8307,
+ "step": 330
+ },
+ {
+ "epoch": 0.13873304090584515,
+ "grad_norm": 9.000911712646484,
+ "learning_rate": 4.930844553243574e-05,
+ "loss": 0.9769,
+ "step": 340
+ },
+ {
+ "epoch": 0.14281342446189943,
+ "grad_norm": 2.856774091720581,
+ "learning_rate": 4.928804569563444e-05,
+ "loss": 1.409,
+ "step": 350
+ },
+ {
+ "epoch": 0.14689380801795368,
+ "grad_norm": 20.0944881439209,
+ "learning_rate": 4.926764585883313e-05,
+ "loss": 0.9768,
+ "step": 360
+ },
+ {
+ "epoch": 0.15097419157400796,
+ "grad_norm": 0.2755752503871918,
+ "learning_rate": 4.9247246022031826e-05,
+ "loss": 2.5315,
+ "step": 370
+ },
+ {
+ "epoch": 0.15505457513006224,
+ "grad_norm": 14.503312110900879,
+ "learning_rate": 4.922684618523052e-05,
+ "loss": 0.4262,
+ "step": 380
+ },
+ {
+ "epoch": 0.1591349586861165,
+ "grad_norm": 0.9226180911064148,
+ "learning_rate": 4.9206446348429215e-05,
+ "loss": 3.2264,
+ "step": 390
+ },
+ {
+ "epoch": 0.16321534224217077,
+ "grad_norm": 34.80778121948242,
+ "learning_rate": 4.918604651162791e-05,
+ "loss": 1.1347,
+ "step": 400
+ },
+ {
+ "epoch": 0.16729572579822502,
+ "grad_norm": 66.57282257080078,
+ "learning_rate": 4.9165646674826604e-05,
+ "loss": 0.9757,
+ "step": 410
+ },
+ {
+ "epoch": 0.1713761093542793,
+ "grad_norm": 46.86370086669922,
+ "learning_rate": 4.91452468380253e-05,
+ "loss": 1.5765,
+ "step": 420
+ },
+ {
+ "epoch": 0.17545649291033358,
+ "grad_norm": 2.965914487838745,
+ "learning_rate": 4.912484700122399e-05,
+ "loss": 1.848,
+ "step": 430
+ },
+ {
+ "epoch": 0.17953687646638783,
+ "grad_norm": 46.13410186767578,
+ "learning_rate": 4.910444716442269e-05,
+ "loss": 2.6137,
+ "step": 440
+ },
+ {
+ "epoch": 0.1836172600224421,
+ "grad_norm": 44.24580001831055,
+ "learning_rate": 4.908404732762138e-05,
+ "loss": 1.0493,
+ "step": 450
+ },
+ {
+ "epoch": 0.1876976435784964,
+ "grad_norm": 1.186919927597046,
+ "learning_rate": 4.9063647490820076e-05,
+ "loss": 1.1336,
+ "step": 460
+ },
+ {
+ "epoch": 0.19177802713455064,
+ "grad_norm": 118.10325622558594,
+ "learning_rate": 4.904324765401877e-05,
+ "loss": 1.5328,
+ "step": 470
+ },
+ {
+ "epoch": 0.19585841069060492,
+ "grad_norm": 0.642149806022644,
+ "learning_rate": 4.9022847817217465e-05,
+ "loss": 0.783,
+ "step": 480
+ },
+ {
+ "epoch": 0.1999387942466592,
+ "grad_norm": 83.65412902832031,
+ "learning_rate": 4.900244798041616e-05,
+ "loss": 1.2663,
+ "step": 490
+ },
+ {
+ "epoch": 0.20401917780271345,
+ "grad_norm": 30.820880889892578,
+ "learning_rate": 4.8982048143614854e-05,
+ "loss": 1.5877,
+ "step": 500
+ },
+ {
+ "epoch": 0.20401917780271345,
+ "eval_delta_mse": 0.006563735194504261,
+ "eval_loss": 0.15539318323135376,
+ "eval_price_mse": 0.006563735194504261,
+ "eval_runtime": 15822.7488,
+ "eval_samples_per_second": 0.233,
+ "eval_steps_per_second": 0.233,
+ "step": 500
+ },
+ {
+ "epoch": 0.20809956135876773,
+ "grad_norm": 1.2009692192077637,
+ "learning_rate": 4.896164830681355e-05,
+ "loss": 3.7942,
+ "step": 510
+ },
+ {
+ "epoch": 0.212179944914822,
+ "grad_norm": 116.35127258300781,
+ "learning_rate": 4.894124847001224e-05,
+ "loss": 1.3409,
+ "step": 520
+ },
+ {
+ "epoch": 0.21626032847087626,
+ "grad_norm": 14.146204948425293,
+ "learning_rate": 4.892084863321094e-05,
+ "loss": 2.3588,
+ "step": 530
+ },
+ {
+ "epoch": 0.22034071202693054,
+ "grad_norm": 77.455078125,
+ "learning_rate": 4.890044879640963e-05,
+ "loss": 0.9603,
+ "step": 540
+ },
+ {
+ "epoch": 0.22442109558298481,
+ "grad_norm": 18.571735382080078,
+ "learning_rate": 4.8880048959608326e-05,
+ "loss": 0.4675,
+ "step": 550
+ },
+ {
+ "epoch": 0.22850147913903907,
+ "grad_norm": 27.24749755859375,
+ "learning_rate": 4.885964912280702e-05,
+ "loss": 2.0077,
+ "step": 560
+ },
+ {
+ "epoch": 0.23258186269509334,
+ "grad_norm": 52.052703857421875,
+ "learning_rate": 4.883924928600571e-05,
+ "loss": 1.2055,
+ "step": 570
+ },
+ {
+ "epoch": 0.2366622462511476,
+ "grad_norm": 71.48110961914062,
+ "learning_rate": 4.881884944920441e-05,
+ "loss": 1.3145,
+ "step": 580
+ },
+ {
+ "epoch": 0.24074262980720187,
+ "grad_norm": 21.496164321899414,
+ "learning_rate": 4.8798449612403104e-05,
+ "loss": 0.9349,
+ "step": 590
+ },
+ {
+ "epoch": 0.24482301336325615,
+ "grad_norm": 36.404296875,
+ "learning_rate": 4.87780497756018e-05,
+ "loss": 0.4031,
+ "step": 600
+ },
+ {
+ "epoch": 0.2489033969193104,
+ "grad_norm": 4.12367582321167,
+ "learning_rate": 4.875764993880049e-05,
+ "loss": 0.7925,
+ "step": 610
+ },
+ {
+ "epoch": 0.2529837804753647,
+ "grad_norm": 37.512237548828125,
+ "learning_rate": 4.873725010199919e-05,
+ "loss": 0.557,
+ "step": 620
+ },
+ {
+ "epoch": 0.25706416403141896,
+ "grad_norm": 61.9686279296875,
+ "learning_rate": 4.871685026519788e-05,
+ "loss": 1.4473,
+ "step": 630
+ },
+ {
+ "epoch": 0.26114454758747324,
+ "grad_norm": 55.5275993347168,
+ "learning_rate": 4.869645042839657e-05,
+ "loss": 1.2829,
+ "step": 640
+ },
+ {
+ "epoch": 0.26522493114352746,
+ "grad_norm": 3.7023115158081055,
+ "learning_rate": 4.867605059159527e-05,
+ "loss": 2.1883,
+ "step": 650
+ },
+ {
+ "epoch": 0.26930531469958174,
+ "grad_norm": 62.24500274658203,
+ "learning_rate": 4.8655650754793965e-05,
+ "loss": 3.0595,
+ "step": 660
+ },
+ {
+ "epoch": 0.273385698255636,
+ "grad_norm": 48.439453125,
+ "learning_rate": 4.863525091799266e-05,
+ "loss": 1.6106,
+ "step": 670
+ },
+ {
+ "epoch": 0.2774660818116903,
+ "grad_norm": 11.552438735961914,
+ "learning_rate": 4.8614851081191354e-05,
+ "loss": 0.767,
+ "step": 680
+ },
+ {
+ "epoch": 0.2815464653677446,
+ "grad_norm": 25.360143661499023,
+ "learning_rate": 4.859445124439005e-05,
+ "loss": 2.0025,
+ "step": 690
+ },
+ {
+ "epoch": 0.28562684892379886,
+ "grad_norm": 2.8084678649902344,
+ "learning_rate": 4.857405140758874e-05,
+ "loss": 0.7238,
+ "step": 700
+ },
+ {
+ "epoch": 0.2897072324798531,
+ "grad_norm": 21.731435775756836,
+ "learning_rate": 4.855365157078743e-05,
+ "loss": 1.3537,
+ "step": 710
+ },
+ {
+ "epoch": 0.29378761603590736,
+ "grad_norm": 6.917633056640625,
+ "learning_rate": 4.853325173398613e-05,
+ "loss": 1.7614,
+ "step": 720
+ },
+ {
+ "epoch": 0.29786799959196164,
+ "grad_norm": 18.585006713867188,
+ "learning_rate": 4.8512851897184826e-05,
+ "loss": 2.2206,
+ "step": 730
+ },
+ {
+ "epoch": 0.3019483831480159,
+ "grad_norm": 52.37718200683594,
+ "learning_rate": 4.849245206038352e-05,
+ "loss": 0.6439,
+ "step": 740
+ },
+ {
+ "epoch": 0.3060287667040702,
+ "grad_norm": 19.023473739624023,
+ "learning_rate": 4.8472052223582215e-05,
+ "loss": 0.838,
+ "step": 750
+ },
+ {
+ "epoch": 0.3101091502601245,
+ "grad_norm": 160.38790893554688,
+ "learning_rate": 4.845165238678091e-05,
+ "loss": 2.2673,
+ "step": 760
+ },
+ {
+ "epoch": 0.3141895338161787,
+ "grad_norm": 7.442062854766846,
+ "learning_rate": 4.84312525499796e-05,
+ "loss": 1.5411,
+ "step": 770
+ },
+ {
+ "epoch": 0.318269917372233,
+ "grad_norm": 4.024974822998047,
+ "learning_rate": 4.841085271317829e-05,
+ "loss": 0.4976,
+ "step": 780
+ },
+ {
+ "epoch": 0.32235030092828726,
+ "grad_norm": 16.560396194458008,
+ "learning_rate": 4.839045287637699e-05,
+ "loss": 0.2644,
+ "step": 790
+ },
+ {
+ "epoch": 0.32643068448434154,
+ "grad_norm": 7.441378593444824,
+ "learning_rate": 4.837005303957569e-05,
+ "loss": 1.1391,
+ "step": 800
+ },
+ {
+ "epoch": 0.3305110680403958,
+ "grad_norm": 0.21250449120998383,
+ "learning_rate": 4.834965320277438e-05,
+ "loss": 0.3384,
+ "step": 810
+ },
+ {
+ "epoch": 0.33459145159645004,
+ "grad_norm": 60.48158264160156,
+ "learning_rate": 4.8329253365973077e-05,
+ "loss": 1.1616,
+ "step": 820
+ },
+ {
+ "epoch": 0.3386718351525043,
+ "grad_norm": 4.342957496643066,
+ "learning_rate": 4.830885352917177e-05,
+ "loss": 0.3139,
+ "step": 830
+ },
+ {
+ "epoch": 0.3427522187085586,
+ "grad_norm": 15.96242618560791,
+ "learning_rate": 4.828845369237046e-05,
+ "loss": 1.0006,
+ "step": 840
+ },
+ {
+ "epoch": 0.3468326022646129,
+ "grad_norm": 55.67816925048828,
+ "learning_rate": 4.826805385556915e-05,
+ "loss": 1.0457,
+ "step": 850
+ },
+ {
+ "epoch": 0.35091298582066716,
+ "grad_norm": 29.242801666259766,
+ "learning_rate": 4.8247654018767854e-05,
+ "loss": 1.0143,
+ "step": 860
+ },
+ {
+ "epoch": 0.35499336937672143,
+ "grad_norm": 47.237030029296875,
+ "learning_rate": 4.822725418196655e-05,
+ "loss": 3.6061,
+ "step": 870
+ },
+ {
+ "epoch": 0.35907375293277566,
+ "grad_norm": 9.840912818908691,
+ "learning_rate": 4.820685434516524e-05,
+ "loss": 3.3593,
+ "step": 880
+ },
+ {
+ "epoch": 0.36315413648882994,
+ "grad_norm": 3.1284096240997314,
+ "learning_rate": 4.818645450836394e-05,
+ "loss": 1.4921,
+ "step": 890
+ },
+ {
+ "epoch": 0.3672345200448842,
+ "grad_norm": 1.2444665431976318,
+ "learning_rate": 4.8166054671562625e-05,
+ "loss": 2.5889,
+ "step": 900
+ },
+ {
+ "epoch": 0.3713149036009385,
+ "grad_norm": 9.726990699768066,
+ "learning_rate": 4.814565483476132e-05,
+ "loss": 1.844,
+ "step": 910
+ },
+ {
+ "epoch": 0.3753952871569928,
+ "grad_norm": 2.1426756381988525,
+ "learning_rate": 4.8125254997960014e-05,
+ "loss": 0.5866,
+ "step": 920
+ },
+ {
+ "epoch": 0.37947567071304705,
+ "grad_norm": 0.7916508316993713,
+ "learning_rate": 4.8104855161158716e-05,
+ "loss": 1.3884,
+ "step": 930
+ },
+ {
+ "epoch": 0.3835560542691013,
+ "grad_norm": 19.585935592651367,
+ "learning_rate": 4.808445532435741e-05,
+ "loss": 0.8296,
+ "step": 940
+ },
+ {
+ "epoch": 0.38763643782515556,
+ "grad_norm": 0.16525128483772278,
+ "learning_rate": 4.8064055487556105e-05,
+ "loss": 0.787,
+ "step": 950
+ },
+ {
+ "epoch": 0.39171682138120983,
+ "grad_norm": 20.36138343811035,
+ "learning_rate": 4.80436556507548e-05,
+ "loss": 1.8585,
+ "step": 960
+ },
+ {
+ "epoch": 0.3957972049372641,
+ "grad_norm": 3.199782371520996,
+ "learning_rate": 4.802325581395349e-05,
+ "loss": 0.5838,
+ "step": 970
+ },
+ {
+ "epoch": 0.3998775884933184,
+ "grad_norm": 61.45878219604492,
+ "learning_rate": 4.800285597715218e-05,
+ "loss": 0.8027,
+ "step": 980
+ },
+ {
+ "epoch": 0.4039579720493726,
+ "grad_norm": 25.76358413696289,
+ "learning_rate": 4.7982456140350876e-05,
+ "loss": 1.2722,
+ "step": 990
+ },
+ {
+ "epoch": 0.4080383556054269,
+ "grad_norm": 1.0910900831222534,
+ "learning_rate": 4.796205630354958e-05,
+ "loss": 1.7975,
+ "step": 1000
+ },
+ {
+ "epoch": 0.4080383556054269,
+ "eval_delta_mse": 0.007387589197605848,
+ "eval_loss": 0.1594492644071579,
+ "eval_price_mse": 0.007387589197605848,
+ "eval_runtime": 15820.57,
+ "eval_samples_per_second": 0.233,
+ "eval_steps_per_second": 0.233,
+ "step": 1000
+ },
+ {
+ "epoch": 0.4121187391614812,
+ "grad_norm": 38.62668991088867,
+ "learning_rate": 4.794165646674827e-05,
+ "loss": 1.6595,
+ "step": 1010
+ },
+ {
+ "epoch": 0.41619912271753545,
+ "grad_norm": 38.00244140625,
+ "learning_rate": 4.7921256629946966e-05,
+ "loss": 1.5699,
+ "step": 1020
+ },
+ {
+ "epoch": 0.42027950627358973,
+ "grad_norm": 12.713574409484863,
+ "learning_rate": 4.790085679314566e-05,
+ "loss": 1.0679,
+ "step": 1030
+ },
+ {
+ "epoch": 0.424359889829644,
+ "grad_norm": 2.396801710128784,
+ "learning_rate": 4.788045695634435e-05,
+ "loss": 1.69,
+ "step": 1040
+ },
+ {
+ "epoch": 0.42844027338569823,
+ "grad_norm": 36.214935302734375,
+ "learning_rate": 4.786005711954304e-05,
+ "loss": 0.9044,
+ "step": 1050
+ },
+ {
+ "epoch": 0.4325206569417525,
+ "grad_norm": 9.050984382629395,
+ "learning_rate": 4.783965728274174e-05,
+ "loss": 1.0151,
+ "step": 1060
+ },
+ {
+ "epoch": 0.4366010404978068,
+ "grad_norm": 0.3593282997608185,
+ "learning_rate": 4.781925744594044e-05,
+ "loss": 1.1096,
+ "step": 1070
+ },
+ {
+ "epoch": 0.44068142405386107,
+ "grad_norm": 3.4728405475616455,
+ "learning_rate": 4.779885760913913e-05,
+ "loss": 0.5297,
+ "step": 1080
+ },
+ {
+ "epoch": 0.44476180760991535,
+ "grad_norm": 0.48938944935798645,
+ "learning_rate": 4.777845777233783e-05,
+ "loss": 0.3162,
+ "step": 1090
+ },
+ {
+ "epoch": 0.44884219116596963,
+ "grad_norm": 1.344174861907959,
+ "learning_rate": 4.7758057935536515e-05,
+ "loss": 1.6664,
+ "step": 1100
+ },
+ {
+ "epoch": 0.45292257472202385,
+ "grad_norm": 1.3220082521438599,
+ "learning_rate": 4.773765809873521e-05,
+ "loss": 0.3818,
+ "step": 1110
+ },
+ {
+ "epoch": 0.45700295827807813,
+ "grad_norm": 79.93708038330078,
+ "learning_rate": 4.7717258261933904e-05,
+ "loss": 1.1598,
+ "step": 1120
+ },
+ {
+ "epoch": 0.4610833418341324,
+ "grad_norm": 6.176994323730469,
+ "learning_rate": 4.76968584251326e-05,
+ "loss": 2.0552,
+ "step": 1130
+ },
+ {
+ "epoch": 0.4651637253901867,
+ "grad_norm": 1.3886823654174805,
+ "learning_rate": 4.76764585883313e-05,
+ "loss": 3.3444,
+ "step": 1140
+ },
+ {
+ "epoch": 0.46924410894624097,
+ "grad_norm": 55.96194076538086,
+ "learning_rate": 4.7656058751529994e-05,
+ "loss": 3.1493,
+ "step": 1150
+ },
+ {
+ "epoch": 0.4733244925022952,
+ "grad_norm": 6.833325386047363,
+ "learning_rate": 4.763565891472869e-05,
+ "loss": 1.8034,
+ "step": 1160
+ },
+ {
+ "epoch": 0.47740487605834947,
+ "grad_norm": 14.919387817382812,
+ "learning_rate": 4.7615259077927376e-05,
+ "loss": 1.0969,
+ "step": 1170
+ },
+ {
+ "epoch": 0.48148525961440375,
+ "grad_norm": 0.9844567179679871,
+ "learning_rate": 4.759485924112607e-05,
+ "loss": 0.4323,
+ "step": 1180
857
+ },
858
+ {
859
+ "epoch": 0.48556564317045803,
860
+ "grad_norm": 6.165961742401123,
861
+ "learning_rate": 4.7574459404324765e-05,
862
+ "loss": 3.8485,
863
+ "step": 1190
864
+ },
865
+ {
866
+ "epoch": 0.4896460267265123,
867
+ "grad_norm": 2.711315870285034,
868
+ "learning_rate": 4.755405956752346e-05,
869
+ "loss": 0.5115,
870
+ "step": 1200
871
+ },
872
+ {
873
+ "epoch": 0.4937264102825666,
874
+ "grad_norm": 0.5381933450698853,
875
+ "learning_rate": 4.753365973072216e-05,
876
+ "loss": 3.134,
877
+ "step": 1210
878
+ },
879
+ {
880
+ "epoch": 0.4978067938386208,
881
+ "grad_norm": 11.511136054992676,
882
+ "learning_rate": 4.7513259893920855e-05,
883
+ "loss": 1.1924,
884
+ "step": 1220
885
+ },
886
+ {
887
+ "epoch": 0.5018871773946751,
888
+ "grad_norm": 27.313264846801758,
889
+ "learning_rate": 4.749286005711954e-05,
890
+ "loss": 1.4419,
891
+ "step": 1230
892
+ },
893
+ {
894
+ "epoch": 0.5059675609507294,
895
+ "grad_norm": 11.97879409790039,
896
+ "learning_rate": 4.747246022031824e-05,
897
+ "loss": 2.9577,
898
+ "step": 1240
899
+ },
900
+ {
901
+ "epoch": 0.5100479445067836,
902
+ "grad_norm": 0.34498393535614014,
903
+ "learning_rate": 4.745206038351693e-05,
904
+ "loss": 1.0691,
905
+ "step": 1250
906
+ },
907
+ {
908
+ "epoch": 0.5141283280628379,
909
+ "grad_norm": 43.153377532958984,
910
+ "learning_rate": 4.7431660546715626e-05,
911
+ "loss": 2.1727,
912
+ "step": 1260
913
+ },
914
+ {
915
+ "epoch": 0.5182087116188921,
916
+ "grad_norm": 1.077343463897705,
917
+ "learning_rate": 4.741126070991432e-05,
918
+ "loss": 0.22,
919
+ "step": 1270
920
+ },
921
+ {
922
+ "epoch": 0.5222890951749465,
923
+ "grad_norm": 7.850944995880127,
924
+ "learning_rate": 4.739086087311302e-05,
925
+ "loss": 0.7635,
926
+ "step": 1280
927
+ },
928
+ {
929
+ "epoch": 0.5263694787310007,
930
+ "grad_norm": 4.523067474365234,
931
+ "learning_rate": 4.7370461036311716e-05,
932
+ "loss": 0.526,
933
+ "step": 1290
934
+ },
935
+ {
936
+ "epoch": 0.5304498622870549,
937
+ "grad_norm": 27.282987594604492,
938
+ "learning_rate": 4.7350061199510404e-05,
939
+ "loss": 1.5905,
940
+ "step": 1300
941
+ },
942
+ {
943
+ "epoch": 0.5345302458431093,
944
+ "grad_norm": 40.11876678466797,
945
+ "learning_rate": 4.73296613627091e-05,
946
+ "loss": 1.5064,
947
+ "step": 1310
948
+ },
949
+ {
950
+ "epoch": 0.5386106293991635,
951
+ "grad_norm": 17.26888084411621,
952
+ "learning_rate": 4.730926152590779e-05,
953
+ "loss": 1.2041,
954
+ "step": 1320
955
+ },
956
+ {
957
+ "epoch": 0.5426910129552178,
958
+ "grad_norm": 10.868350982666016,
959
+ "learning_rate": 4.728886168910649e-05,
960
+ "loss": 1.0819,
961
+ "step": 1330
962
+ },
963
+ {
964
+ "epoch": 0.546771396511272,
965
+ "grad_norm": 92.16282653808594,
966
+ "learning_rate": 4.726846185230518e-05,
967
+ "loss": 1.1683,
968
+ "step": 1340
969
+ },
970
+ {
971
+ "epoch": 0.5508517800673264,
972
+ "grad_norm": 27.69321060180664,
973
+ "learning_rate": 4.724806201550388e-05,
974
+ "loss": 0.2812,
975
+ "step": 1350
976
+ },
977
+ {
978
+ "epoch": 0.5549321636233806,
979
+ "grad_norm": 32.01507568359375,
980
+ "learning_rate": 4.722766217870258e-05,
981
+ "loss": 0.391,
982
+ "step": 1360
983
+ },
984
+ {
985
+ "epoch": 0.5590125471794348,
986
+ "grad_norm": 1.6365959644317627,
987
+ "learning_rate": 4.7207262341901265e-05,
988
+ "loss": 1.6068,
989
+ "step": 1370
990
+ },
991
+ {
992
+ "epoch": 0.5630929307354892,
993
+ "grad_norm": 36.97065353393555,
994
+ "learning_rate": 4.718686250509996e-05,
995
+ "loss": 0.5799,
996
+ "step": 1380
997
+ },
998
+ {
999
+ "epoch": 0.5671733142915434,
1000
+ "grad_norm": 0.26953190565109253,
1001
+ "learning_rate": 4.7166462668298654e-05,
1002
+ "loss": 1.177,
1003
+ "step": 1390
1004
+ },
1005
+ {
1006
+ "epoch": 0.5712536978475977,
1007
+ "grad_norm": 2.353876829147339,
1008
+ "learning_rate": 4.714606283149735e-05,
1009
+ "loss": 2.3293,
1010
+ "step": 1400
1011
+ },
1012
+ {
1013
+ "epoch": 0.5753340814036519,
1014
+ "grad_norm": 159.8494873046875,
1015
+ "learning_rate": 4.712566299469604e-05,
1016
+ "loss": 3.5817,
1017
+ "step": 1410
1018
+ },
1019
+ {
1020
+ "epoch": 0.5794144649597062,
1021
+ "grad_norm": 5.0206618309021,
1022
+ "learning_rate": 4.7105263157894744e-05,
1023
+ "loss": 2.608,
1024
+ "step": 1420
1025
+ },
1026
+ {
1027
+ "epoch": 0.5834948485157605,
1028
+ "grad_norm": 43.106048583984375,
1029
+ "learning_rate": 4.708486332109343e-05,
1030
+ "loss": 1.5755,
1031
+ "step": 1430
1032
+ },
1033
+ {
1034
+ "epoch": 0.5875752320718147,
1035
+ "grad_norm": 5.7215657234191895,
1036
+ "learning_rate": 4.7064463484292126e-05,
1037
+ "loss": 0.745,
1038
+ "step": 1440
1039
+ },
1040
+ {
1041
+ "epoch": 0.5916556156278691,
1042
+ "grad_norm": 3.9160101413726807,
1043
+ "learning_rate": 4.704406364749082e-05,
1044
+ "loss": 0.8479,
1045
+ "step": 1450
1046
+ },
1047
+ {
1048
+ "epoch": 0.5957359991839233,
1049
+ "grad_norm": 17.182361602783203,
1050
+ "learning_rate": 4.7023663810689515e-05,
1051
+ "loss": 1.57,
1052
+ "step": 1460
1053
+ },
1054
+ {
1055
+ "epoch": 0.5998163827399775,
1056
+ "grad_norm": 2.296912670135498,
1057
+ "learning_rate": 4.700326397388821e-05,
1058
+ "loss": 1.4757,
1059
+ "step": 1470
1060
+ },
1061
+ {
1062
+ "epoch": 0.6038967662960318,
1063
+ "grad_norm": 37.03803634643555,
1064
+ "learning_rate": 4.6982864137086904e-05,
1065
+ "loss": 1.1419,
1066
+ "step": 1480
1067
+ },
1068
+ {
1069
+ "epoch": 0.6079771498520861,
1070
+ "grad_norm": 61.62291717529297,
1071
+ "learning_rate": 4.6962464300285605e-05,
1072
+ "loss": 1.9084,
1073
+ "step": 1490
1074
+ },
1075
+ {
1076
+ "epoch": 0.6120575334081404,
1077
+ "grad_norm": 13.151871681213379,
1078
+ "learning_rate": 4.694206446348429e-05,
1079
+ "loss": 1.0139,
1080
+ "step": 1500
1081
+ },
1082
+ {
1083
+ "epoch": 0.6120575334081404,
1084
+ "eval_delta_mse": 0.0071313283406198025,
1085
+ "eval_loss": 0.16127151250839233,
1086
+ "eval_price_mse": 0.0071313283406198025,
1087
+ "eval_runtime": 15822.5006,
1088
+ "eval_samples_per_second": 0.233,
1089
+ "eval_steps_per_second": 0.233,
1090
+ "step": 1500
1091
+ },
1092
+ {
1093
+ "epoch": 0.6161379169641946,
1094
+ "grad_norm": 23.095911026000977,
1095
+ "learning_rate": 4.692166462668299e-05,
1096
+ "loss": 1.6575,
1097
+ "step": 1510
1098
+ },
1099
+ {
1100
+ "epoch": 0.620218300520249,
1101
+ "grad_norm": 6.681315898895264,
1102
+ "learning_rate": 4.690126478988168e-05,
1103
+ "loss": 2.114,
1104
+ "step": 1520
1105
+ },
1106
+ {
1107
+ "epoch": 0.6242986840763032,
1108
+ "grad_norm": 16.255008697509766,
1109
+ "learning_rate": 4.6880864953080376e-05,
1110
+ "loss": 1.3241,
1111
+ "step": 1530
1112
+ },
1113
+ {
1114
+ "epoch": 0.6283790676323574,
1115
+ "grad_norm": 4.873073101043701,
1116
+ "learning_rate": 4.686046511627907e-05,
1117
+ "loss": 1.1525,
1118
+ "step": 1540
1119
+ },
1120
+ {
1121
+ "epoch": 0.6324594511884117,
1122
+ "grad_norm": 0.5373139977455139,
1123
+ "learning_rate": 4.6840065279477765e-05,
1124
+ "loss": 1.6195,
1125
+ "step": 1550
1126
+ },
1127
+ {
1128
+ "epoch": 0.636539834744466,
1129
+ "grad_norm": 22.436830520629883,
1130
+ "learning_rate": 4.6819665442676467e-05,
1131
+ "loss": 2.6356,
1132
+ "step": 1560
1133
+ },
1134
+ {
1135
+ "epoch": 0.6406202183005203,
1136
+ "grad_norm": 0.24059373140335083,
1137
+ "learning_rate": 4.6799265605875154e-05,
1138
+ "loss": 0.5031,
1139
+ "step": 1570
1140
+ },
1141
+ {
1142
+ "epoch": 0.6447006018565745,
1143
+ "grad_norm": 26.824846267700195,
1144
+ "learning_rate": 4.677886576907385e-05,
1145
+ "loss": 0.4795,
1146
+ "step": 1580
1147
+ },
1148
+ {
1149
+ "epoch": 0.6487809854126287,
1150
+ "grad_norm": 30.047636032104492,
1151
+ "learning_rate": 4.675846593227254e-05,
1152
+ "loss": 3.2205,
1153
+ "step": 1590
1154
+ },
1155
+ {
1156
+ "epoch": 0.6528613689686831,
1157
+ "grad_norm": 47.49260330200195,
1158
+ "learning_rate": 4.673806609547124e-05,
1159
+ "loss": 1.4902,
1160
+ "step": 1600
1161
+ },
1162
+ {
1163
+ "epoch": 0.6569417525247373,
1164
+ "grad_norm": 25.408098220825195,
1165
+ "learning_rate": 4.671766625866993e-05,
1166
+ "loss": 1.2437,
1167
+ "step": 1610
1168
+ },
1169
+ {
1170
+ "epoch": 0.6610221360807916,
1171
+ "grad_norm": 39.53823471069336,
1172
+ "learning_rate": 4.6697266421868627e-05,
1173
+ "loss": 3.9338,
1174
+ "step": 1620
1175
+ },
1176
+ {
1177
+ "epoch": 0.6651025196368459,
1178
+ "grad_norm": 3.9683070182800293,
1179
+ "learning_rate": 4.667686658506732e-05,
1180
+ "loss": 1.4395,
1181
+ "step": 1630
1182
+ },
1183
+ {
1184
+ "epoch": 0.6691829031929001,
1185
+ "grad_norm": 43.940704345703125,
1186
+ "learning_rate": 4.6656466748266015e-05,
1187
+ "loss": 1.5728,
1188
+ "step": 1640
1189
+ },
1190
+ {
1191
+ "epoch": 0.6732632867489544,
1192
+ "grad_norm": 18.952613830566406,
1193
+ "learning_rate": 4.663606691146471e-05,
1194
+ "loss": 0.6726,
1195
+ "step": 1650
1196
+ },
1197
+ {
1198
+ "epoch": 0.6773436703050086,
1199
+ "grad_norm": 2.926729202270508,
1200
+ "learning_rate": 4.6615667074663404e-05,
1201
+ "loss": 1.3686,
1202
+ "step": 1660
1203
+ },
1204
+ {
1205
+ "epoch": 0.681424053861063,
1206
+ "grad_norm": 6.1550116539001465,
1207
+ "learning_rate": 4.65952672378621e-05,
1208
+ "loss": 2.3291,
1209
+ "step": 1670
1210
+ },
1211
+ {
1212
+ "epoch": 0.6855044374171172,
1213
+ "grad_norm": 12.013763427734375,
1214
+ "learning_rate": 4.657486740106079e-05,
1215
+ "loss": 0.8897,
1216
+ "step": 1680
1217
+ },
1218
+ {
1219
+ "epoch": 0.6895848209731715,
1220
+ "grad_norm": 17.76456069946289,
1221
+ "learning_rate": 4.655446756425949e-05,
1222
+ "loss": 0.8518,
1223
+ "step": 1690
1224
+ },
1225
+ {
1226
+ "epoch": 0.6936652045292258,
1227
+ "grad_norm": 4.194760322570801,
1228
+ "learning_rate": 4.653406772745818e-05,
1229
+ "loss": 2.9101,
1230
+ "step": 1700
1231
+ }
1232
+ ],
1233
+ "logging_steps": 10,
1234
+ "max_steps": 24510,
1235
+ "num_input_tokens_seen": 0,
1236
+ "num_train_epochs": 10,
1237
+ "save_steps": 50,
1238
+ "stateful_callbacks": {
1239
+ "TrainerControl": {
1240
+ "args": {
1241
+ "should_epoch_stop": false,
1242
+ "should_evaluate": false,
1243
+ "should_log": false,
1244
+ "should_save": true,
1245
+ "should_training_stop": false
1246
+ },
1247
+ "attributes": {}
1248
+ }
1249
+ },
1250
+ "total_flos": 2.7710647889534054e+18,
1251
+ "train_batch_size": 1,
1252
+ "trial_name": null,
1253
+ "trial_params": null
1254
+ }
multievent_forecaster_polymarket_8b_1700/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76f075fb01a446801f47d419b56a9a5edd63566f69fcc46c11475062de8a3729
+ size 5969