skyyyyks committed
Commit 89f4ba9 · verified · 1 parent: 7be5a8b

Upload folder using huggingface_hub
prior_attributer_polymarket_8b_1000/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: Qwen/Qwen3-8B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** Qwen/Qwen3-8B
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.14.0
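The card's "How to Get Started" section is still a placeholder. A minimal sketch of attaching the uploaded LoRA adapter to its base model with PEFT might look like the following; the adapter path is an assumption (the folder uploaded in this commit), running it requires downloading the Qwen3-8B weights, and note that `config.json` declares a custom `llm_regressor` model type with a separate `regression_head.bin`, which a plain PEFT load does not cover.

```python
# Hedged sketch, not from the card: load the base model and apply the adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Path assumed to point at the folder uploaded in this commit; the regression
# head in regression_head.bin appears to be loaded by the authors' own code.
model = PeftModel.from_pretrained(base, "prior_attributer_polymarket_8b_1000")
model.eval()
```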
prior_attributer_polymarket_8b_1000/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/mnt/tidal-alsh-share2/usr/wangshanyong/models/Qwen/Qwen3-8B",
+   "bias": "none",
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "q_proj",
+     "k_proj",
+     "v_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
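The key LoRA hyperparameters can be read off `adapter_config.json` directly. A small sketch (the dict simply mirrors fields from the file above): with vanilla LoRA (`use_rslora` false), each adapted projection's low-rank update is scaled by `lora_alpha / r`, here 32 / 64 = 0.5, and four attention projections are adapted.

```python
# Mirror of the key fields in adapter_config.json above.
adapter_config = {
    "lora_alpha": 32,
    "r": 64,
    "lora_dropout": 0.1,
    "target_modules": ["o_proj", "q_proj", "k_proj", "v_proj"],
    "use_rslora": False,
}

# Vanilla LoRA scales the low-rank update by alpha / r.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
num_adapted = len(adapter_config["target_modules"])
print(scaling, num_adapted)  # 0.5 4
```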
prior_attributer_polymarket_8b_1000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:164267cafde4141db282e015f4af8450b6da1f2726c68b9982950c2d718daf32
+ size 245405936
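The file above is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the byte size (about 245 MB here). A small parser sketch, assuming only the pointer format shown above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:164267cafde4141db282e015f4af8450b6da1f2726c68b9982950c2d718daf32\n"
    "size 245405936\n"
)
info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e6)  # ~245.4 MB
```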
prior_attributer_polymarket_8b_1000/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "base_model_name_or_path": "/mnt/tidal-alsh-share2/usr/wangshanyong/models/Qwen/Qwen3-8B",
+   "max_length": 1024,
+   "model_type": "llm_regressor",
+   "transformers_version": "4.57.6"
+ }
prior_attributer_polymarket_8b_1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:726e8633b64a8ef2afc8307a55779e8ec9e180a7a9b49ae39058ab09962d30f5
+ size 499571322
prior_attributer_polymarket_8b_1000/regression_head.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c88058e1e36830108202fc804df9571f2b1ca05807d3779865f59b1848c0e99
+ size 4297504
prior_attributer_polymarket_8b_1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8e7fd43ead71bf040df2212a8b5cffc89a492d5db12a90bedbcf5b84247f901
+ size 14244
prior_attributer_polymarket_8b_1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d63818d7e752a9a992a847df6da441f35abead963147e84fe44ce9663ee51347
+ size 1064
prior_attributer_polymarket_8b_1000/trainer_state.json ADDED
@@ -0,0 +1,901 @@
1
+ {
2
+ "best_global_step": 1000,
3
+ "best_metric": 0.13900727033615112,
4
+ "best_model_checkpoint": "../saves/prior_attributer_polymarket_8b/checkpoint-1000",
5
+ "epoch": 0.4079967360261118,
6
+ "eval_steps": 50,
7
+ "global_step": 1000,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.0004079967360261118,
14
+ "grad_norm": 2.670715093612671,
15
+ "learning_rate": 1e-05,
16
+ "loss": 0.1994,
17
+ "step": 1
18
+ },
19
+ {
20
+ "epoch": 0.004079967360261118,
21
+ "grad_norm": 0.20084570348262787,
22
+ "learning_rate": 9.996328029375766e-06,
23
+ "loss": 0.2647,
24
+ "step": 10
25
+ },
26
+ {
27
+ "epoch": 0.008159934720522236,
28
+ "grad_norm": 3.061854839324951,
29
+ "learning_rate": 9.992248062015506e-06,
30
+ "loss": 0.296,
31
+ "step": 20
32
+ },
33
+ {
34
+ "epoch": 0.012239902080783354,
35
+ "grad_norm": 5.403031826019287,
36
+ "learning_rate": 9.988168094655244e-06,
37
+ "loss": 0.6343,
38
+ "step": 30
39
+ },
40
+ {
41
+ "epoch": 0.016319869441044473,
42
+ "grad_norm": 2.4748289585113525,
43
+ "learning_rate": 9.984088127294983e-06,
44
+ "loss": 0.2773,
45
+ "step": 40
46
+ },
47
+ {
48
+ "epoch": 0.02039983680130559,
49
+ "grad_norm": 1.1733546257019043,
50
+ "learning_rate": 9.980008159934721e-06,
51
+ "loss": 0.333,
52
+ "step": 50
53
+ },
54
+ {
55
+ "epoch": 0.02039983680130559,
56
+ "eval_loss": 0.16383115947246552,
57
+ "eval_runtime": 10354.4824,
58
+ "eval_samples_per_second": 0.357,
59
+ "eval_steps_per_second": 0.357,
60
+ "step": 50
61
+ },
62
+ {
63
+ "epoch": 0.02447980416156671,
64
+ "grad_norm": 3.983604907989502,
65
+ "learning_rate": 9.97592819257446e-06,
66
+ "loss": 0.4789,
67
+ "step": 60
68
+ },
69
+ {
70
+ "epoch": 0.028559771521827825,
71
+ "grad_norm": 1.3020315170288086,
72
+ "learning_rate": 9.971848225214198e-06,
73
+ "loss": 0.3092,
74
+ "step": 70
75
+ },
76
+ {
77
+ "epoch": 0.032639738882088945,
78
+ "grad_norm": 4.395025253295898,
79
+ "learning_rate": 9.967768257853938e-06,
80
+ "loss": 0.5212,
81
+ "step": 80
82
+ },
83
+ {
84
+ "epoch": 0.03671970624235006,
85
+ "grad_norm": 1.5565115213394165,
86
+ "learning_rate": 9.963688290493678e-06,
87
+ "loss": 0.3205,
88
+ "step": 90
89
+ },
90
+ {
91
+ "epoch": 0.04079967360261118,
92
+ "grad_norm": 1.5836162567138672,
93
+ "learning_rate": 9.959608323133415e-06,
94
+ "loss": 0.327,
95
+ "step": 100
96
+ },
97
+ {
98
+ "epoch": 0.04079967360261118,
99
+ "eval_loss": 0.16183292865753174,
100
+ "eval_runtime": 10355.8257,
101
+ "eval_samples_per_second": 0.357,
102
+ "eval_steps_per_second": 0.357,
103
+ "step": 100
104
+ },
105
+ {
106
+ "epoch": 0.044879640962872294,
107
+ "grad_norm": 2.0744669437408447,
108
+ "learning_rate": 9.955528355773153e-06,
109
+ "loss": 0.3464,
110
+ "step": 110
111
+ },
112
+ {
113
+ "epoch": 0.04895960832313342,
114
+ "grad_norm": 0.9152021408081055,
115
+ "learning_rate": 9.951448388412893e-06,
116
+ "loss": 0.2172,
117
+ "step": 120
118
+ },
119
+ {
120
+ "epoch": 0.053039575683394534,
121
+ "grad_norm": 3.3941807746887207,
122
+ "learning_rate": 9.947368421052632e-06,
123
+ "loss": 0.4466,
124
+ "step": 130
125
+ },
126
+ {
127
+ "epoch": 0.05711954304365565,
128
+ "grad_norm": 2.671168565750122,
129
+ "learning_rate": 9.943288453692372e-06,
130
+ "loss": 0.3001,
131
+ "step": 140
132
+ },
133
+ {
134
+ "epoch": 0.06119951040391677,
135
+ "grad_norm": 2.2553200721740723,
136
+ "learning_rate": 9.93920848633211e-06,
137
+ "loss": 0.2611,
138
+ "step": 150
139
+ },
140
+ {
141
+ "epoch": 0.06119951040391677,
142
+ "eval_loss": 0.16030286252498627,
143
+ "eval_runtime": 10355.0438,
144
+ "eval_samples_per_second": 0.357,
145
+ "eval_steps_per_second": 0.357,
146
+ "step": 150
147
+ },
148
+ {
149
+ "epoch": 0.06527947776417789,
150
+ "grad_norm": 1.6387931108474731,
151
+ "learning_rate": 9.93512851897185e-06,
152
+ "loss": 0.2925,
153
+ "step": 160
154
+ },
155
+ {
156
+ "epoch": 0.069359445124439,
157
+ "grad_norm": 2.564134120941162,
158
+ "learning_rate": 9.931048551611589e-06,
159
+ "loss": 0.3059,
160
+ "step": 170
161
+ },
162
+ {
163
+ "epoch": 0.07343941248470012,
164
+ "grad_norm": 2.3305749893188477,
165
+ "learning_rate": 9.926968584251327e-06,
166
+ "loss": 0.3646,
167
+ "step": 180
168
+ },
169
+ {
170
+ "epoch": 0.07751937984496124,
171
+ "grad_norm": 4.769280433654785,
172
+ "learning_rate": 9.922888616891065e-06,
173
+ "loss": 0.2801,
174
+ "step": 190
175
+ },
176
+ {
177
+ "epoch": 0.08159934720522236,
178
+ "grad_norm": 2.8833346366882324,
179
+ "learning_rate": 9.918808649530804e-06,
180
+ "loss": 0.3375,
181
+ "step": 200
182
+ },
183
+ {
184
+ "epoch": 0.08159934720522236,
185
+ "eval_loss": 0.15935960412025452,
186
+ "eval_runtime": 10358.2826,
187
+ "eval_samples_per_second": 0.356,
188
+ "eval_steps_per_second": 0.356,
189
+ "step": 200
190
+ },
191
+ {
192
+ "epoch": 0.08567931456548347,
193
+ "grad_norm": 7.480192184448242,
194
+ "learning_rate": 9.914728682170544e-06,
195
+ "loss": 0.502,
196
+ "step": 210
197
+ },
198
+ {
199
+ "epoch": 0.08975928192574459,
200
+ "grad_norm": 3.914731979370117,
201
+ "learning_rate": 9.910648714810282e-06,
202
+ "loss": 0.2675,
203
+ "step": 220
204
+ },
205
+ {
206
+ "epoch": 0.09383924928600572,
207
+ "grad_norm": 4.262155532836914,
208
+ "learning_rate": 9.906568747450021e-06,
209
+ "loss": 0.154,
210
+ "step": 230
211
+ },
212
+ {
213
+ "epoch": 0.09791921664626684,
214
+ "grad_norm": 0.0,
215
+ "learning_rate": 9.90248878008976e-06,
216
+ "loss": 0.3281,
217
+ "step": 240
218
+ },
219
+ {
220
+ "epoch": 0.10199918400652795,
221
+ "grad_norm": 0.0,
222
+ "learning_rate": 9.898408812729499e-06,
223
+ "loss": 0.1986,
224
+ "step": 250
225
+ },
226
+ {
227
+ "epoch": 0.10199918400652795,
228
+ "eval_loss": 0.15859133005142212,
229
+ "eval_runtime": 10363.3022,
230
+ "eval_samples_per_second": 0.356,
231
+ "eval_steps_per_second": 0.356,
232
+ "step": 250
233
+ },
234
+ {
235
+ "epoch": 0.10607915136678907,
236
+ "grad_norm": 3.5509583950042725,
237
+ "learning_rate": 9.894328845369238e-06,
238
+ "loss": 0.3974,
239
+ "step": 260
240
+ },
241
+ {
242
+ "epoch": 0.11015911872705018,
243
+ "grad_norm": 2.8577442169189453,
244
+ "learning_rate": 9.890248878008976e-06,
245
+ "loss": 0.2832,
246
+ "step": 270
247
+ },
248
+ {
249
+ "epoch": 0.1142390860873113,
250
+ "grad_norm": 1.8718701601028442,
251
+ "learning_rate": 9.886168910648716e-06,
252
+ "loss": 0.3789,
253
+ "step": 280
254
+ },
255
+ {
256
+ "epoch": 0.11831905344757242,
257
+ "grad_norm": 4.342713356018066,
258
+ "learning_rate": 9.882088943288455e-06,
259
+ "loss": 0.4543,
260
+ "step": 290
261
+ },
262
+ {
263
+ "epoch": 0.12239902080783353,
264
+ "grad_norm": 3.926931619644165,
265
+ "learning_rate": 9.878008975928193e-06,
266
+ "loss": 0.3815,
267
+ "step": 300
268
+ },
269
+ {
270
+ "epoch": 0.12239902080783353,
271
+ "eval_loss": 0.1574268788099289,
272
+ "eval_runtime": 10364.1942,
273
+ "eval_samples_per_second": 0.356,
274
+ "eval_steps_per_second": 0.356,
275
+ "step": 300
276
+ },
277
+ {
278
+ "epoch": 0.12647898816809466,
279
+ "grad_norm": 2.6151223182678223,
280
+ "learning_rate": 9.873929008567931e-06,
281
+ "loss": 0.182,
282
+ "step": 310
283
+ },
284
+ {
285
+ "epoch": 0.13055895552835578,
286
+ "grad_norm": 0.5143275260925293,
287
+ "learning_rate": 9.86984904120767e-06,
288
+ "loss": 0.2173,
289
+ "step": 320
290
+ },
291
+ {
292
+ "epoch": 0.1346389228886169,
293
+ "grad_norm": 0.8282421827316284,
294
+ "learning_rate": 9.86576907384741e-06,
295
+ "loss": 0.2572,
296
+ "step": 330
297
+ },
298
+ {
299
+ "epoch": 0.138718890248878,
300
+ "grad_norm": 0.7505475878715515,
301
+ "learning_rate": 9.861689106487148e-06,
302
+ "loss": 0.1832,
303
+ "step": 340
304
+ },
305
+ {
306
+ "epoch": 0.14279885760913913,
307
+ "grad_norm": 4.016334533691406,
308
+ "learning_rate": 9.857609139126888e-06,
309
+ "loss": 0.319,
310
+ "step": 350
311
+ },
312
+ {
313
+ "epoch": 0.14279885760913913,
314
+ "eval_loss": 0.15675334632396698,
315
+ "eval_runtime": 10364.3059,
316
+ "eval_samples_per_second": 0.356,
317
+ "eval_steps_per_second": 0.356,
318
+ "step": 350
319
+ },
320
+ {
321
+ "epoch": 0.14687882496940025,
322
+ "grad_norm": 4.5542120933532715,
323
+ "learning_rate": 9.853529171766627e-06,
324
+ "loss": 0.3617,
325
+ "step": 360
326
+ },
327
+ {
328
+ "epoch": 0.15095879232966136,
329
+ "grad_norm": 0.0,
330
+ "learning_rate": 9.849449204406367e-06,
331
+ "loss": 0.3207,
332
+ "step": 370
333
+ },
334
+ {
335
+ "epoch": 0.15503875968992248,
336
+ "grad_norm": 2.223388433456421,
337
+ "learning_rate": 9.845369237046105e-06,
338
+ "loss": 0.3283,
339
+ "step": 380
340
+ },
341
+ {
342
+ "epoch": 0.1591187270501836,
343
+ "grad_norm": 0.0,
344
+ "learning_rate": 9.841289269685843e-06,
345
+ "loss": 0.2566,
346
+ "step": 390
347
+ },
348
+ {
349
+ "epoch": 0.1631986944104447,
350
+ "grad_norm": 3.4263710975646973,
351
+ "learning_rate": 9.837209302325582e-06,
352
+ "loss": 0.4226,
353
+ "step": 400
354
+ },
355
+ {
356
+ "epoch": 0.1631986944104447,
357
+ "eval_loss": 0.15490344166755676,
358
+ "eval_runtime": 10364.3662,
359
+ "eval_samples_per_second": 0.356,
360
+ "eval_steps_per_second": 0.356,
361
+ "step": 400
362
+ },
363
+ {
364
+ "epoch": 0.16727866177070583,
365
+ "grad_norm": 2.0694384574890137,
366
+ "learning_rate": 9.833129334965322e-06,
367
+ "loss": 0.3677,
368
+ "step": 410
369
+ },
370
+ {
371
+ "epoch": 0.17135862913096694,
372
+ "grad_norm": 3.6313953399658203,
373
+ "learning_rate": 9.82904936760506e-06,
374
+ "loss": 0.3301,
375
+ "step": 420
376
+ },
377
+ {
378
+ "epoch": 0.17543859649122806,
379
+ "grad_norm": 0.0,
380
+ "learning_rate": 9.8249694002448e-06,
381
+ "loss": 0.2238,
382
+ "step": 430
383
+ },
384
+ {
385
+ "epoch": 0.17951856385148918,
386
+ "grad_norm": 2.6408114433288574,
387
+ "learning_rate": 9.820889432884537e-06,
388
+ "loss": 0.2209,
389
+ "step": 440
390
+ },
391
+ {
392
+ "epoch": 0.1835985312117503,
393
+ "grad_norm": 5.005034446716309,
394
+ "learning_rate": 9.816809465524277e-06,
395
+ "loss": 0.374,
396
+ "step": 450
397
+ },
398
+ {
399
+ "epoch": 0.1835985312117503,
400
+ "eval_loss": 0.15539048612117767,
401
+ "eval_runtime": 10364.9448,
402
+ "eval_samples_per_second": 0.356,
403
+ "eval_steps_per_second": 0.356,
404
+ "step": 450
405
+ },
406
+ {
407
+ "epoch": 0.18767849857201144,
408
+ "grad_norm": 0.0,
409
+ "learning_rate": 9.812729498164015e-06,
410
+ "loss": 0.2161,
411
+ "step": 460
412
+ },
413
+ {
414
+ "epoch": 0.19175846593227255,
415
+ "grad_norm": 2.4610049724578857,
416
+ "learning_rate": 9.808649530803754e-06,
417
+ "loss": 0.2318,
418
+ "step": 470
419
+ },
420
+ {
421
+ "epoch": 0.19583843329253367,
422
+ "grad_norm": 0.7451673150062561,
423
+ "learning_rate": 9.804569563443494e-06,
424
+ "loss": 0.2398,
425
+ "step": 480
426
+ },
427
+ {
428
+ "epoch": 0.1999184006527948,
429
+ "grad_norm": 1.8425447940826416,
430
+ "learning_rate": 9.800489596083233e-06,
431
+ "loss": 0.3167,
432
+ "step": 490
433
+ },
434
+ {
435
+ "epoch": 0.2039983680130559,
436
+ "grad_norm": 2.5845742225646973,
437
+ "learning_rate": 9.796409628722971e-06,
438
+ "loss": 0.162,
439
+ "step": 500
440
+ },
441
+ {
442
+ "epoch": 0.2039983680130559,
443
+ "eval_loss": 0.15188691020011902,
444
+ "eval_runtime": 10364.8411,
445
+ "eval_samples_per_second": 0.356,
446
+ "eval_steps_per_second": 0.356,
447
+ "step": 500
448
+ },
449
+ {
450
+ "epoch": 0.20807833537331702,
451
+ "grad_norm": 0.0,
452
+ "learning_rate": 9.792329661362709e-06,
453
+ "loss": 0.1781,
454
+ "step": 510
455
+ },
456
+ {
457
+ "epoch": 0.21215830273357814,
458
+ "grad_norm": 2.0829172134399414,
459
+ "learning_rate": 9.788249694002449e-06,
460
+ "loss": 0.2972,
461
+ "step": 520
462
+ },
463
+ {
464
+ "epoch": 0.21623827009383925,
465
+ "grad_norm": 2.929964303970337,
466
+ "learning_rate": 9.784169726642188e-06,
467
+ "loss": 0.2898,
468
+ "step": 530
469
+ },
470
+ {
471
+ "epoch": 0.22031823745410037,
472
+ "grad_norm": 6.498988151550293,
473
+ "learning_rate": 9.780089759281926e-06,
474
+ "loss": 0.2914,
475
+ "step": 540
476
+ },
477
+ {
478
+ "epoch": 0.22439820481436148,
479
+ "grad_norm": 3.3654491901397705,
480
+ "learning_rate": 9.776009791921666e-06,
481
+ "loss": 0.2302,
482
+ "step": 550
483
+ },
484
+ {
485
+ "epoch": 0.22439820481436148,
486
+ "eval_loss": 0.15363305807113647,
487
+ "eval_runtime": 10364.5889,
488
+ "eval_samples_per_second": 0.356,
489
+ "eval_steps_per_second": 0.356,
490
+ "step": 550
491
+ },
492
+ {
493
+ "epoch": 0.2284781721746226,
494
+ "grad_norm": 1.2231923341751099,
495
+ "learning_rate": 9.771929824561405e-06,
496
+ "loss": 0.4205,
497
+ "step": 560
498
+ },
499
+ {
500
+ "epoch": 0.23255813953488372,
501
+ "grad_norm": 4.62269926071167,
502
+ "learning_rate": 9.767849857201143e-06,
503
+ "loss": 0.3172,
504
+ "step": 570
505
+ },
506
+ {
507
+ "epoch": 0.23663810689514483,
508
+ "grad_norm": 2.85361909866333,
509
+ "learning_rate": 9.763769889840881e-06,
510
+ "loss": 0.3243,
511
+ "step": 580
512
+ },
513
+ {
514
+ "epoch": 0.24071807425540595,
515
+ "grad_norm": 3.5462374687194824,
516
+ "learning_rate": 9.75968992248062e-06,
517
+ "loss": 0.3596,
518
+ "step": 590
519
+ },
520
+ {
521
+ "epoch": 0.24479804161566707,
522
+ "grad_norm": 2.7605552673339844,
523
+ "learning_rate": 9.75560995512036e-06,
524
+ "loss": 0.2076,
525
+ "step": 600
526
+ },
527
+ {
528
+ "epoch": 0.24479804161566707,
529
+ "eval_loss": 0.15099598467350006,
530
+ "eval_runtime": 10364.6206,
531
+ "eval_samples_per_second": 0.356,
532
+ "eval_steps_per_second": 0.356,
533
+ "step": 600
534
+ },
535
+ {
536
+ "epoch": 0.24887800897592818,
537
+ "grad_norm": 2.784391403198242,
538
+ "learning_rate": 9.7515299877601e-06,
539
+ "loss": 0.3562,
540
+ "step": 610
541
+ },
542
+ {
543
+ "epoch": 0.2529579763361893,
544
+ "grad_norm": 2.931553363800049,
545
+ "learning_rate": 9.747450020399837e-06,
546
+ "loss": 0.3455,
547
+ "step": 620
548
+ },
549
+ {
550
+ "epoch": 0.25703794369645044,
551
+ "grad_norm": 1.5483777523040771,
552
+ "learning_rate": 9.743370053039577e-06,
553
+ "loss": 0.3134,
554
+ "step": 630
555
+ },
556
+ {
557
+ "epoch": 0.26111791105671156,
558
+ "grad_norm": 2.5101840496063232,
559
+ "learning_rate": 9.739290085679315e-06,
560
+ "loss": 0.2218,
561
+ "step": 640
562
+ },
563
+ {
564
+ "epoch": 0.2651978784169727,
565
+ "grad_norm": 0.6121233701705933,
566
+ "learning_rate": 9.735210118319054e-06,
567
+ "loss": 0.179,
568
+ "step": 650
569
+ },
570
+ {
571
+ "epoch": 0.2651978784169727,
572
+ "eval_loss": 0.15009666979312897,
573
+ "eval_runtime": 10364.7228,
574
+ "eval_samples_per_second": 0.356,
575
+ "eval_steps_per_second": 0.356,
576
+ "step": 650
577
+ },
578
+ {
579
+ "epoch": 0.2692778457772338,
580
+ "grad_norm": 4.591267108917236,
581
+ "learning_rate": 9.731130150958792e-06,
582
+ "loss": 0.4943,
583
+ "step": 660
584
+ },
585
+ {
586
+ "epoch": 0.2733578131374949,
587
+ "grad_norm": 2.8425586223602295,
588
+ "learning_rate": 9.727050183598532e-06,
589
+ "loss": 0.1848,
590
+ "step": 670
591
+ },
592
+ {
593
+ "epoch": 0.277437780497756,
594
+ "grad_norm": 2.972445249557495,
595
+ "learning_rate": 9.722970216238272e-06,
596
+ "loss": 0.1622,
597
+ "step": 680
598
+ },
599
+ {
600
+ "epoch": 0.28151774785801714,
601
+ "grad_norm": 3.4708287715911865,
602
+ "learning_rate": 9.71889024887801e-06,
603
+ "loss": 0.1518,
604
+ "step": 690
605
+ },
606
+ {
607
+ "epoch": 0.28559771521827826,
608
+ "grad_norm": 1.8176753520965576,
609
+ "learning_rate": 9.714810281517749e-06,
610
+ "loss": 0.2419,
611
+ "step": 700
612
+ },
613
+ {
614
+ "epoch": 0.28559771521827826,
615
+ "eval_loss": 0.14927682280540466,
616
+ "eval_runtime": 10365.2728,
617
+ "eval_samples_per_second": 0.356,
618
+ "eval_steps_per_second": 0.356,
619
+ "step": 700
620
+ },
621
+ {
622
+ "epoch": 0.2896776825785394,
623
+ "grad_norm": 2.8468093872070312,
624
+ "learning_rate": 9.710730314157487e-06,
625
+ "loss": 0.2811,
626
+ "step": 710
627
+ },
628
+ {
629
+ "epoch": 0.2937576499388005,
630
+ "grad_norm": 1.6590688228607178,
631
+ "learning_rate": 9.706650346797226e-06,
632
+ "loss": 0.1726,
633
+ "step": 720
634
+ },
635
+ {
636
+ "epoch": 0.2978376172990616,
637
+ "grad_norm": 1.9409711360931396,
638
+ "learning_rate": 9.702570379436966e-06,
639
+ "loss": 0.3225,
640
+ "step": 730
641
+ },
642
+ {
643
+ "epoch": 0.3019175846593227,
644
+ "grad_norm": 3.7657744884490967,
645
+ "learning_rate": 9.698490412076704e-06,
646
+ "loss": 0.3108,
647
+ "step": 740
648
+ },
649
+ {
650
+ "epoch": 0.30599755201958384,
651
+ "grad_norm": 4.228935718536377,
652
+ "learning_rate": 9.694410444716443e-06,
653
+ "loss": 0.536,
654
+ "step": 750
655
+ },
656
+ {
657
+ "epoch": 0.30599755201958384,
658
+ "eval_loss": 0.14842770993709564,
659
+ "eval_runtime": 10364.1675,
660
+ "eval_samples_per_second": 0.356,
661
+ "eval_steps_per_second": 0.356,
662
+ "step": 750
663
+ },
664
+ {
665
+ "epoch": 0.31007751937984496,
666
+ "grad_norm": 2.4524126052856445,
667
+ "learning_rate": 9.690330477356183e-06,
668
+ "loss": 0.2715,
669
+ "step": 760
670
+ },
671
+ {
672
+ "epoch": 0.3141574867401061,
673
+ "grad_norm": 3.1010375022888184,
674
+ "learning_rate": 9.686250509995921e-06,
675
+ "loss": 0.2502,
676
+ "step": 770
677
+ },
678
+ {
679
+ "epoch": 0.3182374541003672,
680
+ "grad_norm": 0.4105437099933624,
681
+ "learning_rate": 9.682170542635659e-06,
682
+ "loss": 0.2992,
683
+ "step": 780
684
+ },
685
+ {
686
+ "epoch": 0.3223174214606283,
687
+ "grad_norm": 3.3335089683532715,
688
+ "learning_rate": 9.678090575275398e-06,
689
+ "loss": 0.2272,
690
+ "step": 790
691
+ },
692
+ {
693
+ "epoch": 0.3263973888208894,
694
+ "grad_norm": 3.66196870803833,
695
+ "learning_rate": 9.674010607915138e-06,
696
+ "loss": 0.1823,
697
+ "step": 800
698
+ },
699
+ {
700
+ "epoch": 0.3263973888208894,
701
+ "eval_loss": 0.1460103839635849,
702
+ "eval_runtime": 10364.4289,
703
+ "eval_samples_per_second": 0.356,
704
+ "eval_steps_per_second": 0.356,
705
+ "step": 800
706
+ },
707
+ {
708
+ "epoch": 0.33047735618115054,
709
+ "grad_norm": 0.0,
710
+ "learning_rate": 9.669930640554876e-06,
711
+ "loss": 0.24,
712
+ "step": 810
713
+ },
714
+ {
715
+ "epoch": 0.33455732354141166,
716
+ "grad_norm": 2.9362757205963135,
717
+ "learning_rate": 9.665850673194615e-06,
718
+ "loss": 0.2955,
719
+ "step": 820
720
+ },
721
+ {
722
+ "epoch": 0.33863729090167277,
723
+ "grad_norm": 1.2982591390609741,
724
+ "learning_rate": 9.661770705834355e-06,
725
+ "loss": 0.1341,
726
+ "step": 830
727
+ },
728
+ {
729
+ "epoch": 0.3427172582619339,
730
+ "grad_norm": 1.246749758720398,
731
+ "learning_rate": 9.657690738474093e-06,
732
+ "loss": 0.3102,
733
+ "step": 840
734
+ },
735
+ {
736
+ "epoch": 0.346797225622195,
737
+ "grad_norm": 2.056279182434082,
738
+ "learning_rate": 9.653610771113832e-06,
739
+ "loss": 0.29,
740
+ "step": 850
741
+ },
742
+ {
743
+ "epoch": 0.346797225622195,
744
+ "eval_loss": 0.1432345062494278,
745
+ "eval_runtime": 10366.166,
746
+ "eval_samples_per_second": 0.356,
747
+ "eval_steps_per_second": 0.356,
748
+ "step": 850
749
+ },
750
+ {
751
+ "epoch": 0.3508771929824561,
752
+ "grad_norm": 1.6712030172348022,
753
+ "learning_rate": 9.64953080375357e-06,
754
+ "loss": 0.146,
755
+ "step": 860
756
+ },
757
+ {
758
+ "epoch": 0.35495716034271724,
759
+ "grad_norm": 1.370171308517456,
760
+ "learning_rate": 9.64545083639331e-06,
761
+ "loss": 0.1812,
762
+ "step": 870
763
+ },
764
+ {
765
+ "epoch": 0.35903712770297835,
766
+ "grad_norm": 1.83916437625885,
767
+ "learning_rate": 9.64137086903305e-06,
768
+ "loss": 0.3776,
769
+ "step": 880
770
+ },
771
+ {
772
+ "epoch": 0.36311709506323947,
773
+ "grad_norm": 2.024589776992798,
774
+ "learning_rate": 9.637290901672787e-06,
775
+ "loss": 0.2238,
776
+ "step": 890
777
+ },
778
+ {
779
+ "epoch": 0.3671970624235006,
780
+ "grad_norm": 3.2594895362854004,
781
+ "learning_rate": 9.633210934312525e-06,
782
+ "loss": 0.3453,
783
+ "step": 900
784
+ },
785
+ {
786
+ "epoch": 0.3671970624235006,
787
+ "eval_loss": 0.14180521667003632,
788
+ "eval_runtime": 10366.0282,
789
+ "eval_samples_per_second": 0.356,
790
+ "eval_steps_per_second": 0.356,
791
+ "step": 900
792
+ },
793
+ {
794
+ "epoch": 0.3712770297837617,
795
+ "grad_norm": 2.505995512008667,
796
+ "learning_rate": 9.629130966952265e-06,
797
+ "loss": 0.3888,
798
+ "step": 910
799
+ },
800
+ {
801
+ "epoch": 0.3753569971440229,
802
+ "grad_norm": 1.5990264415740967,
803
+ "learning_rate": 9.625050999592004e-06,
804
+ "loss": 0.2187,
805
+ "step": 920
806
+ },
807
+ {
808
+ "epoch": 0.379436964504284,
809
+ "grad_norm": 0.7377971410751343,
810
+ "learning_rate": 9.620971032231742e-06,
811
+ "loss": 0.2008,
812
+ "step": 930
813
+ },
814
+ {
815
+ "epoch": 0.3835169318645451,
816
+ "grad_norm": 1.8711506128311157,
817
+ "learning_rate": 9.616891064871482e-06,
818
+ "loss": 0.1518,
819
+ "step": 940
820
+ },
821
+ {
822
+ "epoch": 0.3875968992248062,
823
+ "grad_norm": 0.0,
824
+ "learning_rate": 9.612811097511221e-06,
825
+ "loss": 0.1361,
826
+ "step": 950
827
+ },
828
+ {
829
+ "epoch": 0.3875968992248062,
830
+ "eval_loss": 0.1408400684595108,
831
+ "eval_runtime": 10364.8084,
832
+ "eval_samples_per_second": 0.356,
833
+ "eval_steps_per_second": 0.356,
834
+ "step": 950
835
+ },
836
+ {
837
+ "epoch": 0.39167686658506734,
838
+ "grad_norm": 0.6772951483726501,
839
+ "learning_rate": 9.60873113015096e-06,
840
+ "loss": 0.1837,
841
+ "step": 960
842
+ },
843
+ {
844
+ "epoch": 0.39575683394532846,
845
+ "grad_norm": 3.2690086364746094,
846
+ "learning_rate": 9.604651162790699e-06,
847
+ "loss": 0.2869,
848
+ "step": 970
849
+ },
850
+ {
851
+ "epoch": 0.3998368013055896,
852
+ "grad_norm": 2.1059823036193848,
853
+ "learning_rate": 9.600571195430437e-06,
854
+ "loss": 0.2649,
855
+ "step": 980
856
+ },
857
+ {
858
+ "epoch": 0.4039167686658507,
859
+ "grad_norm": 2.562466621398926,
860
+ "learning_rate": 9.596491228070176e-06,
861
+ "loss": 0.4868,
862
+ "step": 990
863
+ },
864
+ {
865
+ "epoch": 0.4079967360261118,
866
+ "grad_norm": 2.7138028144836426,
867
+ "learning_rate": 9.592411260709916e-06,
868
+ "loss": 0.3554,
869
+ "step": 1000
870
+ },
871
+ {
872
+ "epoch": 0.4079967360261118,
873
+ "eval_loss": 0.13900727033615112,
874
+ "eval_runtime": 10365.6047,
875
+ "eval_samples_per_second": 0.356,
876
+ "eval_steps_per_second": 0.356,
877
+ "step": 1000
878
+ }
879
+ ],
880
+ "logging_steps": 10,
881
+ "max_steps": 24510,
882
+ "num_input_tokens_seen": 0,
883
+ "num_train_epochs": 10,
884
+ "save_steps": 50,
885
+ "stateful_callbacks": {
886
+ "TrainerControl": {
887
+ "args": {
888
+ "should_epoch_stop": false,
889
+ "should_evaluate": false,
890
+ "should_log": false,
891
+ "should_save": true,
892
+ "should_training_stop": false
893
+ },
894
+ "attributes": {}
895
+ }
896
+ },
897
+ "total_flos": 1.0161226838179704e+18,
898
+ "train_batch_size": 2,
899
+ "trial_name": null,
900
+ "trial_params": null
901
+ }
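`trainer_state.json` interleaves plain training-loss entries (every 10 steps, per `logging_steps`) with eval entries (every 50 steps, per `eval_steps`); only the latter carry `eval_loss`. The `best_global_step` / `best_metric` fields can be recovered from `log_history` directly. A sketch using a handful of the entries above:

```python
# A few entries copied from log_history above; eval entries carry "eval_loss",
# plain training entries carry "loss".
log_history = [
    {"step": 50, "eval_loss": 0.16383115947246552},
    {"step": 100, "eval_loss": 0.16183292865753174},
    {"step": 950, "eval_loss": 0.1408400684595108},
    {"step": 1000, "loss": 0.3554},
    {"step": 1000, "eval_loss": 0.13900727033615112},
]

# Keep only eval entries and pick the step with the lowest eval loss.
evals = [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
best_step, best_loss = min(evals, key=lambda pair: pair[1])
print(best_step, best_loss)  # 1000 0.13900727033615112
```

On this subset the result matches the file's `best_global_step` (1000) and `best_metric` (0.13900727033615112).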
prior_attributer_polymarket_8b_1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91c1a54958c21e348ca3e3a11aac86e1a111ebee7689aee1b3771ec407aa917f
+ size 5496