Commit 379d0d5 · verified
Committed by Wb-az
1 Parent(s): 9e5252b

End of training

README.md CHANGED
@@ -24,12 +24,12 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0970
- - Accuracy: 0.9737
- - Matthews Correlation: 0.9651
- - F1: 0.9601
- - Precision: 0.9545
- - Recall: 0.9661
+ - Loss: 0.0933
+ - Accuracy: 0.9745
+ - Matthews Correlation: 0.9662
+ - F1: 0.9609
+ - Precision: 0.9550
+ - Recall: 0.9671
 
  ## Model description
 
@@ -62,9 +62,11 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Accuracy | Matthews Correlation | F1 | Precision | Recall |
  |:-------------:|:------:|:----:|:---------------:|:--------:|:--------------------:|:------:|:---------:|:------:|
- | 1.0817 | 0.2501 | 1771 | 0.1588 | 0.9537 | 0.9386 | 0.9368 | 0.9294 | 0.9447 |
- | 0.6413 | 0.5002 | 3542 | 0.1110 | 0.9686 | 0.9583 | 0.9538 | 0.9471 | 0.9610 |
- | 0.5993 | 0.7503 | 5313 | 0.0970 | 0.9737 | 0.9651 | 0.9601 | 0.9545 | 0.9661 |
+ | 1.1892 | 0.1977 | 1400 | 0.2318 | 0.9216 | 0.8977 | 0.8968 | 0.8726 | 0.9270 |
+ | 0.6786 | 0.3954 | 2800 | 0.1395 | 0.9621 | 0.9497 | 0.9438 | 0.9332 | 0.9567 |
+ | 0.6029 | 0.5931 | 4200 | 0.1098 | 0.9696 | 0.9597 | 0.9558 | 0.9491 | 0.9629 |
+ | 0.5632 | 0.7908 | 5600 | 0.0951 | 0.9742 | 0.9658 | 0.9602 | 0.9539 | 0.9672 |
+ | 0.5216 | 0.9885 | 7000 | 0.0933 | 0.9745 | 0.9662 | 0.9609 | 0.9550 | 0.9671 |
 
 
  ### Framework versions
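The change in each headline metric between the two README revisions can be tabulated directly. This is a minimal sketch that uses only the values shown in the hunk above; the names `OLD`, `NEW`, and `metric_deltas` are illustrative and do not come from the repository.

```python
# Headline evaluation metrics copied from the removed (-) and added (+) README lines.
OLD = {
    "Loss": 0.0970, "Accuracy": 0.9737, "Matthews Correlation": 0.9651,
    "F1": 0.9601, "Precision": 0.9545, "Recall": 0.9661,
}
NEW = {
    "Loss": 0.0933, "Accuracy": 0.9745, "Matthews Correlation": 0.9662,
    "F1": 0.9609, "Precision": 0.9550, "Recall": 0.9671,
}


def metric_deltas(old, new):
    """Return new - old per metric, rounded to 4 decimals (the README's precision)."""
    return {name: round(new[name] - old[name], 4) for name in old}


if __name__ == "__main__":
    for name, delta in metric_deltas(OLD, NEW).items():
        print(f"{name:<22} {delta:+.4f}")
```

Loss is lower-is-better; the remaining metrics are higher-is-better, and each moves by at most 0.0011 between the two revisions.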
adapter_config.json CHANGED
@@ -32,8 +32,8 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "query",
- "value"
+ "value",
+ "query"
  ],
  "target_parameters": null,
  "task_type": "SEQ_CLS",
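The only change in this hunk is the order of the two entries in `target_modules`. PEFT appears to treat this field as an unordered collection, so the reordering should not change which layers receive LoRA adapters; that is an assumption worth confirming against the PEFT version in use. A minimal sketch of an order-insensitive comparison follows (the helper name `same_target_modules` is hypothetical):

```python
def same_target_modules(old, new):
    """Compare two target_modules lists while ignoring order and duplicates."""
    return set(old) == set(new)


old_modules = ["query", "value"]  # removed (-) side of the hunk
new_modules = ["value", "query"]  # added (+) side of the hunk

if __name__ == "__main__":
    print(same_target_modules(old_modules, new_modules))
```

Comparing as sets avoids flagging a config as changed when only the serialization order differs.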
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:39e87ad2a54e746c71d8fc41674a6f505bbaa4a7c421d1ab051407641d9a45ba
- size 3567816
+ oid sha256:9359dd3196e697aaa1462899c8b053116cd386beadb946a942c0d87af2eaef9d
+ size 3567808
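The three lines above are a Git LFS pointer, not the weights themselves: the pointer records the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded file could be checked against such a pointer, using only the standard library (the function names `parse_lfs_pointer` and `matches_pointer` are illustrative, not part of any Git LFS tooling):

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check raw file bytes against the pointer's size and SHA-256 digest."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Added (+) side of the hunk above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9359dd3196e697aaa1462899c8b053116cd386beadb946a942c0d87af2eaef9d
size 3567808
"""
pointer = parse_lfs_pointer(POINTER)

if __name__ == "__main__":
    print(pointer["algo"], pointer["size"])
```

In practice the bytes of the downloaded `adapter_model.safetensors` would be passed to `matches_pointer`; for large files, hashing in chunks would be preferable to reading everything into memory.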
archive/checkpoint-7000/README.md ADDED
@@ -0,0 +1,206 @@
1
+ ---
2
+ base_model: roberta-base
3
+ library_name: peft
4
+ tags:
5
+ - base_model:adapter:roberta-base
6
+ - lora
7
+ - transformers
8
+ ---
9
+
10
+ # Model Card for Model ID
11
+
12
+ <!-- Provide a quick summary of what the model is/does. -->
13
+
14
+
15
+
16
+ ## Model Details
17
+
18
+ ### Model Description
19
+
20
+ <!-- Provide a longer summary of what this model is. -->
21
+
22
+
23
+
24
+ - **Developed by:** [More Information Needed]
25
+ - **Funded by [optional]:** [More Information Needed]
26
+ - **Shared by [optional]:** [More Information Needed]
27
+ - **Model type:** [More Information Needed]
28
+ - **Language(s) (NLP):** [More Information Needed]
29
+ - **License:** [More Information Needed]
30
+ - **Finetuned from model [optional]:** [More Information Needed]
31
+
32
+ ### Model Sources [optional]
33
+
34
+ <!-- Provide the basic links for the model. -->
35
+
36
+ - **Repository:** [More Information Needed]
37
+ - **Paper [optional]:** [More Information Needed]
38
+ - **Demo [optional]:** [More Information Needed]
39
+
40
+ ## Uses
41
+
42
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
43
+
44
+ ### Direct Use
45
+
46
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
47
+
48
+ [More Information Needed]
49
+
50
+ ### Downstream Use [optional]
51
+
52
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
53
+
54
+ [More Information Needed]
55
+
56
+ ### Out-of-Scope Use
57
+
58
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
59
+
60
+ [More Information Needed]
61
+
62
+ ## Bias, Risks, and Limitations
63
+
64
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
65
+
66
+ [More Information Needed]
67
+
68
+ ### Recommendations
69
+
70
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
71
+
72
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
73
+
74
+ ## How to Get Started with the Model
75
+
76
+ Use the code below to get started with the model.
77
+
78
+ [More Information Needed]
79
+
80
+ ## Training Details
81
+
82
+ ### Training Data
83
+
84
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
85
+
86
+ [More Information Needed]
87
+
88
+ ### Training Procedure
89
+
90
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
91
+
92
+ #### Preprocessing [optional]
93
+
94
+ [More Information Needed]
95
+
96
+
97
+ #### Training Hyperparameters
98
+
99
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
100
+
101
+ #### Speeds, Sizes, Times [optional]
102
+
103
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
104
+
105
+ [More Information Needed]
106
+
107
+ ## Evaluation
108
+
109
+ <!-- This section describes the evaluation protocols and provides the results. -->
110
+
111
+ ### Testing Data, Factors & Metrics
112
+
113
+ #### Testing Data
114
+
115
+ <!-- This should link to a Dataset Card if possible. -->
116
+
117
+ [More Information Needed]
118
+
119
+ #### Factors
120
+
121
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
122
+
123
+ [More Information Needed]
124
+
125
+ #### Metrics
126
+
127
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
128
+
129
+ [More Information Needed]
130
+
131
+ ### Results
132
+
133
+ [More Information Needed]
134
+
135
+ #### Summary
136
+
137
+
138
+
139
+ ## Model Examination [optional]
140
+
141
+ <!-- Relevant interpretability work for the model goes here -->
142
+
143
+ [More Information Needed]
144
+
145
+ ## Environmental Impact
146
+
147
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
148
+
149
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
150
+
151
+ - **Hardware Type:** [More Information Needed]
152
+ - **Hours used:** [More Information Needed]
153
+ - **Cloud Provider:** [More Information Needed]
154
+ - **Compute Region:** [More Information Needed]
155
+ - **Carbon Emitted:** [More Information Needed]
156
+
157
+ ## Technical Specifications [optional]
158
+
159
+ ### Model Architecture and Objective
160
+
161
+ [More Information Needed]
162
+
163
+ ### Compute Infrastructure
164
+
165
+ [More Information Needed]
166
+
167
+ #### Hardware
168
+
169
+ [More Information Needed]
170
+
171
+ #### Software
172
+
173
+ [More Information Needed]
174
+
175
+ ## Citation [optional]
176
+
177
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
178
+
179
+ **BibTeX:**
180
+
181
+ [More Information Needed]
182
+
183
+ **APA:**
184
+
185
+ [More Information Needed]
186
+
187
+ ## Glossary [optional]
188
+
189
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
190
+
191
+ [More Information Needed]
192
+
193
+ ## More Information [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Authors [optional]
198
+
199
+ [More Information Needed]
200
+
201
+ ## Model Card Contact
202
+
203
+ [More Information Needed]
204
+ ### Framework versions
205
+
206
+ - PEFT 0.18.1
archive/checkpoint-7000/adapter_config.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "roberta-base",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 16,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.1,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": [
25
+ "classifier",
26
+ "score"
27
+ ],
28
+ "peft_type": "LORA",
29
+ "peft_version": "0.18.1",
30
+ "qalora_group_size": 16,
31
+ "r": 8,
32
+ "rank_pattern": {},
33
+ "revision": null,
34
+ "target_modules": [
35
+ "query",
36
+ "value"
37
+ ],
38
+ "target_parameters": null,
39
+ "task_type": "SEQ_CLS",
40
+ "trainable_token_indices": null,
41
+ "use_dora": false,
42
+ "use_qalora": false,
43
+ "use_rslora": false
44
+ }
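The `r`, `lora_alpha`, and `target_modules` values in this config determine the adapter's scaling factor and size. A rough sketch follows, assuming the standard roberta-base dimensions (12 encoder layers, hidden size 768), which are not stated in this diff, and the usual LoRA scaling of `lora_alpha / r` when `use_rslora` is false. It counts only the low-rank A and B matrices and ignores the `modules_to_save` classifier head, whose size depends on the number of labels.

```python
HIDDEN_SIZE = 768  # assumption: roberta-base hidden size
NUM_LAYERS = 12    # assumption: roberta-base encoder layer count

# Values copied from adapter_config.json above.
config = {"r": 8, "lora_alpha": 16, "target_modules": ["query", "value"]}


def lora_scaling(cfg):
    """Scaling applied to the low-rank update (standard LoRA, not rsLoRA)."""
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg, hidden=HIDDEN_SIZE, layers=NUM_LAYERS):
    """Count parameters in the A (r x hidden) and B (hidden x r) matrices."""
    per_module = cfg["r"] * (hidden + hidden)
    return per_module * len(cfg["target_modules"]) * layers


if __name__ == "__main__":
    print("scaling:", lora_scaling(config))
    print("LoRA parameters:", lora_param_count(config))
```

Under these assumptions the low-rank matrices account for 294,912 parameters, a small fraction of the base model.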
archive/checkpoint-7000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1d193c99c5437e82cc685ac0501c6b1701a78252e25b416b3dc5f6b9373e9a51
3
+ size 3567808
archive/checkpoint-7000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d11701fd091aff39a4a22c8a2ba2537b27d49aced3fb902cdd1a14e9a70dd5af
3
+ size 7166091
archive/checkpoint-7000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:40591c654a1749ae2d278079248a253ecb221712644cef06e525077cbf89b51c
3
+ size 14645
archive/checkpoint-7000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2e0a2a2368398f5f5758378f26faf478e2d2bd0bd6690e75727f569fe9cdc7ae
3
+ size 1465
archive/checkpoint-7000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
archive/checkpoint-7000/tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": "<s>",
5
+ "cls_token": "<s>",
6
+ "eos_token": "</s>",
7
+ "errors": "replace",
8
+ "is_local": false,
9
+ "mask_token": "<mask>",
10
+ "model_max_length": 512,
11
+ "pad_token": "<pad>",
12
+ "sep_token": "</s>",
13
+ "tokenizer_class": "RobertaTokenizer",
14
+ "trim_offsets": true,
15
+ "unk_token": "<unk>"
16
+ }
archive/checkpoint-7000/trainer_state.json ADDED
@@ -0,0 +1,589 @@
1
+ {
2
+ "best_global_step": 7000,
3
+ "best_metric": 0.9613335088674932,
4
+ "best_model_checkpoint": "results/weights/q8intlora/peft-roberta-base/checkpoint-7000",
5
+ "epoch": 0.9885260370697264,
6
+ "eval_steps": 1400,
7
+ "global_step": 7000,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.01412180052956752,
14
+ "grad_norm": 3.725158929824829,
15
+ "learning_rate": 9.86020898051398e-05,
16
+ "loss": 7.050463256835937,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.02824360105913504,
21
+ "grad_norm": 4.400176048278809,
22
+ "learning_rate": 9.7190059305281e-05,
23
+ "loss": 4.082845153808594,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.04236540158870256,
28
+ "grad_norm": 3.011744499206543,
29
+ "learning_rate": 9.57780288054222e-05,
30
+ "loss": 2.8129367065429687,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.05648720211827008,
35
+ "grad_norm": 3.1704041957855225,
36
+ "learning_rate": 9.436599830556341e-05,
37
+ "loss": 2.3832768249511718,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.0706090026478376,
42
+ "grad_norm": 4.340809345245361,
43
+ "learning_rate": 9.295396780570461e-05,
44
+ "loss": 2.0935064697265626,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.08473080317740513,
49
+ "grad_norm": 2.461683750152588,
50
+ "learning_rate": 9.154193730584581e-05,
51
+ "loss": 1.8044235229492187,
52
+ "step": 600
53
+ },
54
+ {
55
+ "epoch": 0.09885260370697264,
56
+ "grad_norm": 4.021312713623047,
57
+ "learning_rate": 9.012990680598702e-05,
58
+ "loss": 1.7957286071777343,
59
+ "step": 700
60
+ },
61
+ {
62
+ "epoch": 0.11297440423654016,
63
+ "grad_norm": 3.464273452758789,
64
+ "learning_rate": 8.871787630612822e-05,
65
+ "loss": 1.6542176818847656,
66
+ "step": 800
67
+ },
68
+ {
69
+ "epoch": 0.12709620476610767,
70
+ "grad_norm": 3.0971059799194336,
71
+ "learning_rate": 8.730584580626942e-05,
72
+ "loss": 1.501783905029297,
73
+ "step": 900
74
+ },
75
+ {
76
+ "epoch": 0.1412180052956752,
77
+ "grad_norm": 4.295173168182373,
78
+ "learning_rate": 8.589381530641061e-05,
79
+ "loss": 1.3875682067871093,
80
+ "step": 1000
81
+ },
82
+ {
83
+ "epoch": 0.1553398058252427,
84
+ "grad_norm": 2.0098490715026855,
85
+ "learning_rate": 8.448178480655182e-05,
86
+ "loss": 1.4273907470703124,
87
+ "step": 1100
88
+ },
89
+ {
90
+ "epoch": 0.16946160635481025,
91
+ "grad_norm": 3.7033345699310303,
92
+ "learning_rate": 8.306975430669302e-05,
93
+ "loss": 1.2835659790039062,
94
+ "step": 1200
95
+ },
96
+ {
97
+ "epoch": 0.18358340688437777,
98
+ "grad_norm": 3.607093095779419,
99
+ "learning_rate": 8.165772380683423e-05,
100
+ "loss": 1.172675552368164,
101
+ "step": 1300
102
+ },
103
+ {
104
+ "epoch": 0.1977052074139453,
105
+ "grad_norm": 4.655322551727295,
106
+ "learning_rate": 8.024569330697543e-05,
107
+ "loss": 1.1659706878662108,
108
+ "step": 1400
109
+ },
110
+ {
111
+ "epoch": 0.1977052074139453,
112
+ "eval_accuracy": 0.929467458759879,
113
+ "eval_f1": 0.9067117960974359,
114
+ "eval_loss": 0.21753081679344177,
115
+ "eval_matthews_correlation": 0.907413923387255,
116
+ "eval_precision": 0.8868148616081014,
117
+ "eval_recall": 0.9307027856599929,
118
+ "eval_runtime": 302.3947,
119
+ "eval_samples_per_second": 249.383,
120
+ "eval_steps_per_second": 31.174,
121
+ "step": 1400
122
+ },
123
+ {
124
+ "epoch": 0.2118270079435128,
125
+ "grad_norm": 2.124321460723877,
126
+ "learning_rate": 7.883366280711664e-05,
127
+ "loss": 1.2192025756835938,
128
+ "step": 1500
129
+ },
130
+ {
131
+ "epoch": 0.22594880847308033,
132
+ "grad_norm": 2.0321645736694336,
133
+ "learning_rate": 7.742163230725784e-05,
134
+ "loss": 1.1063846588134765,
135
+ "step": 1600
136
+ },
137
+ {
138
+ "epoch": 0.24007060900264784,
139
+ "grad_norm": 7.473122596740723,
140
+ "learning_rate": 7.600960180739905e-05,
141
+ "loss": 1.1167493438720704,
142
+ "step": 1700
143
+ },
144
+ {
145
+ "epoch": 0.25419240953221534,
146
+ "grad_norm": 1.6879299879074097,
147
+ "learning_rate": 7.459757130754025e-05,
148
+ "loss": 1.0563026428222657,
149
+ "step": 1800
150
+ },
151
+ {
152
+ "epoch": 0.26831421006178285,
153
+ "grad_norm": 5.7779130935668945,
154
+ "learning_rate": 7.318554080768146e-05,
155
+ "loss": 0.9488992309570312,
156
+ "step": 1900
157
+ },
158
+ {
159
+ "epoch": 0.2824360105913504,
160
+ "grad_norm": 1.244494080543518,
161
+ "learning_rate": 7.177351030782266e-05,
162
+ "loss": 0.8847799682617188,
163
+ "step": 2000
164
+ },
165
+ {
166
+ "epoch": 0.2965578111209179,
167
+ "grad_norm": 4.008191108703613,
168
+ "learning_rate": 7.036147980796385e-05,
169
+ "loss": 0.9060035705566406,
170
+ "step": 2100
171
+ },
172
+ {
173
+ "epoch": 0.3106796116504854,
174
+ "grad_norm": 2.906226396560669,
175
+ "learning_rate": 6.894944930810506e-05,
176
+ "loss": 0.7914878082275391,
177
+ "step": 2200
178
+ },
179
+ {
180
+ "epoch": 0.324801412180053,
181
+ "grad_norm": 1.6399933099746704,
182
+ "learning_rate": 6.753741880824626e-05,
183
+ "loss": 0.7373094177246093,
184
+ "step": 2300
185
+ },
186
+ {
187
+ "epoch": 0.3389232127096205,
188
+ "grad_norm": 3.8636691570281982,
189
+ "learning_rate": 6.612538830838746e-05,
190
+ "loss": 0.7353231048583985,
191
+ "step": 2400
192
+ },
193
+ {
194
+ "epoch": 0.353045013239188,
195
+ "grad_norm": 4.8219194412231445,
196
+ "learning_rate": 6.471335780852866e-05,
197
+ "loss": 0.8310698699951172,
198
+ "step": 2500
199
+ },
200
+ {
201
+ "epoch": 0.36716681376875554,
202
+ "grad_norm": 1.61204993724823,
203
+ "learning_rate": 6.330132730866987e-05,
204
+ "loss": 0.7148534393310547,
205
+ "step": 2600
206
+ },
207
+ {
208
+ "epoch": 0.38128861429832306,
209
+ "grad_norm": 4.425326824188232,
210
+ "learning_rate": 6.188929680881107e-05,
211
+ "loss": 0.8226445770263672,
212
+ "step": 2700
213
+ },
214
+ {
215
+ "epoch": 0.3954104148278906,
216
+ "grad_norm": 3.553007125854492,
217
+ "learning_rate": 6.047726630895228e-05,
218
+ "loss": 0.6587068939208984,
219
+ "step": 2800
220
+ },
221
+ {
222
+ "epoch": 0.3954104148278906,
223
+ "eval_accuracy": 0.9599002811223678,
224
+ "eval_f1": 0.9412536039168936,
225
+ "eval_loss": 0.14896713197231293,
226
+ "eval_matthews_correlation": 0.9470253620243468,
227
+ "eval_precision": 0.9284857025679184,
228
+ "eval_recall": 0.9570486713200069,
229
+ "eval_runtime": 299.6915,
230
+ "eval_samples_per_second": 251.632,
231
+ "eval_steps_per_second": 31.456,
232
+ "step": 2800
233
+ },
234
+ {
235
+ "epoch": 0.4095322153574581,
236
+ "grad_norm": 1.7863709926605225,
237
+ "learning_rate": 5.9065235809093475e-05,
238
+ "loss": 0.7582288360595704,
239
+ "step": 2900
240
+ },
241
+ {
242
+ "epoch": 0.4236540158870256,
243
+ "grad_norm": 2.5715460777282715,
244
+ "learning_rate": 5.765320530923468e-05,
245
+ "loss": 0.7553373718261719,
246
+ "step": 3000
247
+ },
248
+ {
249
+ "epoch": 0.43777581641659313,
250
+ "grad_norm": 9.767241477966309,
251
+ "learning_rate": 5.6241174809375883e-05,
252
+ "loss": 0.653603515625,
253
+ "step": 3100
254
+ },
255
+ {
256
+ "epoch": 0.45189761694616065,
257
+ "grad_norm": 3.407860279083252,
258
+ "learning_rate": 5.482914430951709e-05,
259
+ "loss": 0.6524411010742187,
260
+ "step": 3200
261
+ },
262
+ {
263
+ "epoch": 0.46601941747572817,
264
+ "grad_norm": 2.109328031539917,
265
+ "learning_rate": 5.341711380965829e-05,
266
+ "loss": 0.6743782806396484,
267
+ "step": 3300
268
+ },
269
+ {
270
+ "epoch": 0.4801412180052957,
271
+ "grad_norm": 1.2769715785980225,
272
+ "learning_rate": 5.2005083309799496e-05,
273
+ "loss": 0.673993911743164,
274
+ "step": 3400
275
+ },
276
+ {
277
+ "epoch": 0.4942630185348632,
278
+ "grad_norm": 4.500333309173584,
279
+ "learning_rate": 5.05930528099407e-05,
280
+ "loss": 0.6348028564453125,
281
+ "step": 3500
282
+ },
283
+ {
284
+ "epoch": 0.5083848190644307,
285
+ "grad_norm": 1.812221646308899,
286
+ "learning_rate": 4.91810223100819e-05,
287
+ "loss": 0.5954225158691406,
288
+ "step": 3600
289
+ },
290
+ {
291
+ "epoch": 0.5225066195939982,
292
+ "grad_norm": 0.6688806414604187,
293
+ "learning_rate": 4.77689918102231e-05,
294
+ "loss": 0.5159746551513672,
295
+ "step": 3700
296
+ },
297
+ {
298
+ "epoch": 0.5366284201235657,
299
+ "grad_norm": 6.0633955001831055,
300
+ "learning_rate": 4.635696131036431e-05,
301
+ "loss": 0.5547808074951172,
302
+ "step": 3800
303
+ },
304
+ {
305
+ "epoch": 0.5507502206531333,
306
+ "grad_norm": 2.232146739959717,
307
+ "learning_rate": 4.494493081050551e-05,
308
+ "loss": 0.6260259246826172,
309
+ "step": 3900
310
+ },
311
+ {
312
+ "epoch": 0.5648720211827007,
313
+ "grad_norm": 2.8028974533081055,
314
+ "learning_rate": 4.3532900310646716e-05,
315
+ "loss": 0.5969748687744141,
316
+ "step": 4000
317
+ },
318
+ {
319
+ "epoch": 0.5789938217122683,
320
+ "grad_norm": 2.3292088508605957,
321
+ "learning_rate": 4.212086981078791e-05,
322
+ "loss": 0.591428108215332,
323
+ "step": 4100
324
+ },
325
+ {
326
+ "epoch": 0.5931156222418358,
327
+ "grad_norm": 1.5047627687454224,
328
+ "learning_rate": 4.070883931092912e-05,
329
+ "loss": 0.643886947631836,
330
+ "step": 4200
331
+ },
332
+ {
333
+ "epoch": 0.5931156222418358,
334
+ "eval_accuracy": 0.9696997825279796,
335
+ "eval_f1": 0.9548881928083998,
336
+ "eval_loss": 0.10884281992912292,
337
+ "eval_matthews_correlation": 0.9598345847922993,
338
+ "eval_precision": 0.9471929173612073,
339
+ "eval_recall": 0.9633364273308933,
340
+ "eval_runtime": 305.3891,
341
+ "eval_samples_per_second": 246.937,
342
+ "eval_steps_per_second": 30.869,
343
+ "step": 4200
344
+ },
345
+ {
346
+ "epoch": 0.6072374227714034,
347
+ "grad_norm": 2.4066836833953857,
348
+ "learning_rate": 3.929680881107032e-05,
349
+ "loss": 0.6496241760253906,
350
+ "step": 4300
351
+ },
352
+ {
353
+ "epoch": 0.6213592233009708,
354
+ "grad_norm": 3.095889091491699,
355
+ "learning_rate": 3.788477831121152e-05,
356
+ "loss": 0.5657179641723633,
357
+ "step": 4400
358
+ },
359
+ {
360
+ "epoch": 0.6354810238305384,
361
+ "grad_norm": 3.7599406242370605,
362
+ "learning_rate": 3.6472747811352724e-05,
363
+ "loss": 0.6106902694702149,
364
+ "step": 4500
365
+ },
366
+ {
367
+ "epoch": 0.649602824360106,
368
+ "grad_norm": 2.1620442867279053,
369
+ "learning_rate": 3.506071731149393e-05,
370
+ "loss": 0.5624249267578125,
371
+ "step": 4600
372
+ },
373
+ {
374
+ "epoch": 0.6637246248896734,
375
+ "grad_norm": 2.772578716278076,
376
+ "learning_rate": 3.364868681163513e-05,
377
+ "loss": 0.5569720840454102,
378
+ "step": 4700
379
+ },
380
+ {
381
+ "epoch": 0.677846425419241,
382
+ "grad_norm": 4.468015193939209,
383
+ "learning_rate": 3.223665631177634e-05,
384
+ "loss": 0.5418438339233398,
385
+ "step": 4800
386
+ },
387
+ {
388
+ "epoch": 0.6919682259488085,
389
+ "grad_norm": 4.624788761138916,
390
+ "learning_rate": 3.082462581191754e-05,
391
+ "loss": 0.5999539184570313,
392
+ "step": 4900
393
+ },
394
+ {
395
+ "epoch": 0.706090026478376,
396
+ "grad_norm": 6.000360488891602,
397
+ "learning_rate": 2.9412595312058745e-05,
398
+ "loss": 0.5288655090332032,
399
+ "step": 5000
400
+ },
401
+ {
402
+ "epoch": 0.7202118270079435,
403
+ "grad_norm": 3.9073150157928467,
404
+ "learning_rate": 2.8000564812199946e-05,
405
+ "loss": 0.6136045455932617,
406
+ "step": 5100
407
+ },
408
+ {
409
+ "epoch": 0.7343336275375111,
410
+ "grad_norm": 7.168360233306885,
411
+ "learning_rate": 2.6588534312341147e-05,
412
+ "loss": 0.44071128845214846,
413
+ "step": 5200
414
+ },
415
+ {
416
+ "epoch": 0.7484554280670785,
417
+ "grad_norm": 6.492976188659668,
418
+ "learning_rate": 2.5176503812482348e-05,
419
+ "loss": 0.5461198425292969,
420
+ "step": 5300
421
+ },
422
+ {
423
+ "epoch": 0.7625772285966461,
424
+ "grad_norm": 2.2384190559387207,
425
+ "learning_rate": 2.3764473312623552e-05,
426
+ "loss": 0.5522915649414063,
427
+ "step": 5400
428
+ },
429
+ {
430
+ "epoch": 0.7766990291262136,
431
+ "grad_norm": 0.10446355491876602,
432
+ "learning_rate": 2.2352442812764757e-05,
433
+ "loss": 0.5230790328979492,
434
+ "step": 5500
435
+ },
436
+ {
437
+ "epoch": 0.7908208296557812,
438
+ "grad_norm": 1.8123565912246704,
439
+ "learning_rate": 2.094041231290596e-05,
440
+ "loss": 0.5251728820800782,
441
+ "step": 5600
442
+ },
443
+ {
444
+ "epoch": 0.7908208296557812,
445
+ "eval_accuracy": 0.973545324351562,
446
+ "eval_f1": 0.9593753511152517,
447
+ "eval_loss": 0.09802598506212234,
448
+ "eval_matthews_correlation": 0.9648894189840553,
449
+ "eval_precision": 0.9522183984246139,
450
+ "eval_recall": 0.9674262657206655,
451
+ "eval_runtime": 302.8972,
452
+ "eval_samples_per_second": 248.969,
453
+ "eval_steps_per_second": 31.123,
454
+ "step": 5600
455
+ },
456
+ {
457
+ "epoch": 0.8049426301853486,
458
+ "grad_norm": 2.708657741546631,
459
+ "learning_rate": 1.9528381813047165e-05,
460
+ "loss": 0.6050262451171875,
461
+ "step": 5700
462
+ },
463
+ {
464
+ "epoch": 0.8190644307149162,
465
+ "grad_norm": 2.155137062072754,
466
+ "learning_rate": 1.8116351313188366e-05,
467
+ "loss": 0.5030016326904296,
468
+ "step": 5800
469
+ },
470
+ {
471
+ "epoch": 0.8331862312444837,
472
+ "grad_norm": 4.321381568908691,
473
+ "learning_rate": 1.6704320813329567e-05,
474
+ "loss": 0.5906719207763672,
475
+ "step": 5900
476
+ },
477
+ {
478
+ "epoch": 0.8473080317740512,
479
+ "grad_norm": 0.12370330095291138,
480
+ "learning_rate": 1.529229031347077e-05,
481
+ "loss": 0.5666491317749024,
482
+ "step": 6000
483
+ },
484
+ {
485
+ "epoch": 0.8614298323036187,
486
+ "grad_norm": 1.295456886291504,
487
+ "learning_rate": 1.3880259813611976e-05,
488
+ "loss": 0.5372732925415039,
489
+ "step": 6100
490
+ },
491
+ {
492
+ "epoch": 0.8755516328331863,
493
+ "grad_norm": 6.4877028465271,
494
+ "learning_rate": 1.2468229313753179e-05,
495
+ "loss": 0.5397153091430664,
496
+ "step": 6200
497
+ },
498
+ {
499
+ "epoch": 0.8896734333627537,
500
+ "grad_norm": 1.712461233139038,
501
+ "learning_rate": 1.1056198813894381e-05,
502
+ "loss": 0.4744549179077148,
503
+ "step": 6300
504
+ },
505
+ {
506
+ "epoch": 0.9037952338923213,
507
+ "grad_norm": 7.258785247802734,
508
+ "learning_rate": 9.644168314035584e-06,
509
+ "loss": 0.5170178985595704,
510
+ "step": 6400
511
+ },
512
+ {
513
+ "epoch": 0.9179170344218888,
514
+ "grad_norm": 3.859020233154297,
515
+ "learning_rate": 8.232137814176786e-06,
516
+ "loss": 0.5684902954101563,
517
+ "step": 6500
518
+ },
519
+ {
520
+ "epoch": 0.9320388349514563,
521
+ "grad_norm": 4.142998695373535,
522
+ "learning_rate": 6.82010731431799e-06,
523
+ "loss": 0.43791793823242187,
524
+ "step": 6600
525
+ },
526
+ {
527
+ "epoch": 0.9461606354810238,
528
+ "grad_norm": 0.17755526304244995,
529
+ "learning_rate": 5.4080768144591926e-06,
530
+ "loss": 0.4931388473510742,
531
+ "step": 6700
532
+ },
533
+ {
534
+ "epoch": 0.9602824360105914,
535
+ "grad_norm": 5.609339714050293,
536
+ "learning_rate": 3.996046314600395e-06,
537
+ "loss": 0.5045675277709961,
538
+ "step": 6800
539
+ },
540
+ {
541
+ "epoch": 0.9744042365401588,
542
+ "grad_norm": 4.085425853729248,
543
+ "learning_rate": 2.5840158147415987e-06,
544
+ "loss": 0.4947611618041992,
545
+ "step": 6900
546
+ },
547
+ {
548
+ "epoch": 0.9885260370697264,
549
+ "grad_norm": 1.071936845779419,
550
+ "learning_rate": 1.1719853148828015e-06,
551
+ "loss": 0.5044781875610351,
552
+ "step": 7000
553
+ },
554
+ {
555
+ "epoch": 0.9885260370697264,
556
+ "eval_accuracy": 0.9750304991248078,
557
+ "eval_f1": 0.9613335088674932,
558
+ "eval_loss": 0.09293721616268158,
559
+ "eval_matthews_correlation": 0.9668512518488812,
560
+ "eval_precision": 0.9548993836479353,
561
+ "eval_recall": 0.9683487035732542,
562
+ "eval_runtime": 303.2272,
563
+ "eval_samples_per_second": 248.698,
564
+ "eval_steps_per_second": 31.089,
565
+ "step": 7000
566
+ }
567
+ ],
568
+ "logging_steps": 100,
569
+ "max_steps": 7082,
570
+ "num_input_tokens_seen": 0,
571
+ "num_train_epochs": 1,
572
+ "save_steps": 1400,
573
+ "stateful_callbacks": {
574
+ "TrainerControl": {
575
+ "args": {
576
+ "should_epoch_stop": false,
577
+ "should_evaluate": false,
578
+ "should_log": false,
579
+ "should_save": true,
580
+ "should_training_stop": false
581
+ },
582
+ "attributes": {}
583
+ }
584
+ },
585
+ "total_flos": 2591404483145472.0,
586
+ "train_batch_size": 8,
587
+ "trial_name": null,
588
+ "trial_params": null
589
+ }
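The logged `learning_rate` values are consistent with a linear decay to zero over `max_steps` = 7082. The sketch below reproduces them under two assumptions that are inferred from the log rather than stated in it: a peak learning rate of 1e-4 with no warmup, and each logged value reflecting the scheduler state one step before the reported `step`. The function name `logged_lr` is illustrative.

```python
MAX_STEPS = 7082  # "max_steps" in trainer_state.json
PEAK_LR = 1e-4    # assumption inferred from the logged values


def logged_lr(step, peak=PEAK_LR, total=MAX_STEPS):
    """Linear decay to zero, evaluated at step - 1 to line up with the log entries."""
    return peak * (total - (step - 1)) / total


if __name__ == "__main__":
    for step in (100, 1400, 7000):
        print(step, logged_lr(step))
```

If the actual run used warmup or a different scheduler, the match at these steps would be a coincidence of the sampled points, so the training arguments remain the authoritative source.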
archive/checkpoint-7000/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cbc43a6df41aa56c9f391a65b3d477accf7214857e0284685112be68e451f09d
3
+ size 5265
archive/checkpoint-7082/README.md ADDED
@@ -0,0 +1,206 @@
1
+ ---
2
+ base_model: roberta-base
3
+ library_name: peft
4
+ tags:
5
+ - base_model:adapter:roberta-base
6
+ - lora
7
+ - transformers
8
+ ---
9
+
10
+ # Model Card for Model ID
11
+
12
+ <!-- Provide a quick summary of what the model is/does. -->
13
+
14
+
15
+
16
+ ## Model Details
17
+
18
+ ### Model Description
19
+
20
+ <!-- Provide a longer summary of what this model is. -->
21
+
22
+
23
+
24
+ - **Developed by:** [More Information Needed]
25
+ - **Funded by [optional]:** [More Information Needed]
26
+ - **Shared by [optional]:** [More Information Needed]
27
+ - **Model type:** [More Information Needed]
28
+ - **Language(s) (NLP):** [More Information Needed]
29
+ - **License:** [More Information Needed]
30
+ - **Finetuned from model [optional]:** [More Information Needed]
31
+
32
+ ### Model Sources [optional]
33
+
34
+ <!-- Provide the basic links for the model. -->
35
+
36
+ - **Repository:** [More Information Needed]
37
+ - **Paper [optional]:** [More Information Needed]
38
+ - **Demo [optional]:** [More Information Needed]
39
+
40
+ ## Uses
41
+
42
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
43
+
44
+ ### Direct Use
45
+
46
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
47
+
48
+ [More Information Needed]
49
+
50
+ ### Downstream Use [optional]
51
+
52
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
53
+
54
+ [More Information Needed]
55
+
56
+ ### Out-of-Scope Use
57
+
58
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
59
+
60
+ [More Information Needed]
61
+
62
+ ## Bias, Risks, and Limitations
63
+
64
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
65
+
66
+ [More Information Needed]
67
+
68
+ ### Recommendations
69
+
70
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
71
+
72
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
73
+
74
+ ## How to Get Started with the Model
75
+
76
+ Use the code below to get started with the model.
77
+
78
+ [More Information Needed]
79
+
80
+ ## Training Details
81
+
82
+ ### Training Data
83
+
84
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
85
+
86
+ [More Information Needed]
87
+
88
+ ### Training Procedure
89
+
90
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
91
+
92
+ #### Preprocessing [optional]
93
+
94
+ [More Information Needed]
95
+
96
+
97
+ #### Training Hyperparameters
98
+
99
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
100
+
101
+ #### Speeds, Sizes, Times [optional]
102
+
103
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
104
+
105
+ [More Information Needed]
106
+
107
+ ## Evaluation
108
+
109
+ <!-- This section describes the evaluation protocols and provides the results. -->
110
+
111
+ ### Testing Data, Factors & Metrics
112
+
113
+ #### Testing Data
114
+
115
+ <!-- This should link to a Dataset Card if possible. -->
116
+
117
+ [More Information Needed]
118
+
119
+ #### Factors
120
+
121
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
122
+
123
+ [More Information Needed]
124
+
125
+ #### Metrics
126
+
127
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
128
+
129
+ [More Information Needed]
130
+
131
+ ### Results
132
+
133
+ [More Information Needed]
134
+
135
+ #### Summary
136
+
137
+
138
+
139
+ ## Model Examination [optional]
140
+
141
+ <!-- Relevant interpretability work for the model goes here -->
142
+
143
+ [More Information Needed]
144
+
145
+ ## Environmental Impact
146
+
147
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
148
+
149
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
150
+
151
+ - **Hardware Type:** [More Information Needed]
152
+ - **Hours used:** [More Information Needed]
153
+ - **Cloud Provider:** [More Information Needed]
154
+ - **Compute Region:** [More Information Needed]
155
+ - **Carbon Emitted:** [More Information Needed]
156
+
157
+ ## Technical Specifications [optional]
158
+
159
+ ### Model Architecture and Objective
160
+
161
+ [More Information Needed]
162
+
163
+ ### Compute Infrastructure
164
+
165
+ [More Information Needed]
166
+
167
+ #### Hardware
168
+
169
+ [More Information Needed]
170
+
171
+ #### Software
172
+
173
+ [More Information Needed]
174
+
175
+ ## Citation [optional]
176
+
177
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
178
+
179
+ **BibTeX:**
180
+
181
+ [More Information Needed]
182
+
183
+ **APA:**
184
+
185
+ [More Information Needed]
186
+
187
+ ## Glossary [optional]
188
+
189
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
190
+
191
+ [More Information Needed]
192
+
193
+ ## More Information [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Authors [optional]
198
+
199
+ [More Information Needed]
200
+
201
+ ## Model Card Contact
202
+
203
+ [More Information Needed]
204
+ ### Framework versions
205
+
206
+ - PEFT 0.18.1
archive/checkpoint-7082/adapter_config.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "roberta-base",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 16,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.1,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": [
25
+ "classifier",
26
+ "score"
27
+ ],
28
+ "peft_type": "LORA",
29
+ "peft_version": "0.18.1",
30
+ "qalora_group_size": 16,
31
+ "r": 8,
32
+ "rank_pattern": {},
33
+ "revision": null,
34
+ "target_modules": [
35
+ "query",
36
+ "value"
37
+ ],
38
+ "target_parameters": null,
39
+ "task_type": "SEQ_CLS",
40
+ "trainable_token_indices": null,
41
+ "use_dora": false,
42
+ "use_qalora": false,
43
+ "use_rslora": false
44
+ }
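
A minimal sketch of what the LoRA settings in `adapter_config.json` imply, assuming the standard LoRA formulation: with `use_rslora` set to false the update is scaled by `lora_alpha / r`, and each targeted linear layer gains two low-rank matrices. The hidden size (768) and layer count (12) are assumed roberta-base dimensions that this diff does not state, and the helper names `lora_scaling` and `lora_param_count` are hypothetical. The count excludes the `classifier` and `score` modules listed under `modules_to_save`.

```python
import json

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "target_modules": ["query", "value"],
  "use_rslora": false
}
""")

# Assumed roberta-base dimensions (not stated anywhere in this diff).
HIDDEN_SIZE = 768
NUM_LAYERS = 12


def lora_scaling(cfg):
    """Scaling factor applied to the low-rank update B @ A."""
    if cfg.get("use_rslora"):
        # Rank-stabilized variant divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / cfg["r"] ** 0.5
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg, hidden_size, num_layers):
    """Count LoRA parameters for square hidden_size x hidden_size projections."""
    # A is (r x hidden_size) and B is (hidden_size x r) for each target module.
    per_module = 2 * cfg["r"] * hidden_size
    return per_module * len(cfg["target_modules"]) * num_layers


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    print("LoRA params:", lora_param_count(ADAPTER_CONFIG, HIDDEN_SIZE, NUM_LAYERS))
```

Under these assumptions the adapter adds roughly 0.3M LoRA weights on top of the frozen base model, which is in keeping with the few-megabyte `adapter_model.safetensors` file in this commit.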
archive/checkpoint-7082/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5b86b71cce1a1c8ff2d9251e7644135ba54cafacad02f83e46a7192bcf4f9f73
3
+ size 3567808
archive/checkpoint-7082/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0b38f90334a23d702b8cc9f3d10400a4d606cce1199172fe460cb03b21e2e5bc
3
+ size 7166091
archive/checkpoint-7082/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a3d86d2a26f9c9912eea34431cbb4d043a971061964c315c7e65f652255462bb
3
+ size 14645
archive/checkpoint-7082/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c7774bb17b93256e9fc86699e2094280a0f02faa2a40885314d24e0ab48334f8
3
+ size 1465
archive/checkpoint-7082/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
archive/checkpoint-7082/tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": "<s>",
5
+ "cls_token": "<s>",
6
+ "eos_token": "</s>",
7
+ "errors": "replace",
8
+ "is_local": false,
9
+ "mask_token": "<mask>",
10
+ "model_max_length": 512,
11
+ "pad_token": "<pad>",
12
+ "sep_token": "</s>",
13
+ "tokenizer_class": "RobertaTokenizer",
14
+ "trim_offsets": true,
15
+ "unk_token": "<unk>"
16
+ }
archive/checkpoint-7082/trainer_state.json ADDED
@@ -0,0 +1,589 @@
1
+ {
2
+ "best_global_step": 7000,
3
+ "best_metric": 0.9613335088674932,
4
+ "best_model_checkpoint": "results/weights/q8intlora/peft-roberta-base/checkpoint-7000",
5
+ "epoch": 1.0,
6
+ "eval_steps": 1400,
7
+ "global_step": 7082,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.01412180052956752,
14
+ "grad_norm": 3.725158929824829,
15
+ "learning_rate": 9.86020898051398e-05,
16
+ "loss": 7.050463256835937,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.02824360105913504,
21
+ "grad_norm": 4.400176048278809,
22
+ "learning_rate": 9.7190059305281e-05,
23
+ "loss": 4.082845153808594,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.04236540158870256,
28
+ "grad_norm": 3.011744499206543,
29
+ "learning_rate": 9.57780288054222e-05,
30
+ "loss": 2.8129367065429687,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.05648720211827008,
35
+ "grad_norm": 3.1704041957855225,
36
+ "learning_rate": 9.436599830556341e-05,
37
+ "loss": 2.3832768249511718,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.0706090026478376,
42
+ "grad_norm": 4.340809345245361,
43
+ "learning_rate": 9.295396780570461e-05,
44
+ "loss": 2.0935064697265626,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.08473080317740513,
49
+ "grad_norm": 2.461683750152588,
50
+ "learning_rate": 9.154193730584581e-05,
51
+ "loss": 1.8044235229492187,
52
+ "step": 600
53
+ },
54
+ {
55
+ "epoch": 0.09885260370697264,
56
+ "grad_norm": 4.021312713623047,
57
+ "learning_rate": 9.012990680598702e-05,
58
+ "loss": 1.7957286071777343,
59
+ "step": 700
60
+ },
61
+ {
62
+ "epoch": 0.11297440423654016,
63
+ "grad_norm": 3.464273452758789,
64
+ "learning_rate": 8.871787630612822e-05,
65
+ "loss": 1.6542176818847656,
66
+ "step": 800
67
+ },
68
+ {
69
+ "epoch": 0.12709620476610767,
70
+ "grad_norm": 3.0971059799194336,
71
+ "learning_rate": 8.730584580626942e-05,
72
+ "loss": 1.501783905029297,
73
+ "step": 900
74
+ },
75
+ {
76
+ "epoch": 0.1412180052956752,
77
+ "grad_norm": 4.295173168182373,
78
+ "learning_rate": 8.589381530641061e-05,
79
+ "loss": 1.3875682067871093,
80
+ "step": 1000
81
+ },
82
+ {
83
+ "epoch": 0.1553398058252427,
84
+ "grad_norm": 2.0098490715026855,
85
+ "learning_rate": 8.448178480655182e-05,
86
+ "loss": 1.4273907470703124,
87
+ "step": 1100
88
+ },
89
+ {
90
+ "epoch": 0.16946160635481025,
91
+ "grad_norm": 3.7033345699310303,
92
+ "learning_rate": 8.306975430669302e-05,
93
+ "loss": 1.2835659790039062,
94
+ "step": 1200
95
+ },
96
+ {
97
+ "epoch": 0.18358340688437777,
98
+ "grad_norm": 3.607093095779419,
99
+ "learning_rate": 8.165772380683423e-05,
100
+ "loss": 1.172675552368164,
101
+ "step": 1300
102
+ },
103
+ {
104
+ "epoch": 0.1977052074139453,
105
+ "grad_norm": 4.655322551727295,
106
+ "learning_rate": 8.024569330697543e-05,
107
+ "loss": 1.1659706878662108,
108
+ "step": 1400
109
+ },
110
+ {
111
+ "epoch": 0.1977052074139453,
112
+ "eval_accuracy": 0.929467458759879,
113
+ "eval_f1": 0.9067117960974359,
114
+ "eval_loss": 0.21753081679344177,
115
+ "eval_matthews_correlation": 0.907413923387255,
116
+ "eval_precision": 0.8868148616081014,
117
+ "eval_recall": 0.9307027856599929,
118
+ "eval_runtime": 302.3947,
119
+ "eval_samples_per_second": 249.383,
120
+ "eval_steps_per_second": 31.174,
121
+ "step": 1400
122
+ },
123
+ {
124
+ "epoch": 0.2118270079435128,
125
+ "grad_norm": 2.124321460723877,
126
+ "learning_rate": 7.883366280711664e-05,
127
+ "loss": 1.2192025756835938,
128
+ "step": 1500
129
+ },
130
+ {
131
+ "epoch": 0.22594880847308033,
132
+ "grad_norm": 2.0321645736694336,
133
+ "learning_rate": 7.742163230725784e-05,
134
+ "loss": 1.1063846588134765,
135
+ "step": 1600
136
+ },
137
+ {
138
+ "epoch": 0.24007060900264784,
139
+ "grad_norm": 7.473122596740723,
140
+ "learning_rate": 7.600960180739905e-05,
141
+ "loss": 1.1167493438720704,
142
+ "step": 1700
143
+ },
144
+ {
145
+ "epoch": 0.25419240953221534,
146
+ "grad_norm": 1.6879299879074097,
147
+ "learning_rate": 7.459757130754025e-05,
148
+ "loss": 1.0563026428222657,
149
+ "step": 1800
150
+ },
151
+ {
152
+ "epoch": 0.26831421006178285,
153
+ "grad_norm": 5.7779130935668945,
154
+ "learning_rate": 7.318554080768146e-05,
155
+ "loss": 0.9488992309570312,
156
+ "step": 1900
157
+ },
158
+ {
159
+ "epoch": 0.2824360105913504,
160
+ "grad_norm": 1.244494080543518,
161
+ "learning_rate": 7.177351030782266e-05,
162
+ "loss": 0.8847799682617188,
163
+ "step": 2000
164
+ },
165
+ {
166
+ "epoch": 0.2965578111209179,
167
+ "grad_norm": 4.008191108703613,
168
+ "learning_rate": 7.036147980796385e-05,
169
+ "loss": 0.9060035705566406,
170
+ "step": 2100
171
+ },
172
+ {
173
+ "epoch": 0.3106796116504854,
174
+ "grad_norm": 2.906226396560669,
175
+ "learning_rate": 6.894944930810506e-05,
176
+ "loss": 0.7914878082275391,
177
+ "step": 2200
178
+ },
179
+ {
180
+ "epoch": 0.324801412180053,
181
+ "grad_norm": 1.6399933099746704,
182
+ "learning_rate": 6.753741880824626e-05,
183
+ "loss": 0.7373094177246093,
184
+ "step": 2300
185
+ },
186
+ {
187
+ "epoch": 0.3389232127096205,
188
+ "grad_norm": 3.8636691570281982,
189
+ "learning_rate": 6.612538830838746e-05,
190
+ "loss": 0.7353231048583985,
191
+ "step": 2400
192
+ },
193
+ {
194
+ "epoch": 0.353045013239188,
195
+ "grad_norm": 4.8219194412231445,
196
+ "learning_rate": 6.471335780852866e-05,
197
+ "loss": 0.8310698699951172,
198
+ "step": 2500
199
+ },
200
+ {
201
+ "epoch": 0.36716681376875554,
202
+ "grad_norm": 1.61204993724823,
203
+ "learning_rate": 6.330132730866987e-05,
204
+ "loss": 0.7148534393310547,
205
+ "step": 2600
206
+ },
207
+ {
208
+ "epoch": 0.38128861429832306,
209
+ "grad_norm": 4.425326824188232,
210
+ "learning_rate": 6.188929680881107e-05,
211
+ "loss": 0.8226445770263672,
212
+ "step": 2700
213
+ },
214
+ {
215
+ "epoch": 0.3954104148278906,
216
+ "grad_norm": 3.553007125854492,
217
+ "learning_rate": 6.047726630895228e-05,
218
+ "loss": 0.6587068939208984,
219
+ "step": 2800
220
+ },
221
+ {
222
+ "epoch": 0.3954104148278906,
223
+ "eval_accuracy": 0.9599002811223678,
224
+ "eval_f1": 0.9412536039168936,
225
+ "eval_loss": 0.14896713197231293,
226
+ "eval_matthews_correlation": 0.9470253620243468,
227
+ "eval_precision": 0.9284857025679184,
228
+ "eval_recall": 0.9570486713200069,
229
+ "eval_runtime": 299.6915,
230
+ "eval_samples_per_second": 251.632,
231
+ "eval_steps_per_second": 31.456,
232
+ "step": 2800
233
+ },
234
+ {
235
+ "epoch": 0.4095322153574581,
236
+ "grad_norm": 1.7863709926605225,
237
+ "learning_rate": 5.9065235809093475e-05,
238
+ "loss": 0.7582288360595704,
239
+ "step": 2900
240
+ },
241
+ {
242
+ "epoch": 0.4236540158870256,
243
+ "grad_norm": 2.5715460777282715,
244
+ "learning_rate": 5.765320530923468e-05,
245
+ "loss": 0.7553373718261719,
246
+ "step": 3000
247
+ },
248
+ {
249
+ "epoch": 0.43777581641659313,
250
+ "grad_norm": 9.767241477966309,
251
+ "learning_rate": 5.6241174809375883e-05,
252
+ "loss": 0.653603515625,
253
+ "step": 3100
254
+ },
255
+ {
256
+ "epoch": 0.45189761694616065,
257
+ "grad_norm": 3.407860279083252,
258
+ "learning_rate": 5.482914430951709e-05,
259
+ "loss": 0.6524411010742187,
260
+ "step": 3200
261
+ },
262
+ {
263
+ "epoch": 0.46601941747572817,
264
+ "grad_norm": 2.109328031539917,
265
+ "learning_rate": 5.341711380965829e-05,
266
+ "loss": 0.6743782806396484,
267
+ "step": 3300
268
+ },
269
+ {
270
+ "epoch": 0.4801412180052957,
271
+ "grad_norm": 1.2769715785980225,
272
+ "learning_rate": 5.2005083309799496e-05,
273
+ "loss": 0.673993911743164,
274
+ "step": 3400
275
+ },
276
+ {
277
+ "epoch": 0.4942630185348632,
278
+ "grad_norm": 4.500333309173584,
279
+ "learning_rate": 5.05930528099407e-05,
280
+ "loss": 0.6348028564453125,
281
+ "step": 3500
282
+ },
283
+ {
284
+ "epoch": 0.5083848190644307,
285
+ "grad_norm": 1.812221646308899,
286
+ "learning_rate": 4.91810223100819e-05,
287
+ "loss": 0.5954225158691406,
288
+ "step": 3600
289
+ },
290
+ {
291
+ "epoch": 0.5225066195939982,
292
+ "grad_norm": 0.6688806414604187,
293
+ "learning_rate": 4.77689918102231e-05,
294
+ "loss": 0.5159746551513672,
295
+ "step": 3700
296
+ },
297
+ {
298
+ "epoch": 0.5366284201235657,
299
+ "grad_norm": 6.0633955001831055,
300
+ "learning_rate": 4.635696131036431e-05,
301
+ "loss": 0.5547808074951172,
302
+ "step": 3800
303
+ },
304
+ {
305
+ "epoch": 0.5507502206531333,
306
+ "grad_norm": 2.232146739959717,
307
+ "learning_rate": 4.494493081050551e-05,
308
+ "loss": 0.6260259246826172,
309
+ "step": 3900
310
+ },
311
+ {
312
+ "epoch": 0.5648720211827007,
313
+ "grad_norm": 2.8028974533081055,
314
+ "learning_rate": 4.3532900310646716e-05,
315
+ "loss": 0.5969748687744141,
316
+ "step": 4000
317
+ },
318
+ {
319
+ "epoch": 0.5789938217122683,
320
+ "grad_norm": 2.3292088508605957,
321
+ "learning_rate": 4.212086981078791e-05,
322
+ "loss": 0.591428108215332,
323
+ "step": 4100
324
+ },
325
+ {
326
+ "epoch": 0.5931156222418358,
327
+ "grad_norm": 1.5047627687454224,
328
+ "learning_rate": 4.070883931092912e-05,
329
+ "loss": 0.643886947631836,
330
+ "step": 4200
331
+ },
332
+ {
333
+ "epoch": 0.5931156222418358,
334
+ "eval_accuracy": 0.9696997825279796,
335
+ "eval_f1": 0.9548881928083998,
336
+ "eval_loss": 0.10884281992912292,
337
+ "eval_matthews_correlation": 0.9598345847922993,
338
+ "eval_precision": 0.9471929173612073,
339
+ "eval_recall": 0.9633364273308933,
340
+ "eval_runtime": 305.3891,
341
+ "eval_samples_per_second": 246.937,
342
+ "eval_steps_per_second": 30.869,
343
+ "step": 4200
344
+ },
345
+ {
346
+ "epoch": 0.6072374227714034,
347
+ "grad_norm": 2.4066836833953857,
348
+ "learning_rate": 3.929680881107032e-05,
349
+ "loss": 0.6496241760253906,
350
+ "step": 4300
351
+ },
352
+ {
353
+ "epoch": 0.6213592233009708,
354
+ "grad_norm": 3.095889091491699,
355
+ "learning_rate": 3.788477831121152e-05,
356
+ "loss": 0.5657179641723633,
357
+ "step": 4400
358
+ },
359
+ {
360
+ "epoch": 0.6354810238305384,
361
+ "grad_norm": 3.7599406242370605,
362
+ "learning_rate": 3.6472747811352724e-05,
363
+ "loss": 0.6106902694702149,
364
+ "step": 4500
365
+ },
366
+ {
367
+ "epoch": 0.649602824360106,
368
+ "grad_norm": 2.1620442867279053,
369
+ "learning_rate": 3.506071731149393e-05,
370
+ "loss": 0.5624249267578125,
371
+ "step": 4600
372
+ },
373
+ {
374
+ "epoch": 0.6637246248896734,
375
+ "grad_norm": 2.772578716278076,
376
+ "learning_rate": 3.364868681163513e-05,
377
+ "loss": 0.5569720840454102,
378
+ "step": 4700
379
+ },
380
+ {
381
+ "epoch": 0.677846425419241,
382
+ "grad_norm": 4.468015193939209,
383
+ "learning_rate": 3.223665631177634e-05,
384
+ "loss": 0.5418438339233398,
385
+ "step": 4800
386
+ },
387
+ {
388
+ "epoch": 0.6919682259488085,
389
+ "grad_norm": 4.624788761138916,
390
+ "learning_rate": 3.082462581191754e-05,
391
+ "loss": 0.5999539184570313,
392
+ "step": 4900
393
+ },
394
+ {
395
+ "epoch": 0.706090026478376,
396
+ "grad_norm": 6.000360488891602,
397
+ "learning_rate": 2.9412595312058745e-05,
398
+ "loss": 0.5288655090332032,
399
+ "step": 5000
400
+ },
401
+ {
402
+ "epoch": 0.7202118270079435,
403
+ "grad_norm": 3.9073150157928467,
404
+ "learning_rate": 2.8000564812199946e-05,
405
+ "loss": 0.6136045455932617,
406
+ "step": 5100
407
+ },
408
+ {
409
+ "epoch": 0.7343336275375111,
410
+ "grad_norm": 7.168360233306885,
411
+ "learning_rate": 2.6588534312341147e-05,
412
+ "loss": 0.44071128845214846,
413
+ "step": 5200
414
+ },
415
+ {
416
+ "epoch": 0.7484554280670785,
417
+ "grad_norm": 6.492976188659668,
418
+ "learning_rate": 2.5176503812482348e-05,
419
+ "loss": 0.5461198425292969,
420
+ "step": 5300
421
+ },
422
+ {
423
+ "epoch": 0.7625772285966461,
424
+ "grad_norm": 2.2384190559387207,
425
+ "learning_rate": 2.3764473312623552e-05,
426
+ "loss": 0.5522915649414063,
427
+ "step": 5400
428
+ },
429
+ {
430
+ "epoch": 0.7766990291262136,
431
+ "grad_norm": 0.10446355491876602,
432
+ "learning_rate": 2.2352442812764757e-05,
433
+ "loss": 0.5230790328979492,
434
+ "step": 5500
435
+ },
436
+ {
437
+ "epoch": 0.7908208296557812,
438
+ "grad_norm": 1.8123565912246704,
439
+ "learning_rate": 2.094041231290596e-05,
440
+ "loss": 0.5251728820800782,
441
+ "step": 5600
442
+ },
443
+ {
444
+ "epoch": 0.7908208296557812,
445
+ "eval_accuracy": 0.973545324351562,
446
+ "eval_f1": 0.9593753511152517,
447
+ "eval_loss": 0.09802598506212234,
448
+ "eval_matthews_correlation": 0.9648894189840553,
449
+ "eval_precision": 0.9522183984246139,
450
+ "eval_recall": 0.9674262657206655,
451
+ "eval_runtime": 302.8972,
452
+ "eval_samples_per_second": 248.969,
453
+ "eval_steps_per_second": 31.123,
454
+ "step": 5600
455
+ },
456
+ {
457
+ "epoch": 0.8049426301853486,
458
+ "grad_norm": 2.708657741546631,
459
+ "learning_rate": 1.9528381813047165e-05,
460
+ "loss": 0.6050262451171875,
461
+ "step": 5700
462
+ },
463
+ {
464
+ "epoch": 0.8190644307149162,
465
+ "grad_norm": 2.155137062072754,
466
+ "learning_rate": 1.8116351313188366e-05,
467
+ "loss": 0.5030016326904296,
468
+ "step": 5800
469
+ },
470
+ {
471
+ "epoch": 0.8331862312444837,
472
+ "grad_norm": 4.321381568908691,
473
+ "learning_rate": 1.6704320813329567e-05,
474
+ "loss": 0.5906719207763672,
475
+ "step": 5900
476
+ },
477
+ {
478
+ "epoch": 0.8473080317740512,
479
+ "grad_norm": 0.12370330095291138,
480
+ "learning_rate": 1.529229031347077e-05,
481
+ "loss": 0.5666491317749024,
482
+ "step": 6000
483
+ },
484
+ {
485
+ "epoch": 0.8614298323036187,
486
+ "grad_norm": 1.295456886291504,
487
+ "learning_rate": 1.3880259813611976e-05,
488
+ "loss": 0.5372732925415039,
489
+ "step": 6100
490
+ },
491
+ {
492
+ "epoch": 0.8755516328331863,
493
+ "grad_norm": 6.4877028465271,
494
+ "learning_rate": 1.2468229313753179e-05,
495
+ "loss": 0.5397153091430664,
496
+ "step": 6200
497
+ },
498
+ {
499
+ "epoch": 0.8896734333627537,
500
+ "grad_norm": 1.712461233139038,
501
+ "learning_rate": 1.1056198813894381e-05,
502
+ "loss": 0.4744549179077148,
503
+ "step": 6300
504
+ },
505
+ {
506
+ "epoch": 0.9037952338923213,
507
+ "grad_norm": 7.258785247802734,
508
+ "learning_rate": 9.644168314035584e-06,
509
+ "loss": 0.5170178985595704,
510
+ "step": 6400
511
+ },
512
+ {
513
+ "epoch": 0.9179170344218888,
514
+ "grad_norm": 3.859020233154297,
515
+ "learning_rate": 8.232137814176786e-06,
516
+ "loss": 0.5684902954101563,
517
+ "step": 6500
518
+ },
519
+ {
520
+ "epoch": 0.9320388349514563,
521
+ "grad_norm": 4.142998695373535,
522
+ "learning_rate": 6.82010731431799e-06,
523
+ "loss": 0.43791793823242187,
524
+ "step": 6600
525
+ },
526
+ {
527
+ "epoch": 0.9461606354810238,
528
+ "grad_norm": 0.17755526304244995,
529
+ "learning_rate": 5.4080768144591926e-06,
530
+ "loss": 0.4931388473510742,
531
+ "step": 6700
532
+ },
533
+ {
534
+ "epoch": 0.9602824360105914,
535
+ "grad_norm": 5.609339714050293,
536
+ "learning_rate": 3.996046314600395e-06,
537
+ "loss": 0.5045675277709961,
538
+ "step": 6800
539
+ },
540
+ {
541
+ "epoch": 0.9744042365401588,
542
+ "grad_norm": 4.085425853729248,
543
+ "learning_rate": 2.5840158147415987e-06,
544
+ "loss": 0.4947611618041992,
545
+ "step": 6900
546
+ },
547
+ {
548
+ "epoch": 0.9885260370697264,
549
+ "grad_norm": 1.071936845779419,
550
+ "learning_rate": 1.1719853148828015e-06,
551
+ "loss": 0.5044781875610351,
552
+ "step": 7000
553
+ },
554
+ {
555
+ "epoch": 0.9885260370697264,
556
+ "eval_accuracy": 0.9750304991248078,
557
+ "eval_f1": 0.9613335088674932,
558
+ "eval_loss": 0.09293721616268158,
559
+ "eval_matthews_correlation": 0.9668512518488812,
560
+ "eval_precision": 0.9548993836479353,
561
+ "eval_recall": 0.9683487035732542,
562
+ "eval_runtime": 303.2272,
563
+ "eval_samples_per_second": 248.698,
564
+ "eval_steps_per_second": 31.089,
565
+ "step": 7000
566
+ }
567
+ ],
568
+ "logging_steps": 100,
569
+ "max_steps": 7082,
570
+ "num_input_tokens_seen": 0,
571
+ "num_train_epochs": 1,
572
+ "save_steps": 1400,
573
+ "stateful_callbacks": {
574
+ "TrainerControl": {
575
+ "args": {
576
+ "should_epoch_stop": false,
577
+ "should_evaluate": false,
578
+ "should_log": false,
579
+ "should_save": true,
580
+ "should_training_stop": true
581
+ },
582
+ "attributes": {}
583
+ }
584
+ },
585
+ "total_flos": 2621886414450048.0,
586
+ "train_batch_size": 8,
587
+ "trial_name": null,
588
+ "trial_params": null
589
+ }
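
A minimal sketch of how the evaluation entries in `trainer_state.json` can be separated from the training entries and checked against `best_metric`. In the file above, `best_metric` equals the `eval_f1` value logged at step 7000, which suggests (but does not confirm) that the run selected checkpoints on F1. The data below is a trimmed copy of the log, and the helper names `eval_entries` and `best_eval` are hypothetical.

```python
# Trimmed copy of trainer_state.json: one training entry plus the five
# evaluation entries, keeping only "step" and "eval_f1".
STATE = {
    "best_global_step": 7000,
    "best_metric": 0.9613335088674932,
    "log_history": [
        {"step": 100, "loss": 7.050463256835937},
        {"step": 1400, "eval_f1": 0.9067117960974359},
        {"step": 2800, "eval_f1": 0.9412536039168936},
        {"step": 4200, "eval_f1": 0.9548881928083998},
        {"step": 5600, "eval_f1": 0.9593753511152517},
        {"step": 7000, "eval_f1": 0.9613335088674932},
    ],
}


def eval_entries(state, metric="eval_f1"):
    """Return only the log entries that carry the given evaluation metric."""
    return [entry for entry in state["log_history"] if metric in entry]


def best_eval(state, metric="eval_f1", greater_is_better=True):
    """Return the evaluation entry with the best value of the metric."""
    pick = max if greater_is_better else min
    return pick(eval_entries(state, metric), key=lambda entry: entry[metric])


if __name__ == "__main__":
    best = best_eval(STATE)
    print("best step:", best["step"], "eval_f1:", best["eval_f1"])
```

For a loss-style metric such as `eval_loss`, the same helper can be called with `greater_is_better=False`.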
archive/checkpoint-7082/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cbc43a6df41aa56c9f391a65b3d477accf7214857e0284685112be68e451f09d
3
+ size 5265
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b49ab111d77a36d20c49082ed0bf29a2e4185f8210b8934c20a6b275b50d4f55
3
  size 5265
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08747cf29321824e71aa993ca57b70a3e903a69c34a3e7cefaf3115f8b5d97db
3
  size 5265