pszemraj committed on
Commit ec40ae9 · verified · 0 Parent(s)

Super-squash branch 'main' using huggingface_hub
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ license: mit
+ base_model: roberta-base
+ tags:
+ - genre
+ - books
+ - multi-label
+ - dataset tools
+ metrics:
+ - f1
+ widget:
+ - text: >-
+ Meet Gertrude, a penguin detective who can't stand the cold. When a shrimp
+ cocktail goes missing from the Iceberg Lounge, it's up to her to solve the
+ mystery, wearing her collection of custom-made tropical turtlenecks.
+ example_title: Tropical Turtlenecks
+ - text: >-
+ Professor Wobblebottom, a notoriously forgetful scientist, invents a time
+ machine but forgets how to use it. Now he is randomly popping into
+ significant historical events, ruining everything. The future of the past
+ is in the balance.
+ example_title: When I Forgot The Time
+ - text: >-
+ In a world where hugs are currency and your social credit score is
+ determined by your knack for dad jokes, John, a man who is allergic to
+ laughter, has to navigate his way without becoming broke—or
+ broken-hearted.
+ example_title: Laugh Now, Pay Later
+ - text: >-
+ Emily, a vegan vampire, is faced with an ethical dilemma when she falls
+ head over heels for a human butcher named Bob. Will she bite the forbidden
+ fruit or stick to her plant-based blood substitutes?
+ example_title: Love at First Bite... Or Not
+ - text: >-
+ Steve, a sentient self-driving car, wants to be a Broadway star. His dream
+ seems unreachable until he meets Sally, a GPS system with the voice of an
+ angel and ambitions of her own.
+ example_title: Broadway or Bust
+ - text: >-
+ Dr. Fredrick Tensor, a socially awkward computer scientist, is on a quest
+ to perfect AI companionship. However, his models keep outputting
+ cringe-worthy, melodramatic waifus that scare away even the most die-hard
+ fans of AI romance. Frustrated and lonely, Fredrick must debug his love
+ life and algorithms before it's too late.
+ example_title: Love.exe Has Stopped Working
+ language:
+ - en
+ pipeline_tag: text-classification
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # BEE-spoke-data/roberta-base-description2genre
+
+ This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.2130
+ - F1: 0.6717
+
+ ## Model description
+
+ This model assigns one or more **genre** labels to a given book **description** in a **multi-label** setting.
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 4e-05
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-10
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.04
+ - num_epochs: 6.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | F1     |
+ |:-------------:|:-----:|:----:|:---------------:|:------:|
+ | 0.3118        | 1.0   | 62   | 0.2885          | 0.3362 |
+ | 0.2676        | 2.0   | 124  | 0.2511          | 0.4882 |
+ | 0.2325        | 3.0   | 186  | 0.2272          | 0.6093 |
+ | 0.2127        | 4.0   | 248  | 0.2181          | 0.6591 |
+ | 0.1978        | 5.0   | 310  | 0.2140          | 0.6686 |
+ | 0.1817        | 6.0   | 372  | 0.2130          | 0.6717 |
+
+
+ ### Framework versions
+
+ - Transformers 4.33.3
+ - Pytorch 2.2.0.dev20231001+cu121
+ - Datasets 2.14.5
+ - Tokenizers 0.13.3
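The step counts in the training-results table above follow directly from the listed hyperparameters; a quick sanity check (the train sample count of 7914 is taken from all_results.json in this same commit):

```python
import math

# Hyperparameters from the model card above
train_samples = 7914   # from all_results.json
per_device_batch = 64  # train_batch_size
grad_accum = 2         # gradient_accumulation_steps
num_epochs = 6

# The effective batch is the per-device batch times the accumulation steps
effective_batch = per_device_batch * grad_accum
optimizer_steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = optimizer_steps_per_epoch * num_epochs

print(effective_batch, optimizer_steps_per_epoch, total_steps)  # 128 62 372
```

This reproduces total_train_batch_size (128), the 62 steps logged at epoch 1.0, and the 372 steps at epoch 6.0.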
all_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+     "epoch": 6.0,
+     "eval_f1": 0.6717231571462432,
+     "eval_loss": 0.21304987370967865,
+     "eval_runtime": 3.3638,
+     "eval_samples": 989,
+     "eval_samples_per_second": 294.016,
+     "eval_steps_per_second": 4.757,
+     "train_loss": 0.2572957956662742,
+     "train_runtime": 515.2049,
+     "train_samples": 7914,
+     "train_samples_per_second": 92.165,
+     "train_steps_per_second": 0.722
+ }
config.json ADDED
@@ -0,0 +1,69 @@
+ {
+   "_name_or_path": "roberta-base",
+   "architectures": [
+     "RobertaForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "finetuning_task": "text-classification",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "History & Politics",
+     "1": "Health & Medicine",
+     "2": "Mystery & Thriller",
+     "3": "Arts & Design",
+     "4": "Self-Help & Wellness",
+     "5": "Sports & Recreation",
+     "6": "Non-Fiction",
+     "7": "Science Fiction & Fantasy",
+     "8": "Countries & Geography",
+     "9": "Other",
+     "10": "Nature & Environment",
+     "11": "Business & Finance",
+     "12": "Romance",
+     "13": "Philosophy & Religion",
+     "14": "Literature & Fiction",
+     "15": "Science & Technology",
+     "16": "Children & Young Adult",
+     "17": "Food & Cooking"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "Arts & Design": 3,
+     "Business & Finance": 11,
+     "Children & Young Adult": 16,
+     "Countries & Geography": 8,
+     "Food & Cooking": 17,
+     "Health & Medicine": 1,
+     "History & Politics": 0,
+     "Literature & Fiction": 14,
+     "Mystery & Thriller": 2,
+     "Nature & Environment": 10,
+     "Non-Fiction": 6,
+     "Other": 9,
+     "Philosophy & Religion": 13,
+     "Romance": 12,
+     "Science & Technology": 15,
+     "Science Fiction & Fantasy": 7,
+     "Self-Help & Wellness": 4,
+     "Sports & Recreation": 5
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "problem_type": "multi_label_classification",
+   "torch_dtype": "float32",
+   "transformers_version": "4.33.3",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 50265
+ }
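Because config.json sets `problem_type` to `multi_label_classification`, the model's 18 logits are decoded independently with a sigmoid rather than jointly with a softmax: every label whose probability clears a threshold is returned, so a description can receive zero, one, or several genres. A minimal sketch of that decoding, using a subset of the `id2label` mapping above and made-up logit values for illustration:

```python
import math

# Subset of the id2label mapping from config.json
id2label = {
    0: "History & Politics",
    2: "Mystery & Thriller",
    7: "Science Fiction & Fantasy",
    12: "Romance",
}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Multi-label decoding: each logit is squashed through a sigmoid
    independently, and every label above the threshold is kept."""
    return [id2label[i] for i, z in logits.items() if sigmoid(z) >= threshold]

# Hypothetical raw logits for a cozy sci-fi mystery description
logits = {0: -3.1, 2: 2.4, 7: 0.9, 12: -1.5}
print(predict_labels(logits))  # ['Mystery & Thriller', 'Science Fiction & Fantasy']
```

With the default 0.5 threshold, keeping a label is equivalent to its raw logit being non-negative; raising the threshold trades recall for precision.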
eval_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 6.0,
+     "eval_f1": 0.6717231571462432,
+     "eval_loss": 0.21304987370967865,
+     "eval_runtime": 3.3638,
+     "eval_samples": 989,
+     "eval_samples_per_second": 294.016,
+     "eval_steps_per_second": 4.757
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4c520c7abd073c82df56cea22a01dc256096be731a5bda3998f9f2a214a9049
+ size 498662040
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "unk_token": "<unk>"
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 6.0,
+     "train_loss": 0.2572957956662742,
+     "train_runtime": 515.2049,
+     "train_samples": 7914,
+     "train_samples_per_second": 92.165,
+     "train_steps_per_second": 0.722
+ }
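The throughput figures in train_results.json are mutually consistent; a quick cross-check from the runtime, sample count, and the 372 total optimizer steps reported in the model card's training table:

```python
# Figures from train_results.json and the training-results table above
train_samples = 7914
epochs = 6.0
train_runtime = 515.2049  # seconds
total_steps = 372

samples_per_second = train_samples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(round(samples_per_second, 3), round(steps_per_second, 3))  # 92.165 0.722
```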
trainer_state.json ADDED
@@ -0,0 +1,304 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 6.0,
+   "eval_steps": 500,
+   "global_step": 372,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.16,
+       "learning_rate": 2.6666666666666667e-05,
+       "loss": 0.6871,
+       "step": 10
+     },
+     {
+       "epoch": 0.32,
+       "learning_rate": 3.943977591036415e-05,
+       "loss": 0.506,
+       "step": 20
+     },
+     {
+       "epoch": 0.48,
+       "learning_rate": 3.8319327731092444e-05,
+       "loss": 0.386,
+       "step": 30
+     },
+     {
+       "epoch": 0.65,
+       "learning_rate": 3.719887955182073e-05,
+       "loss": 0.3467,
+       "step": 40
+     },
+     {
+       "epoch": 0.81,
+       "learning_rate": 3.6078431372549025e-05,
+       "loss": 0.3235,
+       "step": 50
+     },
+     {
+       "epoch": 0.97,
+       "learning_rate": 3.495798319327731e-05,
+       "loss": 0.3118,
+       "step": 60
+     },
+     {
+       "epoch": 1.0,
+       "eval_f1": 0.336166194523135,
+       "eval_loss": 0.2885044813156128,
+       "eval_runtime": 3.3771,
+       "eval_samples_per_second": 292.857,
+       "eval_steps_per_second": 4.738,
+       "step": 62
+     },
+     {
+       "epoch": 1.13,
+       "learning_rate": 3.383753501400561e-05,
+       "loss": 0.3046,
+       "step": 70
+     },
+     {
+       "epoch": 1.29,
+       "learning_rate": 3.2717086834733894e-05,
+       "loss": 0.2906,
+       "step": 80
+     },
+     {
+       "epoch": 1.45,
+       "learning_rate": 3.159663865546219e-05,
+       "loss": 0.2896,
+       "step": 90
+     },
+     {
+       "epoch": 1.61,
+       "learning_rate": 3.047619047619048e-05,
+       "loss": 0.2826,
+       "step": 100
+     },
+     {
+       "epoch": 1.77,
+       "learning_rate": 2.935574229691877e-05,
+       "loss": 0.2749,
+       "step": 110
+     },
+     {
+       "epoch": 1.94,
+       "learning_rate": 2.8235294117647063e-05,
+       "loss": 0.2676,
+       "step": 120
+     },
+     {
+       "epoch": 2.0,
+       "eval_f1": 0.4882154882154882,
+       "eval_loss": 0.25112977623939514,
+       "eval_runtime": 3.3888,
+       "eval_samples_per_second": 291.842,
+       "eval_steps_per_second": 4.721,
+       "step": 124
+     },
+     {
+       "epoch": 2.1,
+       "learning_rate": 2.7114845938375354e-05,
+       "loss": 0.2534,
+       "step": 130
+     },
+     {
+       "epoch": 2.26,
+       "learning_rate": 2.5994397759103644e-05,
+       "loss": 0.2491,
+       "step": 140
+     },
+     {
+       "epoch": 2.42,
+       "learning_rate": 2.4873949579831935e-05,
+       "loss": 0.2452,
+       "step": 150
+     },
+     {
+       "epoch": 2.58,
+       "learning_rate": 2.3753501400560226e-05,
+       "loss": 0.2373,
+       "step": 160
+     },
+     {
+       "epoch": 2.74,
+       "learning_rate": 2.2633053221288516e-05,
+       "loss": 0.2355,
+       "step": 170
+     },
+     {
+       "epoch": 2.9,
+       "learning_rate": 2.1512605042016807e-05,
+       "loss": 0.2325,
+       "step": 180
+     },
+     {
+       "epoch": 3.0,
+       "eval_f1": 0.6093467596178673,
+       "eval_loss": 0.22724518179893494,
+       "eval_runtime": 3.3851,
+       "eval_samples_per_second": 292.16,
+       "eval_steps_per_second": 4.727,
+       "step": 186
+     },
+     {
+       "epoch": 3.06,
+       "learning_rate": 2.0392156862745097e-05,
+       "loss": 0.231,
+       "step": 190
+     },
+     {
+       "epoch": 3.23,
+       "learning_rate": 1.927170868347339e-05,
+       "loss": 0.2195,
+       "step": 200
+     },
+     {
+       "epoch": 3.39,
+       "learning_rate": 1.8151260504201682e-05,
+       "loss": 0.2156,
+       "step": 210
+     },
+     {
+       "epoch": 3.55,
+       "learning_rate": 1.7030812324929973e-05,
+       "loss": 0.2184,
+       "step": 220
+     },
+     {
+       "epoch": 3.71,
+       "learning_rate": 1.5910364145658263e-06,
+       "loss": 0.2109,
+       "step": 230
+     },
+     {
+       "epoch": 3.87,
+       "learning_rate": 1.4789915966386557e-05,
+       "loss": 0.2127,
+       "step": 240
+     },
+     {
+       "epoch": 4.0,
+       "eval_f1": 0.6591346153846154,
+       "eval_loss": 0.21806256473064423,
+       "eval_runtime": 3.3784,
+       "eval_samples_per_second": 292.738,
+       "eval_steps_per_second": 4.736,
+       "step": 248
+     },
+     {
+       "epoch": 4.03,
+       "learning_rate": 1.3669467787114848e-05,
+       "loss": 0.2067,
+       "step": 250
+     },
+     {
+       "epoch": 4.19,
+       "learning_rate": 1.2549019607843138e-05,
+       "loss": 0.1984,
+       "step": 260
+     },
+     {
+       "epoch": 4.35,
+       "learning_rate": 1.1428571428571429e-05,
+       "loss": 0.1957,
+       "step": 270
+     },
+     {
+       "epoch": 4.52,
+       "learning_rate": 1.030812324929972e-05,
+       "loss": 0.1967,
+       "step": 280
+     },
+     {
+       "epoch": 4.68,
+       "learning_rate": 9.187675070028012e-06,
+       "loss": 0.1975,
+       "step": 290
+     },
+     {
+       "epoch": 4.84,
+       "learning_rate": 8.067226890756303e-06,
+       "loss": 0.194,
+       "step": 300
+     },
+     {
+       "epoch": 5.0,
+       "learning_rate": 6.946778711484594e-06,
+       "loss": 0.1978,
+       "step": 310
+     },
+     {
+       "epoch": 5.0,
+       "eval_f1": 0.6685687113647171,
+       "eval_loss": 0.2140355408191681,
+       "eval_runtime": 3.3785,
+       "eval_samples_per_second": 292.736,
+       "eval_steps_per_second": 4.736,
+       "step": 310
+     },
+     {
+       "epoch": 5.16,
+       "learning_rate": 5.826330532212886e-06,
+       "loss": 0.1877,
+       "step": 320
+     },
+     {
+       "epoch": 5.32,
+       "learning_rate": 4.705882352941177e-06,
+       "loss": 0.1877,
+       "step": 330
+     },
+     {
+       "epoch": 5.48,
+       "learning_rate": 3.585434173669468e-06,
+       "loss": 0.1803,
+       "step": 340
+     },
+     {
+       "epoch": 5.65,
+       "learning_rate": 2.4649859943977594e-06,
+       "loss": 0.1874,
+       "step": 350
+     },
+     {
+       "epoch": 5.81,
+       "learning_rate": 1.3445378151260504e-06,
+       "loss": 0.1911,
+       "step": 360
+     },
+     {
+       "epoch": 5.97,
+       "learning_rate": 2.2408963585434175e-07,
+       "loss": 0.1817,
+       "step": 370
+     },
+     {
+       "epoch": 6.0,
+       "eval_f1": 0.6717231571462432,
+       "eval_loss": 0.21304987370967865,
+       "eval_runtime": 3.3805,
+       "eval_samples_per_second": 292.556,
+       "eval_steps_per_second": 4.733,
+       "step": 372
+     },
+     {
+       "epoch": 6.0,
+       "step": 372,
+       "total_flos": 1.2495360147628032e+16,
+       "train_loss": 0.2572957956662742,
+       "train_runtime": 515.2049,
+       "train_samples_per_second": 92.165,
+       "train_steps_per_second": 0.722
+     }
+   ],
+   "logging_steps": 10,
+   "max_steps": 372,
+   "num_train_epochs": 6,
+   "save_steps": 500,
+   "total_flos": 1.2495360147628032e+16,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffcdde489b8cb6656f86cad05fc0b45b1a8c7061c8dc94ef0ae77c80b51974bd
+ size 4600
vocab.json ADDED
The diff for this file is too large to render. See raw diff