feliponi committed on
Commit 404ce1d · verified · 1 Parent(s): 1be9356

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,100 @@
---
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: token-classification
tags:
- ner
- token-classification
- skills
- experience
- roberta
- hirly
---

# Entity Extraction NER Model for CVs and JDs (Skills & Experience)

This is a `roberta-base` model fine-tuned for **Named Entity Recognition (NER)** on Human Resources documents, specifically résumés (CVs) and job descriptions (JDs).

The model was trained on a private dataset of approximately **20,000 examples** generated using a **weak labeling** strategy. Its primary goal is to extract skills and quantifiable years of experience from free-form text.

## Recognized Entities

The model is trained to extract two main entity types (5 BIO labels):

* **`SKILL`**: Technical skills, software, tools, or soft skills.
  * *Examples: "Python", "machine learning", "React", "AWS", "leadership"*
* **`EXPERIENCE_DURATION`**: Text spans that describe a duration of time.
  * *Examples: "5+ years", "6 months", "3-5 anos", "two years of experience"*

## How to Use (Python)

You can use this model directly with the `token-classification` (or `ner`) pipeline from the `transformers` library.

```python
from transformers import pipeline

# Load the model from the Hub
# (Replace with your actual model ID, e.g., "your-username/hirly-ner-multi")
model_id = "your-username/hirly-ner-multi"

# Initialize the pipeline.
# aggregation_strategy="simple" groups B- and I- tags (e.g., B-SKILL, I-SKILL -> SKILL)
extractor = pipeline(
    "ner",
    model=model_id,
    aggregation_strategy="simple"
)

# Example text
text = "Data Scientist with 5+ years of experience in Python and machine learning. Also 6 months in Java."

# Get entities
entities = extractor(text)

# Filter for high confidence
min_confidence = 0.7
confident_entities = [e for e in entities if e['score'] >= min_confidence]

# Print the results
for entity in confident_entities:
    print(f"[{entity['entity_group']}] {entity['word']} (Confidence: {entity['score']:.2f})")
```

**Expected Output:**

```
[EXPERIENCE_DURATION] 5+ years (Confidence: 1.00)
[SKILL] Python (Confidence: 0.99)
[SKILL] machine learning (Confidence: 1.00)
[EXPERIENCE_DURATION] 6 months (Confidence: 1.00)
[SKILL] Java (Confidence: 0.99)
```

## Training, Performance, and Limitations

This model's performance is a direct result of its training data and weak-labeling methodology.

### Performance

The model was validated on a test set of ~2,000 examples, achieving the following F1-scores:

| Entity | F1-Score |
| :--- | :--- |
| **`EXPERIENCE_DURATION`** | **99.9%** |
| **`SKILL`** | **97.6%** |
| **Overall** | **98.8%** |

### Training Methodology

1. **`EXPERIENCE_DURATION` (high quality):** This entity was labeled using a robust set of regular expressions designed to find time patterns (e.g., "5+ years", "six months"). Its near-perfect F1-score reflects this.
2. **`SKILL` (high recall, lower precision):** This entity was labeled by performing *exact matching* against a large, proprietary vocabulary of ~8,700 terms.
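
The regular-expression labeling described above can be sketched as follows. The patterns here are illustrative only, not the actual expressions used in training, which are not public:

```python
import re

# Illustrative duration patterns (hypothetical; the real training pipeline
# used a larger, more robust set of expressions).
DURATION_RE = re.compile(
    r"\b\d+(?:\s*-\s*\d+)?\s*\+?\s*(?:years?|yrs?|months?|anos)\b"
    r"|\b(?:one|two|three|four|five|six|seven|eight|nine|ten)\s+(?:years?|months?)\b",
    re.IGNORECASE,
)

def find_durations(text):
    """Return (start, end, span) tuples for every duration match in the text."""
    return [(m.start(), m.end(), m.group()) for m in DURATION_RE.finditer(text)]

sample = "5+ years of Python, 6 months in Java, 3-5 anos of SQL, two years of ML"
print([span for _, _, span in find_durations(sample)])
# ['5+ years', '6 months', '3-5 anos', 'two years']
```

Matched character offsets can then be converted to BIO tags over the tokenization, which is the usual last step of a weak-labeling pass.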

### Limitations (Important)

* **Vocabulary dependency:** The model is excellent at finding the **8,700 skills** it was trained on. It will *not* reliably find new skills or tools that were absent from the training vocabulary. It functions more as a "high-speed vocabulary extractor" than a "skill concept detector."
* **False positives:** Because the source vocabulary contained generic words, the model learned to tag them as `SKILL` with high confidence. **Users of this model should filter the output** to remove known false positives.
  * *Examples of common false positives: "communication", "leadership", "teamwork", "project", "skills"*.
* **Noise:** The model may occasionally output low-confidence punctuation or noise (e.g., `.` with a 0.33 score, as seen in the sample output). It is highly recommended to **filter results by a confidence score (e.g., `score > 0.7`)** for clean outputs.
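
The filtering advice above can be combined into one small post-processing helper. This is a sketch: the stop list below only repeats the example false positives from this card and should be extended for your own data:

```python
# Known generic false positives (examples from this model card; extend as needed).
FALSE_POSITIVE_SKILLS = {"communication", "leadership", "teamwork", "project", "skills"}

def clean_entities(entities, min_confidence=0.7):
    """Filter aggregated pipeline output (dicts with 'entity_group',
    'word', 'score') down to confident, useful spans."""
    cleaned = []
    for ent in entities:
        if ent["score"] < min_confidence:
            continue  # drops low-confidence punctuation/noise
        if ent["entity_group"] == "SKILL" and ent["word"].strip().lower() in FALSE_POSITIVE_SKILLS:
            continue  # drops generic vocabulary matches
        cleaned.append(ent)
    return cleaned

# Example with mock pipeline output
raw = [
    {"entity_group": "SKILL", "word": "Python", "score": 0.99},
    {"entity_group": "SKILL", "word": "leadership", "score": 0.95},
    {"entity_group": "EXPERIENCE_DURATION", "word": "5+ years", "score": 1.00},
    {"entity_group": "SKILL", "word": ".", "score": 0.33},
]
print([e["word"] for e in clean_entities(raw)])
# ['Python', '5+ years']
```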
checkpoint-3996/config.json ADDED
@@ -0,0 +1,40 @@
{
  "architectures": [
    "RobertaForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "dtype": "float32",
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-EXPERIENCE_DURATION",
    "2": "I-EXPERIENCE_DURATION",
    "3": "B-SKILL",
    "4": "I-SKILL"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-EXPERIENCE_DURATION": 1,
    "B-SKILL": 3,
    "I-EXPERIENCE_DURATION": 2,
    "I-SKILL": 4,
    "O": 0
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.57.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
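
The `id2label` map in this config is what the pipeline's `aggregation_strategy="simple"` relies on to merge B-/I- tokens into entity spans. A minimal sketch of that grouping logic, ignoring subword merging and score averaging, with made-up token/id inputs:

```python
# Label map copied from the config above.
ID2LABEL = {0: "O", 1: "B-EXPERIENCE_DURATION", 2: "I-EXPERIENCE_DURATION",
            3: "B-SKILL", 4: "I-SKILL"}

def group_bio(tokens, label_ids):
    """Merge consecutive B-/I- tokens of the same type into entity spans."""
    entities = []
    current = None
    for tok, lid in zip(tokens, label_ids):
        label = ID2LABEL[lid]
        if label.startswith("B-"):
            if current:
                entities.append(current)
            current = {"entity_group": label[2:], "word": tok}
        elif label.startswith("I-") and current and current["entity_group"] == label[2:]:
            current["word"] += " " + tok
        else:  # "O", or an I- tag that does not continue the open span
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

print(group_bio(["5+", "years", "in", "machine", "learning"], [1, 2, 0, 3, 4]))
# [{'entity_group': 'EXPERIENCE_DURATION', 'word': '5+ years'},
#  {'entity_group': 'SKILL', 'word': 'machine learning'}]
```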
checkpoint-3996/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3996/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e7b0f2b765322719f4b9a0c274f8b1019bafa0ce6a2f2d23995f7c7da266c68c
size 496259468
checkpoint-3996/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8e7cbd32dff232761ad83d38d449e99c19924f5d2e9223d7f4051646aed2eed
size 992640715
checkpoint-3996/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6fbbfb6023b74c9ffcc1f860d845c2e67a4d5a9c9c1609370397d7438d4fc5f9
size 14645
checkpoint-3996/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8932b239b5f37887457924d15c5df53b8a8cdce4e363554380fa933b8e08408a
size 1465
checkpoint-3996/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
{
  "bos_token": "<s>",
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "unk_token": "<unk>"
}
checkpoint-3996/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-3996/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "<unk>"
}
checkpoint-3996/trainer_state.json ADDED
@@ -0,0 +1,621 @@
{
  "best_global_step": 3996,
  "best_metric": 0.9872586428584664,
  "best_model_checkpoint": "models/hirly_ner_multi\\checkpoint-3996",
  "epoch": 2.0,
  "eval_steps": 500,
  "global_step": 3996,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.025025025025025027,
      "grad_norm": 2.1700778007507324,
      "learning_rate": 1.9836503169836507e-05,
      "loss": 0.5362,
      "step": 50
    },
    {
      "epoch": 0.05005005005005005,
      "grad_norm": 1.6467891931533813,
      "learning_rate": 1.966966966966967e-05,
      "loss": 0.3009,
      "step": 100
    },
    {
      "epoch": 0.07507507507507508,
      "grad_norm": 1.7352838516235352,
      "learning_rate": 1.9502836169502837e-05,
      "loss": 0.2276,
      "step": 150
    },
    {
      "epoch": 0.1001001001001001,
      "grad_norm": 2.1085104942321777,
      "learning_rate": 1.9336002669336004e-05,
      "loss": 0.1991,
      "step": 200
    },
    {
      "epoch": 0.12512512512512514,
      "grad_norm": 1.6021695137023926,
      "learning_rate": 1.916916916916917e-05,
      "loss": 0.1728,
      "step": 250
    },
    {
      "epoch": 0.15015015015015015,
      "grad_norm": 2.7993948459625244,
      "learning_rate": 1.9002335669002338e-05,
      "loss": 0.1496,
      "step": 300
    },
    {
      "epoch": 0.17517517517517517,
      "grad_norm": 1.3490108251571655,
      "learning_rate": 1.8835502168835505e-05,
      "loss": 0.1359,
      "step": 350
    },
    {
      "epoch": 0.2002002002002002,
      "grad_norm": 1.5092374086380005,
      "learning_rate": 1.866866866866867e-05,
      "loss": 0.1269,
      "step": 400
    },
    {
      "epoch": 0.22522522522522523,
      "grad_norm": 1.8146309852600098,
      "learning_rate": 1.8501835168501835e-05,
      "loss": 0.1136,
      "step": 450
    },
    {
      "epoch": 0.2502502502502503,
      "grad_norm": 2.7670438289642334,
      "learning_rate": 1.8335001668335005e-05,
      "loss": 0.1068,
      "step": 500
    },
    {
      "epoch": 0.2752752752752753,
      "grad_norm": 1.6677714586257935,
      "learning_rate": 1.816816816816817e-05,
      "loss": 0.1058,
      "step": 550
    },
    {
      "epoch": 0.3003003003003003,
      "grad_norm": 1.3509212732315063,
      "learning_rate": 1.8001334668001336e-05,
      "loss": 0.0962,
      "step": 600
    },
    {
      "epoch": 0.3253253253253253,
      "grad_norm": 1.6370694637298584,
      "learning_rate": 1.7834501167834503e-05,
      "loss": 0.0953,
      "step": 650
    },
    {
      "epoch": 0.35035035035035034,
      "grad_norm": 1.0707803964614868,
      "learning_rate": 1.766766766766767e-05,
      "loss": 0.0867,
      "step": 700
    },
    {
      "epoch": 0.37537537537537535,
      "grad_norm": 1.6105570793151855,
      "learning_rate": 1.7500834167500836e-05,
      "loss": 0.0861,
      "step": 750
    },
    {
      "epoch": 0.4004004004004004,
      "grad_norm": 2.311500310897827,
      "learning_rate": 1.7334000667334e-05,
      "loss": 0.0768,
      "step": 800
    },
    {
      "epoch": 0.42542542542542544,
      "grad_norm": 1.4976152181625366,
      "learning_rate": 1.716716716716717e-05,
      "loss": 0.0729,
      "step": 850
    },
    {
      "epoch": 0.45045045045045046,
      "grad_norm": 1.4623268842697144,
      "learning_rate": 1.7000333667000334e-05,
      "loss": 0.0724,
      "step": 900
    },
    {
      "epoch": 0.4754754754754755,
      "grad_norm": 1.320494294166565,
      "learning_rate": 1.68335001668335e-05,
      "loss": 0.071,
      "step": 950
    },
    {
      "epoch": 0.5005005005005005,
      "grad_norm": 1.2657185792922974,
      "learning_rate": 1.6666666666666667e-05,
      "loss": 0.0744,
      "step": 1000
    },
    {
      "epoch": 0.5255255255255256,
      "grad_norm": 1.65080988407135,
      "learning_rate": 1.6499833166499834e-05,
      "loss": 0.0683,
      "step": 1050
    },
    {
      "epoch": 0.5505505505505506,
      "grad_norm": 1.3511857986450195,
      "learning_rate": 1.6332999666333e-05,
      "loss": 0.0589,
      "step": 1100
    },
    {
      "epoch": 0.5755755755755756,
      "grad_norm": 1.4793460369110107,
      "learning_rate": 1.6166166166166168e-05,
      "loss": 0.0621,
      "step": 1150
    },
    {
      "epoch": 0.6006006006006006,
      "grad_norm": 1.1826051473617554,
      "learning_rate": 1.5999332665999335e-05,
      "loss": 0.0618,
      "step": 1200
    },
    {
      "epoch": 0.6256256256256256,
      "grad_norm": 2.016144037246704,
      "learning_rate": 1.58324991658325e-05,
      "loss": 0.0609,
      "step": 1250
    },
    {
      "epoch": 0.6506506506506506,
      "grad_norm": 1.3155397176742554,
      "learning_rate": 1.566566566566567e-05,
      "loss": 0.0562,
      "step": 1300
    },
    {
      "epoch": 0.6756756756756757,
      "grad_norm": 1.2205989360809326,
      "learning_rate": 1.5498832165498832e-05,
      "loss": 0.0576,
      "step": 1350
    },
    {
      "epoch": 0.7007007007007007,
      "grad_norm": 1.0111949443817139,
      "learning_rate": 1.5331998665332e-05,
      "loss": 0.0511,
      "step": 1400
    },
    {
      "epoch": 0.7257257257257257,
      "grad_norm": 2.1711137294769287,
      "learning_rate": 1.5165165165165166e-05,
      "loss": 0.0519,
      "step": 1450
    },
    {
      "epoch": 0.7507507507507507,
      "grad_norm": 1.7720271348953247,
      "learning_rate": 1.4998331664998333e-05,
      "loss": 0.0555,
      "step": 1500
    },
    {
      "epoch": 0.7757757757757757,
      "grad_norm": 1.2108412981033325,
      "learning_rate": 1.48314981648315e-05,
      "loss": 0.0497,
      "step": 1550
    },
    {
      "epoch": 0.8008008008008008,
      "grad_norm": 1.0561896562576294,
      "learning_rate": 1.4664664664664665e-05,
      "loss": 0.051,
      "step": 1600
    },
    {
      "epoch": 0.8258258258258259,
      "grad_norm": 1.4415959119796753,
      "learning_rate": 1.4497831164497834e-05,
      "loss": 0.0557,
      "step": 1650
    },
    {
      "epoch": 0.8508508508508509,
      "grad_norm": 1.4428836107254028,
      "learning_rate": 1.4330997664330999e-05,
      "loss": 0.0514,
      "step": 1700
    },
    {
      "epoch": 0.8758758758758759,
      "grad_norm": 1.2458566427230835,
      "learning_rate": 1.4164164164164164e-05,
      "loss": 0.0503,
      "step": 1750
    },
    {
      "epoch": 0.9009009009009009,
      "grad_norm": 1.9587106704711914,
      "learning_rate": 1.3997330663997333e-05,
      "loss": 0.0431,
      "step": 1800
    },
    {
      "epoch": 0.9259259259259259,
      "grad_norm": 1.3225224018096924,
      "learning_rate": 1.3830497163830498e-05,
      "loss": 0.0469,
      "step": 1850
    },
    {
      "epoch": 0.950950950950951,
      "grad_norm": 1.0749801397323608,
      "learning_rate": 1.3663663663663665e-05,
      "loss": 0.0437,
      "step": 1900
    },
    {
      "epoch": 0.975975975975976,
      "grad_norm": 1.1957429647445679,
      "learning_rate": 1.349683016349683e-05,
      "loss": 0.0377,
      "step": 1950
    },
    {
      "epoch": 1.0,
      "eval_EXPERIENCE_DURATION_f1": 0.9994408649556124,
      "eval_EXPERIENCE_DURATION_precision": 0.9988826083516309,
      "eval_EXPERIENCE_DURATION_recall": 1.0,
      "eval_SKILL_f1": 0.9567528637214718,
      "eval_SKILL_precision": 0.9501047362511714,
      "eval_SKILL_recall": 0.9635754868567566,
      "eval_f1": 0.9780968643385421,
      "eval_loss": 0.036164652556180954,
      "eval_precision": 0.9744936723014012,
      "eval_recall": 0.9817877434283783,
      "eval_runtime": 34.7404,
      "eval_samples_per_second": 57.512,
      "eval_steps_per_second": 7.196,
      "step": 1998
    },
    {
      "epoch": 1.001001001001001,
      "grad_norm": 1.368384838104248,
      "learning_rate": 1.3329996663329999e-05,
      "loss": 0.0439,
      "step": 2000
    },
    {
      "epoch": 1.026026026026026,
      "grad_norm": 0.9916768074035645,
      "learning_rate": 1.3163163163163164e-05,
      "loss": 0.0357,
      "step": 2050
    },
    {
      "epoch": 1.0510510510510511,
      "grad_norm": 1.0130974054336548,
      "learning_rate": 1.2996329662996329e-05,
      "loss": 0.0374,
      "step": 2100
    },
    {
      "epoch": 1.0760760760760761,
      "grad_norm": 0.9283238053321838,
      "learning_rate": 1.2829496162829498e-05,
      "loss": 0.0309,
      "step": 2150
    },
    {
      "epoch": 1.1011011011011012,
      "grad_norm": 1.4641971588134766,
      "learning_rate": 1.2662662662662663e-05,
      "loss": 0.0376,
      "step": 2200
    },
    {
      "epoch": 1.1261261261261262,
      "grad_norm": 1.2201365232467651,
      "learning_rate": 1.249582916249583e-05,
      "loss": 0.0345,
      "step": 2250
    },
    {
      "epoch": 1.1511511511511512,
      "grad_norm": 0.949704110622406,
      "learning_rate": 1.2328995662328997e-05,
      "loss": 0.0344,
      "step": 2300
    },
    {
      "epoch": 1.1761761761761762,
      "grad_norm": 1.91875422000885,
      "learning_rate": 1.2162162162162164e-05,
      "loss": 0.0319,
      "step": 2350
    },
    {
      "epoch": 1.2012012012012012,
      "grad_norm": 0.7262997627258301,
      "learning_rate": 1.1995328661995329e-05,
      "loss": 0.0343,
      "step": 2400
    },
    {
      "epoch": 1.2262262262262262,
      "grad_norm": 0.9259161353111267,
      "learning_rate": 1.1828495161828497e-05,
      "loss": 0.0382,
      "step": 2450
    },
    {
      "epoch": 1.2512512512512513,
      "grad_norm": 1.3489047288894653,
      "learning_rate": 1.1661661661661663e-05,
      "loss": 0.0351,
      "step": 2500
    },
    {
      "epoch": 1.2762762762762763,
      "grad_norm": 1.2273073196411133,
      "learning_rate": 1.149482816149483e-05,
      "loss": 0.0343,
      "step": 2550
    },
    {
      "epoch": 1.3013013013013013,
      "grad_norm": 0.983469545841217,
      "learning_rate": 1.1327994661327995e-05,
      "loss": 0.0318,
      "step": 2600
    },
    {
      "epoch": 1.3263263263263263,
      "grad_norm": 1.8853638172149658,
      "learning_rate": 1.1161161161161163e-05,
      "loss": 0.0283,
      "step": 2650
    },
    {
      "epoch": 1.3513513513513513,
      "grad_norm": 0.7570764422416687,
      "learning_rate": 1.0994327660994328e-05,
      "loss": 0.031,
      "step": 2700
    },
    {
      "epoch": 1.3763763763763763,
      "grad_norm": 1.3618675470352173,
      "learning_rate": 1.0827494160827494e-05,
      "loss": 0.0334,
      "step": 2750
    },
    {
      "epoch": 1.4014014014014013,
      "grad_norm": 2.3121964931488037,
      "learning_rate": 1.0660660660660662e-05,
      "loss": 0.0307,
      "step": 2800
    },
    {
      "epoch": 1.4264264264264264,
      "grad_norm": 2.3740978240966797,
      "learning_rate": 1.0493827160493827e-05,
      "loss": 0.0324,
      "step": 2850
    },
    {
      "epoch": 1.4514514514514514,
      "grad_norm": 1.5923206806182861,
      "learning_rate": 1.0326993660326994e-05,
      "loss": 0.0297,
      "step": 2900
    },
    {
      "epoch": 1.4764764764764764,
      "grad_norm": 1.6631975173950195,
      "learning_rate": 1.0160160160160161e-05,
      "loss": 0.0295,
      "step": 2950
    },
    {
      "epoch": 1.5015015015015014,
      "grad_norm": 1.2054911851882935,
      "learning_rate": 9.993326659993328e-06,
      "loss": 0.0308,
      "step": 3000
    },
    {
      "epoch": 1.5265265265265264,
      "grad_norm": 1.1273478269577026,
      "learning_rate": 9.826493159826493e-06,
      "loss": 0.027,
      "step": 3050
    },
    {
      "epoch": 1.5515515515515514,
      "grad_norm": 1.272194504737854,
      "learning_rate": 9.65965965965966e-06,
      "loss": 0.029,
      "step": 3100
    },
    {
      "epoch": 1.5765765765765765,
      "grad_norm": 0.5842704176902771,
      "learning_rate": 9.492826159492827e-06,
      "loss": 0.0302,
      "step": 3150
    },
    {
      "epoch": 1.6016016016016015,
      "grad_norm": 1.4535027742385864,
      "learning_rate": 9.325992659325992e-06,
      "loss": 0.028,
      "step": 3200
    },
    {
      "epoch": 1.6266266266266265,
      "grad_norm": 1.3739656209945679,
      "learning_rate": 9.15915915915916e-06,
      "loss": 0.0277,
      "step": 3250
    },
    {
      "epoch": 1.6516516516516515,
      "grad_norm": 1.2092796564102173,
      "learning_rate": 8.992325658992326e-06,
      "loss": 0.0261,
      "step": 3300
    },
    {
      "epoch": 1.6766766766766765,
      "grad_norm": 1.0393718481063843,
      "learning_rate": 8.825492158825493e-06,
      "loss": 0.0292,
      "step": 3350
    },
    {
      "epoch": 1.7017017017017015,
      "grad_norm": 1.693858027458191,
      "learning_rate": 8.65865865865866e-06,
      "loss": 0.0263,
      "step": 3400
    },
    {
      "epoch": 1.7267267267267268,
      "grad_norm": 0.9787161350250244,
      "learning_rate": 8.491825158491825e-06,
      "loss": 0.0275,
      "step": 3450
    },
    {
      "epoch": 1.7517517517517518,
      "grad_norm": 0.773116409778595,
      "learning_rate": 8.324991658324992e-06,
      "loss": 0.0275,
      "step": 3500
    },
    {
      "epoch": 1.7767767767767768,
      "grad_norm": 1.8271141052246094,
      "learning_rate": 8.158158158158159e-06,
      "loss": 0.0277,
      "step": 3550
    },
    {
      "epoch": 1.8018018018018018,
      "grad_norm": 1.5024611949920654,
      "learning_rate": 7.991324657991326e-06,
      "loss": 0.0221,
      "step": 3600
    },
    {
      "epoch": 1.8268268268268268,
      "grad_norm": 0.6469800472259521,
      "learning_rate": 7.824491157824493e-06,
      "loss": 0.0273,
      "step": 3650
    },
    {
      "epoch": 1.8518518518518519,
      "grad_norm": 0.947632372379303,
      "learning_rate": 7.657657657657658e-06,
      "loss": 0.0269,
      "step": 3700
    },
    {
      "epoch": 1.8768768768768769,
      "grad_norm": 1.2095396518707275,
      "learning_rate": 7.490824157490825e-06,
      "loss": 0.0275,
      "step": 3750
    },
    {
      "epoch": 1.901901901901902,
      "grad_norm": 1.2205108404159546,
      "learning_rate": 7.323990657323992e-06,
      "loss": 0.0259,
      "step": 3800
    },
    {
      "epoch": 1.926926926926927,
      "grad_norm": 1.218360185623169,
      "learning_rate": 7.157157157157158e-06,
      "loss": 0.0239,
      "step": 3850
    },
    {
      "epoch": 1.951951951951952,
      "grad_norm": 0.6379629969596863,
      "learning_rate": 6.990323656990325e-06,
      "loss": 0.0259,
      "step": 3900
    },
    {
      "epoch": 1.976976976976977,
      "grad_norm": 0.8407019376754761,
      "learning_rate": 6.823490156823492e-06,
      "loss": 0.0226,
      "step": 3950
    },
    {
      "epoch": 2.0,
      "eval_EXPERIENCE_DURATION_f1": 0.9996445707422472,
      "eval_EXPERIENCE_DURATION_precision": 0.9992893990664562,
      "eval_EXPERIENCE_DURATION_recall": 1.0,
      "eval_SKILL_f1": 0.9748727149746856,
      "eval_SKILL_precision": 0.9721327735633751,
      "eval_SKILL_recall": 0.9776506023350151,
      "eval_f1": 0.9872586428584664,
      "eval_loss": 0.02220688760280609,
      "eval_precision": 0.9857110863149156,
      "eval_recall": 0.9888253011675074,
      "eval_runtime": 35.0196,
      "eval_samples_per_second": 57.054,
      "eval_steps_per_second": 7.139,
      "step": 3996
    }
  ],
  "logging_steps": 50,
  "max_steps": 5994,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 6979121494960680.0,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
}
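
A trainer state like the one above can be inspected programmatically: the per-epoch evaluation metrics live in the `log_history` list alongside training-loss entries. A small sketch over inline sample data of the same shape as the file:

```python
def eval_f1_by_epoch(log_history):
    """Collect (epoch, eval_f1) pairs from a HF Trainer log_history list,
    skipping the training-loss entries that have no eval metrics."""
    return [(e["epoch"], e["eval_f1"]) for e in log_history if "eval_f1" in e]

# Inline sample mirroring the structure of trainer_state.json above.
sample = [
    {"epoch": 0.5, "loss": 0.1068, "step": 500},
    {"epoch": 1.0, "eval_f1": 0.9780968643385421, "step": 1998},
    {"epoch": 2.0, "eval_f1": 0.9872586428584664, "step": 3996},
]
best = max(eval_f1_by_epoch(sample), key=lambda p: p[1])
print(best)
# (2.0, 0.9872586428584664)
```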
checkpoint-3996/training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49c6e54cca37c43eb26640eed75758105b2700bc6c513adcc761248adff1380f
size 5777
checkpoint-3996/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-5994/config.json ADDED
@@ -0,0 +1,40 @@
{
  "architectures": [
    "RobertaForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "dtype": "float32",
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-EXPERIENCE_DURATION",
    "2": "I-EXPERIENCE_DURATION",
    "3": "B-SKILL",
    "4": "I-SKILL"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-EXPERIENCE_DURATION": 1,
    "B-SKILL": 3,
    "I-EXPERIENCE_DURATION": 2,
    "I-SKILL": 4,
    "O": 0
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.57.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
checkpoint-5994/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-5994/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:509d0ec667aaea68af890e0f03fe55b16f5a278671d5c074c130c8f9558cad02
size 496259468
checkpoint-5994/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0321ff2f8c8437d386e1ecbacb2a6b5bc4b3de22bfa437250060964d80fd95f3
size 992640715
checkpoint-5994/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7728dbbf6eb7d9ae24cb8239e28464ee986ba62b67306d6031f59bbd76f6121c
size 14645
checkpoint-5994/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b34c5a31528b1c106e9d3fd724f391c4db95a6342e562dff7569d350e6c00625
size 1465
checkpoint-5994/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
{
  "bos_token": "<s>",
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "unk_token": "<unk>"
}
checkpoint-5994/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-5994/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "<unk>"
}
checkpoint-5994/trainer_state.json ADDED
@@ -0,0 +1,918 @@
1
+ {
2
+ "best_global_step": 5994,
3
+ "best_metric": 0.9890946878646674,
4
+ "best_model_checkpoint": "models/hirly_ner_multi\\checkpoint-5994",
5
+ "epoch": 3.0,
6
+ "eval_steps": 500,
7
+ "global_step": 5994,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.025025025025025027,
14
+ "grad_norm": 2.1700778007507324,
15
+ "learning_rate": 1.9836503169836507e-05,
16
+ "loss": 0.5362,
17
+ "step": 50
18
+ },
19
+ {
20
+ "epoch": 0.05005005005005005,
21
+ "grad_norm": 1.6467891931533813,
22
+ "learning_rate": 1.966966966966967e-05,
23
+ "loss": 0.3009,
24
+ "step": 100
25
+ },
26
+ {
27
+ "epoch": 0.07507507507507508,
28
+ "grad_norm": 1.7352838516235352,
29
+ "learning_rate": 1.9502836169502837e-05,
30
+ "loss": 0.2276,
31
+ "step": 150
32
+ },
33
+ {
34
+ "epoch": 0.1001001001001001,
35
+ "grad_norm": 2.1085104942321777,
36
+ "learning_rate": 1.9336002669336004e-05,
37
+ "loss": 0.1991,
38
+ "step": 200
39
+ },
40
+ {
41
+ "epoch": 0.12512512512512514,
42
+ "grad_norm": 1.6021695137023926,
43
+ "learning_rate": 1.916916916916917e-05,
44
+ "loss": 0.1728,
45
+ "step": 250
46
+ },
47
+ {
48
+ "epoch": 0.15015015015015015,
49
+ "grad_norm": 2.7993948459625244,
50
+ "learning_rate": 1.9002335669002338e-05,
51
+ "loss": 0.1496,
52
+ "step": 300
53
+ },
54
+ {
55
+ "epoch": 0.17517517517517517,
56
+ "grad_norm": 1.3490108251571655,
57
+ "learning_rate": 1.8835502168835505e-05,
58
+ "loss": 0.1359,
59
+ "step": 350
60
+ },
61
+ {
62
+ "epoch": 0.2002002002002002,
63
+ "grad_norm": 1.5092374086380005,
64
+ "learning_rate": 1.866866866866867e-05,
65
+ "loss": 0.1269,
66
+ "step": 400
67
+ },
68
+ {
69
+ "epoch": 0.22522522522522523,
70
+ "grad_norm": 1.8146309852600098,
71
+ "learning_rate": 1.8501835168501835e-05,
72
+ "loss": 0.1136,
73
+ "step": 450
74
+ },
75
+ {
76
+ "epoch": 0.2502502502502503,
77
+ "grad_norm": 2.7670438289642334,
78
+ "learning_rate": 1.8335001668335005e-05,
79
+ "loss": 0.1068,
80
+ "step": 500
81
+ },
82
+ {
83
+ "epoch": 0.2752752752752753,
84
+ "grad_norm": 1.6677714586257935,
85
+ "learning_rate": 1.816816816816817e-05,
86
+ "loss": 0.1058,
87
+ "step": 550
88
+ },
89
+ {
90
+ "epoch": 0.3003003003003003,
91
+ "grad_norm": 1.3509212732315063,
92
+ "learning_rate": 1.8001334668001336e-05,
93
+ "loss": 0.0962,
94
+ "step": 600
95
+ },
96
+ {
97
+ "epoch": 0.3253253253253253,
98
+ "grad_norm": 1.6370694637298584,
99
+ "learning_rate": 1.7834501167834503e-05,
100
+ "loss": 0.0953,
101
+ "step": 650
102
+ },
103
+ {
104
+ "epoch": 0.35035035035035034,
105
+ "grad_norm": 1.0707803964614868,
106
+ "learning_rate": 1.766766766766767e-05,
107
+ "loss": 0.0867,
108
+ "step": 700
109
+ },
110
+ {
111
+ "epoch": 0.37537537537537535,
112
+ "grad_norm": 1.6105570793151855,
113
+ "learning_rate": 1.7500834167500836e-05,
114
+ "loss": 0.0861,
115
+ "step": 750
116
+ },
117
+ {
118
+ "epoch": 0.4004004004004004,
119
+ "grad_norm": 2.311500310897827,
120
+ "learning_rate": 1.7334000667334e-05,
121
+ "loss": 0.0768,
122
+ "step": 800
123
+ },
124
+ {
125
+ "epoch": 0.42542542542542544,
126
+ "grad_norm": 1.4976152181625366,
127
+ "learning_rate": 1.716716716716717e-05,
128
+ "loss": 0.0729,
129
+ "step": 850
130
+ },
131
+ {
132
+ "epoch": 0.45045045045045046,
133
+ "grad_norm": 1.4623268842697144,
134
+ "learning_rate": 1.7000333667000334e-05,
135
+ "loss": 0.0724,
136
+ "step": 900
137
+ },
138
+ {
139
+ "epoch": 0.4754754754754755,
140
+ "grad_norm": 1.320494294166565,
141
+ "learning_rate": 1.68335001668335e-05,
142
+ "loss": 0.071,
143
+ "step": 950
144
+ },
145
+ {
146
+ "epoch": 0.5005005005005005,
147
+ "grad_norm": 1.2657185792922974,
148
+ "learning_rate": 1.6666666666666667e-05,
149
+ "loss": 0.0744,
150
+ "step": 1000
151
+ },
152
+ {
153
+ "epoch": 0.5255255255255256,
154
+ "grad_norm": 1.65080988407135,
155
+ "learning_rate": 1.6499833166499834e-05,
156
+ "loss": 0.0683,
157
+ "step": 1050
158
+ },
159
+ {
160
+ "epoch": 0.5505505505505506,
161
+ "grad_norm": 1.3511857986450195,
162
+ "learning_rate": 1.6332999666333e-05,
163
+ "loss": 0.0589,
164
+ "step": 1100
165
+ },
166
+ {
167
+ "epoch": 0.5755755755755756,
168
+ "grad_norm": 1.4793460369110107,
169
+ "learning_rate": 1.6166166166166168e-05,
170
+ "loss": 0.0621,
171
+ "step": 1150
172
+ },
173
+ {
174
+ "epoch": 0.6006006006006006,
175
+ "grad_norm": 1.1826051473617554,
176
+ "learning_rate": 1.5999332665999335e-05,
177
+ "loss": 0.0618,
178
+ "step": 1200
179
+ },
180
+ {
181
+ "epoch": 0.6256256256256256,
182
+ "grad_norm": 2.016144037246704,
183
+ "learning_rate": 1.58324991658325e-05,
184
+ "loss": 0.0609,
185
+ "step": 1250
186
+ },
187
+ {
188
+ "epoch": 0.6506506506506506,
189
+ "grad_norm": 1.3155397176742554,
190
+ "learning_rate": 1.566566566566567e-05,
191
+ "loss": 0.0562,
192
+ "step": 1300
193
+ },
194
+ {
195
+ "epoch": 0.6756756756756757,
196
+ "grad_norm": 1.2205989360809326,
197
+ "learning_rate": 1.5498832165498832e-05,
198
+ "loss": 0.0576,
199
+ "step": 1350
200
+ },
201
+ {
202
+ "epoch": 0.7007007007007007,
203
+ "grad_norm": 1.0111949443817139,
204
+ "learning_rate": 1.5331998665332e-05,
205
+ "loss": 0.0511,
206
+ "step": 1400
207
+ },
208
+ {
209
+ "epoch": 0.7257257257257257,
210
+ "grad_norm": 2.1711137294769287,
211
+ "learning_rate": 1.5165165165165166e-05,
212
+ "loss": 0.0519,
213
+ "step": 1450
214
+ },
215
+ {
216
+ "epoch": 0.7507507507507507,
217
+ "grad_norm": 1.7720271348953247,
218
+ "learning_rate": 1.4998331664998333e-05,
219
+ "loss": 0.0555,
220
+ "step": 1500
221
+ },
222
+ {
223
+ "epoch": 0.7757757757757757,
224
+ "grad_norm": 1.2108412981033325,
225
+ "learning_rate": 1.48314981648315e-05,
226
+ "loss": 0.0497,
227
+ "step": 1550
228
+ },
229
+ {
230
+ "epoch": 0.8008008008008008,
231
+ "grad_norm": 1.0561896562576294,
232
+ "learning_rate": 1.4664664664664665e-05,
233
+ "loss": 0.051,
234
+ "step": 1600
235
+ },
236
+ {
237
+ "epoch": 0.8258258258258259,
238
+ "grad_norm": 1.4415959119796753,
239
+ "learning_rate": 1.4497831164497834e-05,
240
+ "loss": 0.0557,
241
+ "step": 1650
242
+ },
243
+ {
244
+ "epoch": 0.8508508508508509,
245
+ "grad_norm": 1.4428836107254028,
246
+ "learning_rate": 1.4330997664330999e-05,
247
+ "loss": 0.0514,
248
+ "step": 1700
249
+ },
250
+ {
251
+ "epoch": 0.8758758758758759,
252
+ "grad_norm": 1.2458566427230835,
253
+ "learning_rate": 1.4164164164164164e-05,
254
+ "loss": 0.0503,
255
+ "step": 1750
256
+ },
257
+ {
258
+ "epoch": 0.9009009009009009,
259
+ "grad_norm": 1.9587106704711914,
260
+ "learning_rate": 1.3997330663997333e-05,
261
+ "loss": 0.0431,
262
+ "step": 1800
263
+ },
264
+ {
265
+ "epoch": 0.9259259259259259,
266
+ "grad_norm": 1.3225224018096924,
267
+ "learning_rate": 1.3830497163830498e-05,
268
+ "loss": 0.0469,
269
+ "step": 1850
270
+ },
271
+ {
272
+ "epoch": 0.950950950950951,
273
+ "grad_norm": 1.0749801397323608,
274
+ "learning_rate": 1.3663663663663665e-05,
275
+ "loss": 0.0437,
276
+ "step": 1900
277
+ },
278
+ {
279
+ "epoch": 0.975975975975976,
280
+ "grad_norm": 1.1957429647445679,
281
+ "learning_rate": 1.349683016349683e-05,
282
+ "loss": 0.0377,
283
+ "step": 1950
284
+ },
285
+ {
286
+ "epoch": 1.0,
287
+ "eval_EXPERIENCE_DURATION_f1": 0.9994408649556124,
288
+ "eval_EXPERIENCE_DURATION_precision": 0.9988826083516309,
289
+ "eval_EXPERIENCE_DURATION_recall": 1.0,
290
+ "eval_SKILL_f1": 0.9567528637214718,
291
+ "eval_SKILL_precision": 0.9501047362511714,
292
+ "eval_SKILL_recall": 0.9635754868567566,
293
+ "eval_f1": 0.9780968643385421,
294
+ "eval_loss": 0.036164652556180954,
295
+ "eval_precision": 0.9744936723014012,
296
+ "eval_recall": 0.9817877434283783,
297
+ "eval_runtime": 34.7404,
298
+ "eval_samples_per_second": 57.512,
299
+ "eval_steps_per_second": 7.196,
300
+ "step": 1998
301
+ },
302
+ {
303
+ "epoch": 1.001001001001001,
304
+ "grad_norm": 1.368384838104248,
305
+ "learning_rate": 1.3329996663329999e-05,
306
+ "loss": 0.0439,
307
+ "step": 2000
308
+ },
309
+ {
310
+ "epoch": 1.026026026026026,
311
+ "grad_norm": 0.9916768074035645,
312
+ "learning_rate": 1.3163163163163164e-05,
313
+ "loss": 0.0357,
314
+ "step": 2050
315
+ },
316
+ {
317
+ "epoch": 1.0510510510510511,
318
+ "grad_norm": 1.0130974054336548,
319
+ "learning_rate": 1.2996329662996329e-05,
320
+ "loss": 0.0374,
321
+ "step": 2100
322
+ },
323
+ {
324
+ "epoch": 1.0760760760760761,
325
+ "grad_norm": 0.9283238053321838,
326
+ "learning_rate": 1.2829496162829498e-05,
327
+ "loss": 0.0309,
328
+ "step": 2150
329
+ },
330
+ {
331
+ "epoch": 1.1011011011011012,
332
+ "grad_norm": 1.4641971588134766,
333
+ "learning_rate": 1.2662662662662663e-05,
334
+ "loss": 0.0376,
335
+ "step": 2200
336
+ },
337
+ {
338
+ "epoch": 1.1261261261261262,
339
+ "grad_norm": 1.2201365232467651,
340
+ "learning_rate": 1.249582916249583e-05,
341
+ "loss": 0.0345,
342
+ "step": 2250
343
+ },
344
+ {
345
+ "epoch": 1.1511511511511512,
346
+ "grad_norm": 0.949704110622406,
347
+ "learning_rate": 1.2328995662328997e-05,
348
+ "loss": 0.0344,
349
+ "step": 2300
350
+ },
351
+ {
352
+ "epoch": 1.1761761761761762,
353
+ "grad_norm": 1.91875422000885,
354
+ "learning_rate": 1.2162162162162164e-05,
355
+ "loss": 0.0319,
356
+ "step": 2350
357
+ },
358
+ {
359
+ "epoch": 1.2012012012012012,
360
+ "grad_norm": 0.7262997627258301,
361
+ "learning_rate": 1.1995328661995329e-05,
362
+ "loss": 0.0343,
363
+ "step": 2400
364
+ },
365
+ {
366
+ "epoch": 1.2262262262262262,
367
+ "grad_norm": 0.9259161353111267,
368
+ "learning_rate": 1.1828495161828497e-05,
369
+ "loss": 0.0382,
370
+ "step": 2450
371
+ },
372
+ {
373
+ "epoch": 1.2512512512512513,
374
+ "grad_norm": 1.3489047288894653,
375
+ "learning_rate": 1.1661661661661663e-05,
376
+ "loss": 0.0351,
377
+ "step": 2500
378
+ },
379
+ {
380
+ "epoch": 1.2762762762762763,
381
+ "grad_norm": 1.2273073196411133,
382
+ "learning_rate": 1.149482816149483e-05,
383
+ "loss": 0.0343,
384
+ "step": 2550
385
+ },
386
+ {
387
+ "epoch": 1.3013013013013013,
388
+ "grad_norm": 0.983469545841217,
389
+ "learning_rate": 1.1327994661327995e-05,
390
+ "loss": 0.0318,
391
+ "step": 2600
392
+ },
393
+ {
394
+ "epoch": 1.3263263263263263,
395
+ "grad_norm": 1.8853638172149658,
396
+ "learning_rate": 1.1161161161161163e-05,
397
+ "loss": 0.0283,
398
+ "step": 2650
399
+ },
400
+ {
401
+ "epoch": 1.3513513513513513,
402
+ "grad_norm": 0.7570764422416687,
403
+ "learning_rate": 1.0994327660994328e-05,
404
+ "loss": 0.031,
405
+ "step": 2700
406
+ },
407
+ {
408
+ "epoch": 1.3763763763763763,
409
+ "grad_norm": 1.3618675470352173,
410
+ "learning_rate": 1.0827494160827494e-05,
411
+ "loss": 0.0334,
412
+ "step": 2750
413
+ },
414
+ {
415
+ "epoch": 1.4014014014014013,
416
+ "grad_norm": 2.3121964931488037,
417
+ "learning_rate": 1.0660660660660662e-05,
418
+ "loss": 0.0307,
419
+ "step": 2800
420
+ },
421
+ {
422
+ "epoch": 1.4264264264264264,
423
+ "grad_norm": 2.3740978240966797,
424
+ "learning_rate": 1.0493827160493827e-05,
425
+ "loss": 0.0324,
426
+ "step": 2850
427
+ },
428
+ {
429
+ "epoch": 1.4514514514514514,
430
+ "grad_norm": 1.5923206806182861,
431
+ "learning_rate": 1.0326993660326994e-05,
432
+ "loss": 0.0297,
433
+ "step": 2900
434
+ },
435
+ {
436
+ "epoch": 1.4764764764764764,
437
+ "grad_norm": 1.6631975173950195,
438
+ "learning_rate": 1.0160160160160161e-05,
439
+ "loss": 0.0295,
440
+ "step": 2950
441
+ },
442
+ {
443
+ "epoch": 1.5015015015015014,
444
+ "grad_norm": 1.2054911851882935,
445
+ "learning_rate": 9.993326659993328e-06,
446
+ "loss": 0.0308,
447
+ "step": 3000
448
+ },
449
+ {
450
+ "epoch": 1.5265265265265264,
451
+ "grad_norm": 1.1273478269577026,
452
+ "learning_rate": 9.826493159826493e-06,
453
+ "loss": 0.027,
454
+ "step": 3050
455
+ },
456
+ {
457
+ "epoch": 1.5515515515515514,
458
+ "grad_norm": 1.272194504737854,
459
+ "learning_rate": 9.65965965965966e-06,
460
+ "loss": 0.029,
461
+ "step": 3100
462
+ },
463
+ {
464
+ "epoch": 1.5765765765765765,
465
+ "grad_norm": 0.5842704176902771,
466
+ "learning_rate": 9.492826159492827e-06,
467
+ "loss": 0.0302,
468
+ "step": 3150
469
+ },
470
+ {
471
+ "epoch": 1.6016016016016015,
472
+ "grad_norm": 1.4535027742385864,
473
+ "learning_rate": 9.325992659325992e-06,
474
+ "loss": 0.028,
475
+ "step": 3200
476
+ },
477
+ {
478
+ "epoch": 1.6266266266266265,
479
+ "grad_norm": 1.3739656209945679,
480
+ "learning_rate": 9.15915915915916e-06,
481
+ "loss": 0.0277,
482
+ "step": 3250
483
+ },
484
+ {
485
+ "epoch": 1.6516516516516515,
486
+ "grad_norm": 1.2092796564102173,
487
+ "learning_rate": 8.992325658992326e-06,
488
+ "loss": 0.0261,
489
+ "step": 3300
490
+ },
491
+ {
492
+ "epoch": 1.6766766766766765,
493
+ "grad_norm": 1.0393718481063843,
494
+ "learning_rate": 8.825492158825493e-06,
495
+ "loss": 0.0292,
496
+ "step": 3350
497
+ },
498
+ {
499
+ "epoch": 1.7017017017017015,
500
+ "grad_norm": 1.693858027458191,
501
+ "learning_rate": 8.65865865865866e-06,
502
+ "loss": 0.0263,
503
+ "step": 3400
504
+ },
505
+ {
506
+ "epoch": 1.7267267267267268,
507
+ "grad_norm": 0.9787161350250244,
508
+ "learning_rate": 8.491825158491825e-06,
509
+ "loss": 0.0275,
510
+ "step": 3450
511
+ },
512
+ {
513
+ "epoch": 1.7517517517517518,
514
+ "grad_norm": 0.773116409778595,
515
+ "learning_rate": 8.324991658324992e-06,
516
+ "loss": 0.0275,
517
+ "step": 3500
518
+ },
519
+ {
520
+ "epoch": 1.7767767767767768,
521
+ "grad_norm": 1.8271141052246094,
522
+ "learning_rate": 8.158158158158159e-06,
523
+ "loss": 0.0277,
524
+ "step": 3550
525
+ },
526
+ {
527
+ "epoch": 1.8018018018018018,
528
+ "grad_norm": 1.5024611949920654,
529
+ "learning_rate": 7.991324657991326e-06,
530
+ "loss": 0.0221,
531
+ "step": 3600
532
+ },
533
+ {
534
+ "epoch": 1.8268268268268268,
535
+ "grad_norm": 0.6469800472259521,
536
+ "learning_rate": 7.824491157824493e-06,
537
+ "loss": 0.0273,
538
+ "step": 3650
539
+ },
540
+ {
541
+ "epoch": 1.8518518518518519,
542
+ "grad_norm": 0.947632372379303,
543
+ "learning_rate": 7.657657657657658e-06,
544
+ "loss": 0.0269,
545
+ "step": 3700
546
+ },
547
+ {
548
+ "epoch": 1.8768768768768769,
549
+ "grad_norm": 1.2095396518707275,
550
+ "learning_rate": 7.490824157490825e-06,
551
+ "loss": 0.0275,
552
+ "step": 3750
553
+ },
554
+ {
555
+ "epoch": 1.901901901901902,
556
+ "grad_norm": 1.2205108404159546,
557
+ "learning_rate": 7.323990657323992e-06,
558
+ "loss": 0.0259,
559
+ "step": 3800
560
+ },
561
+ {
562
+ "epoch": 1.926926926926927,
563
+ "grad_norm": 1.218360185623169,
564
+ "learning_rate": 7.157157157157158e-06,
565
+ "loss": 0.0239,
566
+ "step": 3850
567
+ },
568
+ {
569
+ "epoch": 1.951951951951952,
570
+ "grad_norm": 0.6379629969596863,
571
+ "learning_rate": 6.990323656990325e-06,
572
+ "loss": 0.0259,
573
+ "step": 3900
574
+ },
575
+ {
576
+ "epoch": 1.976976976976977,
577
+ "grad_norm": 0.8407019376754761,
578
+ "learning_rate": 6.823490156823492e-06,
579
+ "loss": 0.0226,
580
+ "step": 3950
581
+ },
582
+ {
583
+ "epoch": 2.0,
584
+ "eval_EXPERIENCE_DURATION_f1": 0.9996445707422472,
585
+ "eval_EXPERIENCE_DURATION_precision": 0.9992893990664562,
586
+ "eval_EXPERIENCE_DURATION_recall": 1.0,
587
+ "eval_SKILL_f1": 0.9748727149746856,
588
+ "eval_SKILL_precision": 0.9721327735633751,
589
+ "eval_SKILL_recall": 0.9776506023350151,
590
+ "eval_f1": 0.9872586428584664,
591
+ "eval_loss": 0.02220688760280609,
592
+ "eval_precision": 0.9857110863149156,
593
+ "eval_recall": 0.9888253011675074,
594
+ "eval_runtime": 35.0196,
595
+ "eval_samples_per_second": 57.054,
596
+ "eval_steps_per_second": 7.139,
597
+ "step": 3996
598
+ },
599
+ {
600
+ "epoch": 2.002002002002002,
601
+ "grad_norm": 0.6077672839164734,
602
+ "learning_rate": 6.656656656656657e-06,
603
+ "loss": 0.0285,
604
+ "step": 4000
605
+ },
606
+ {
607
+ "epoch": 2.027027027027027,
608
+ "grad_norm": 1.7253185510635376,
609
+ "learning_rate": 6.489823156489824e-06,
610
+ "loss": 0.0223,
611
+ "step": 4050
612
+ },
613
+ {
614
+ "epoch": 2.052052052052052,
615
+ "grad_norm": 1.0563504695892334,
616
+ "learning_rate": 6.32298965632299e-06,
617
+ "loss": 0.0197,
618
+ "step": 4100
619
+ },
620
+ {
621
+ "epoch": 2.0770770770770772,
622
+ "grad_norm": 1.6818684339523315,
623
+ "learning_rate": 6.156156156156157e-06,
624
+ "loss": 0.0213,
625
+ "step": 4150
626
+ },
627
+ {
628
+ "epoch": 2.1021021021021022,
629
+ "grad_norm": 0.8844659924507141,
630
+ "learning_rate": 5.989322655989324e-06,
631
+ "loss": 0.0224,
632
+ "step": 4200
633
+ },
634
+ {
635
+ "epoch": 2.1271271271271273,
636
+ "grad_norm": 1.297920823097229,
637
+ "learning_rate": 5.82248915582249e-06,
638
+ "loss": 0.0195,
639
+ "step": 4250
640
+ },
641
+ {
642
+ "epoch": 2.1521521521521523,
643
+ "grad_norm": 2.1245086193084717,
644
+ "learning_rate": 5.6556556556556565e-06,
645
+ "loss": 0.0223,
646
+ "step": 4300
647
+ },
648
+ {
649
+ "epoch": 2.1771771771771773,
650
+ "grad_norm": 0.6769827008247375,
651
+ "learning_rate": 5.488822155488822e-06,
652
+ "loss": 0.0187,
653
+ "step": 4350
654
+ },
655
+ {
656
+ "epoch": 2.2022022022022023,
657
+ "grad_norm": 1.6571110486984253,
658
+ "learning_rate": 5.321988655321989e-06,
659
+ "loss": 0.0234,
660
+ "step": 4400
661
+ },
662
+ {
663
+ "epoch": 2.2272272272272273,
664
+ "grad_norm": 0.9618363976478577,
665
+ "learning_rate": 5.155155155155156e-06,
666
+ "loss": 0.0199,
667
+ "step": 4450
668
+ },
669
+ {
670
+ "epoch": 2.2522522522522523,
671
+ "grad_norm": 0.6882569789886475,
672
+ "learning_rate": 4.9883216549883224e-06,
673
+ "loss": 0.0194,
674
+ "step": 4500
675
+ },
676
+ {
677
+ "epoch": 2.2772772772772774,
678
+ "grad_norm": 2.4633865356445312,
679
+ "learning_rate": 4.8214881548214885e-06,
680
+ "loss": 0.0218,
681
+ "step": 4550
682
+ },
683
+ {
684
+ "epoch": 2.3023023023023024,
685
+ "grad_norm": 0.6027513742446899,
686
+ "learning_rate": 4.654654654654655e-06,
687
+ "loss": 0.0202,
688
+ "step": 4600
689
+ },
690
+ {
691
+ "epoch": 2.3273273273273274,
692
+ "grad_norm": 1.2988030910491943,
693
+ "learning_rate": 4.4878211544878214e-06,
694
+ "loss": 0.0194,
695
+ "step": 4650
696
+ },
697
+ {
698
+ "epoch": 2.3523523523523524,
699
+ "grad_norm": 1.9332680702209473,
700
+ "learning_rate": 4.3209876543209875e-06,
701
+ "loss": 0.02,
702
+ "step": 4700
703
+ },
704
+ {
705
+ "epoch": 2.3773773773773774,
706
+ "grad_norm": 0.8047870397567749,
707
+ "learning_rate": 4.154154154154154e-06,
708
+ "loss": 0.0201,
709
+ "step": 4750
710
+ },
711
+ {
712
+ "epoch": 2.4024024024024024,
713
+ "grad_norm": 1.143822431564331,
714
+ "learning_rate": 3.987320653987321e-06,
715
+ "loss": 0.0199,
716
+ "step": 4800
717
+ },
718
+ {
719
+ "epoch": 2.4274274274274275,
720
+ "grad_norm": 0.8660027384757996,
721
+ "learning_rate": 3.820487153820487e-06,
722
+ "loss": 0.0216,
723
+ "step": 4850
724
+ },
725
+ {
726
+ "epoch": 2.4524524524524525,
727
+ "grad_norm": 0.5358948111534119,
728
+ "learning_rate": 3.653653653653654e-06,
729
+ "loss": 0.0209,
730
+ "step": 4900
731
+ },
732
+ {
733
+ "epoch": 2.4774774774774775,
734
+ "grad_norm": 1.0323843955993652,
735
+ "learning_rate": 3.4868201534868207e-06,
736
+ "loss": 0.0161,
737
+ "step": 4950
738
+ },
739
+ {
740
+ "epoch": 2.5025025025025025,
741
+ "grad_norm": 0.8093557953834534,
742
+ "learning_rate": 3.319986653319987e-06,
743
+ "loss": 0.0217,
744
+ "step": 5000
745
+ },
746
+ {
747
+ "epoch": 2.5275275275275275,
748
+ "grad_norm": 0.8132084608078003,
749
+ "learning_rate": 3.1531531531531532e-06,
750
+ "loss": 0.0186,
751
+ "step": 5050
752
+ },
753
+ {
754
+ "epoch": 2.5525525525525525,
755
+ "grad_norm": 0.887740433216095,
756
+ "learning_rate": 2.9863196529863197e-06,
757
+ "loss": 0.0206,
758
+ "step": 5100
759
+ },
760
+ {
761
+ "epoch": 2.5775775775775776,
762
+ "grad_norm": 1.2536566257476807,
763
+ "learning_rate": 2.819486152819486e-06,
764
+ "loss": 0.0183,
765
+ "step": 5150
766
+ },
767
+ {
768
+ "epoch": 2.6026026026026026,
769
+ "grad_norm": 0.8627704381942749,
770
+ "learning_rate": 2.652652652652653e-06,
771
+ "loss": 0.0191,
772
+ "step": 5200
773
+ },
774
+ {
775
+ "epoch": 2.6276276276276276,
776
+ "grad_norm": 1.3632875680923462,
777
+ "learning_rate": 2.4858191524858196e-06,
778
+ "loss": 0.0194,
779
+ "step": 5250
780
+ },
781
+ {
782
+ "epoch": 2.6526526526526526,
783
+ "grad_norm": 0.8193138241767883,
784
+ "learning_rate": 2.3189856523189856e-06,
785
+ "loss": 0.0204,
786
+ "step": 5300
787
+ },
788
+ {
789
+ "epoch": 2.6776776776776776,
790
+ "grad_norm": 0.5377821326255798,
791
+ "learning_rate": 2.1521521521521525e-06,
792
+ "loss": 0.0191,
793
+ "step": 5350
794
+ },
795
+ {
796
+ "epoch": 2.7027027027027026,
797
+ "grad_norm": 0.8216055035591125,
798
+ "learning_rate": 1.9853186519853186e-06,
799
+ "loss": 0.0193,
800
+ "step": 5400
801
+ },
802
+ {
803
+ "epoch": 2.7277277277277276,
804
+ "grad_norm": 0.41987425088882446,
805
+ "learning_rate": 1.8184851518184855e-06,
806
+ "loss": 0.0149,
807
+ "step": 5450
808
+ },
809
+ {
810
+ "epoch": 2.7527527527527527,
811
+ "grad_norm": 0.7220802903175354,
812
+ "learning_rate": 1.6516516516516517e-06,
813
+ "loss": 0.0188,
814
+ "step": 5500
815
+ },
816
+ {
817
+ "epoch": 2.7777777777777777,
818
+ "grad_norm": 2.121993064880371,
819
+ "learning_rate": 1.4848181514848184e-06,
820
+ "loss": 0.0193,
821
+ "step": 5550
822
+ },
823
+ {
824
+ "epoch": 2.8028028028028027,
825
+ "grad_norm": 1.1751028299331665,
826
+ "learning_rate": 1.3179846513179847e-06,
827
+ "loss": 0.0197,
828
+ "step": 5600
829
+ },
830
+ {
831
+ "epoch": 2.8278278278278277,
832
+ "grad_norm": 0.31620585918426514,
833
+ "learning_rate": 1.1511511511511512e-06,
834
+ "loss": 0.0185,
835
+ "step": 5650
836
+ },
837
+ {
838
+ "epoch": 2.8528528528528527,
839
+ "grad_norm": 1.4038184881210327,
840
+ "learning_rate": 9.843176509843178e-07,
841
+ "loss": 0.0201,
842
+ "step": 5700
843
+ },
844
+ {
845
+ "epoch": 2.8778778778778777,
846
+ "grad_norm": 1.8849061727523804,
847
+ "learning_rate": 8.174841508174842e-07,
848
+ "loss": 0.0174,
849
+ "step": 5750
850
+ },
851
+ {
852
+ "epoch": 2.9029029029029028,
853
+ "grad_norm": 1.4456825256347656,
854
+ "learning_rate": 6.506506506506508e-07,
855
+ "loss": 0.0181,
856
+ "step": 5800
857
+ },
858
+ {
859
+ "epoch": 2.9279279279279278,
860
+ "grad_norm": 0.7297641634941101,
861
+ "learning_rate": 4.838171504838172e-07,
862
+ "loss": 0.0189,
863
+ "step": 5850
864
+ },
865
+ {
866
+ "epoch": 2.952952952952953,
867
+ "grad_norm": 1.3718576431274414,
868
+ "learning_rate": 3.169836503169837e-07,
869
+ "loss": 0.0178,
870
+ "step": 5900
871
+ },
872
+ {
873
+ "epoch": 2.977977977977978,
874
+ "grad_norm": 0.762457013130188,
875
+ "learning_rate": 1.5015015015015016e-07,
876
+ "loss": 0.0178,
877
+ "step": 5950
878
+ },
879
+ {
880
+ "epoch": 3.0,
881
+ "eval_EXPERIENCE_DURATION_f1": 0.9996445707422472,
882
+ "eval_EXPERIENCE_DURATION_precision": 0.9992893990664562,
883
+ "eval_EXPERIENCE_DURATION_recall": 1.0,
884
+ "eval_SKILL_f1": 0.9785448049870876,
885
+ "eval_SKILL_precision": 0.9761853201630533,
886
+ "eval_SKILL_recall": 0.9809297949364796,
887
+ "eval_f1": 0.9890946878646674,
888
+ "eval_loss": 0.019348522648215294,
889
+ "eval_precision": 0.9877373596147547,
890
+ "eval_recall": 0.9904648974682398,
891
+ "eval_runtime": 35.6847,
892
+ "eval_samples_per_second": 55.99,
893
+ "eval_steps_per_second": 7.006,
894
+ "step": 5994
895
+ }
896
+ ],
897
+ "logging_steps": 50,
898
+ "max_steps": 5994,
899
+ "num_input_tokens_seen": 0,
900
+ "num_train_epochs": 3,
901
+ "save_steps": 500,
902
+ "stateful_callbacks": {
903
+ "TrainerControl": {
904
+ "args": {
905
+ "should_epoch_stop": false,
906
+ "should_evaluate": false,
907
+ "should_log": false,
908
+ "should_save": true,
909
+ "should_training_stop": true
910
+ },
911
+ "attributes": {}
912
+ }
913
+ },
914
+ "total_flos": 1.047153514964232e+16,
915
+ "train_batch_size": 8,
916
+ "trial_name": null,
917
+ "trial_params": null
918
+ }
checkpoint-5994/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49c6e54cca37c43eb26640eed75758105b2700bc6c513adcc761248adff1380f
+ size 5777
checkpoint-5994/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,40 @@
+ {
+ "architectures": [
+ "RobertaForTokenClassification"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "dtype": "float32",
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "O",
+ "1": "B-EXPERIENCE_DURATION",
+ "2": "I-EXPERIENCE_DURATION",
+ "3": "B-SKILL",
+ "4": "I-SKILL"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "B-EXPERIENCE_DURATION": 1,
+ "B-SKILL": 3,
+ "I-EXPERIENCE_DURATION": 2,
+ "I-SKILL": 4,
+ "O": 0
+ },
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "transformers_version": "4.57.1",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 50265
+ }
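The `id2label` map in `config.json` above defines the model's 5 BIO labels. As a rough sketch of what the pipeline's `aggregation_strategy="simple"` does with a predicted tag sequence (plain Python, no `transformers` dependency; `decode_entities` is an illustrative helper, not part of this repo):

```python
# BIO-tag aggregation sketch using the id2label map from config.json above.
id2label = {0: "O", 1: "B-EXPERIENCE_DURATION", 2: "I-EXPERIENCE_DURATION",
            3: "B-SKILL", 4: "I-SKILL"}

def decode_entities(tokens, label_ids):
    """Group B-/I- tagged tokens into (entity_type, text) spans."""
    spans, current = [], None
    for tok, lid in zip(tokens, label_ids):
        label = id2label[lid]
        if label.startswith("B-"):
            if current:
                spans.append(current)
            current = (label[2:], [tok])
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[1].append(tok)
        else:  # "O", or an I- tag that does not continue the open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(etype, " ".join(toks)) for etype, toks in spans]

# "5+ years of Python" tagged as B-EXPERIENCE_DURATION, I-EXPERIENCE_DURATION, O, B-SKILL
print(decode_entities(["5+", "years", "of", "Python"], [1, 2, 0, 3]))
# → [('EXPERIENCE_DURATION', '5+ years'), ('SKILL', 'Python')]
```

The real pipeline additionally merges sub-word pieces and averages their scores; this sketch only shows the span-grouping step.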
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:509d0ec667aaea68af890e0f03fe55b16f5a278671d5c074c130c8f9558cad02
+ size 496259468
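The `model.safetensors` entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines (`version`, `oid`, `size`) per the spec URL shown. A minimal parser sketch (`parse_lfs_pointer` is an illustrative name, not a real API):

```python
# Parse a Git LFS v1 pointer file (the three "key value" lines shown above)
# into a dict; "size" is the payload size in bytes.
def parse_lfs_pointer(text):
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:509d0ec667aaea68af890e0f03fe55b16f5a278671d5c074c130c8f9558cad02\n"
    "size 496259468\n"
)
info = parse_lfs_pointer(pointer)
print(info["size"])  # → 496259468 (≈ 496 MB of float32 RoBERTa-base weights)
```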
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "<s>",
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "unk_token": "<unk>"
+ }
test_metrics.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "test_loss": 0.02206496149301529,
+ "test_precision": 0.9855069157852383,
+ "test_recall": 0.9891982048789573,
+ "test_f1": 0.9873462648969002,
+ "test_SKILL_precision": 0.974378359649466,
+ "test_SKILL_recall": 0.9787130658693776,
+ "test_SKILL_f1": 0.97653591590187,
+ "test_EXPERIENCE_DURATION_precision": 0.9966354719210107,
+ "test_EXPERIENCE_DURATION_recall": 0.9996833438885371,
+ "test_EXPERIENCE_DURATION_f1": 0.9981566138919303,
+ "test_runtime": 35.2236,
+ "test_samples_per_second": 56.723,
+ "test_steps_per_second": 7.098
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "add_prefix_space": true,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "50264": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "<unk>"
+ }
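`tokenizer_config.json` caps inputs at `model_max_length: 512`, so CVs longer than that must be split before inference. One common workaround is overlapping windows, so an entity cut by one window boundary is still seen whole in the next window. A scaled-down sketch (window/stride values are illustrative; in practice they would be chosen so each window tokenizes to at most 512 tokens):

```python
# Split a long word sequence into overlapping windows so entities near a
# window boundary appear intact in at least one window.
def sliding_windows(words, window, stride):
    if len(words) <= window:
        return [words]
    out, i = [], 0
    while i < len(words):
        out.append(words[i:i + window])
        if i + window >= len(words):
            break
        i += stride
    return out

# Toy example: 7 items, window 4, overlap 2 → three windows: abcd, cdef, efg
print(sliding_windows(list("abcdefg"), window=4, stride=2))
```

Entities predicted in more than one window then need de-duplication by character offset before the final output.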
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49c6e54cca37c43eb26640eed75758105b2700bc6c513adcc761248adff1380f
+ size 5777
vocab.json ADDED
The diff for this file is too large to render. See raw diff