Bukareszt committed on
Commit d62a59f · verified · 1 Parent(s): a631824

Initial push

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+confusion_matrix.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,199 +1,78 @@
 ---
 library_name: transformers
-tags: []
 ---
 
-# Model Card for Model ID
-
-<!-- Provide a quick summary of what the model is/does. -->
-
-## Model Details
-
-### Model Description
-
-<!-- Provide a longer summary of what this model is. -->
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-
-### Model Sources [optional]
-
-<!-- Provide the basic links for the model. -->
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-
-## Uses
-
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
-### Direct Use
-
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
-[More Information Needed]
-
-### Downstream Use [optional]
-
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
-[More Information Needed]
-
-### Out-of-Scope Use
-
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
-[More Information Needed]
-
-## Bias, Risks, and Limitations
-
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-[More Information Needed]
-
-### Recommendations
-
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-[More Information Needed]
-
-## Training Details
-
-### Training Data
-
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
-
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-#### Preprocessing [optional]
-
-[More Information Needed]
-
-
-#### Training Hyperparameters
-
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-[More Information Needed]
-
-## Evaluation
-
-<!-- This section describes the evaluation protocols and provides the results. -->
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
-
-<!-- This should link to a Dataset Card if possible. -->
-
-[More Information Needed]
-
-#### Factors
-
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
-
-#### Metrics
-
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
-[More Information Needed]
-
-### Results
-
-[More Information Needed]
-
-#### Summary
-
-
-
-## Model Examination [optional]
-
-<!-- Relevant interpretability work for the model goes here -->
-
-[More Information Needed]
-
-## Environmental Impact
-
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-
-## Technical Specifications [optional]
-
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed]
-
-#### Software
-
-[More Information Needed]
-
-## Citation [optional]
-
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
-**BibTeX:**
-
-[More Information Needed]
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
 ---
 library_name: transformers
+license: apache-2.0
+base_model: PKOBP/polish-roberta-8k
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+- precision
+- recall
+- f1
+model-index:
+- name: mwik-classifier-xd
+  results: []
 ---
 
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
+# mwik-classifier-xd
 
+This model is a fine-tuned version of [PKOBP/polish-roberta-8k](https://huggingface.co/PKOBP/polish-roberta-8k) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.1007
+- Accuracy: 0.7838
+- Precision: 0.7630
+- Recall: 0.7838
+- F1: 0.7601
 
+## Model description
 
+More information needed
 
+## Intended uses & limitations
 
+More information needed
 
+## Training and evaluation data
 
+More information needed
 
+## Training procedure
 
+### Training hyperparameters
 
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 24
+- eval_batch_size: 48
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 96
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: polynomial
+- lr_scheduler_warmup_ratio: 0.06
+- num_epochs: 8
+- mixed_precision_training: Native AMP
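The batch-size bookkeeping in this list can be sanity-checked with a little arithmetic. A minimal sketch, assuming the 320 total optimizer steps shown in the training-results table; the per-epoch example count derived from them is an inference, not a value stated in the card:

```python
# Sanity-check the effective batch size implied by the hyperparameters above.
train_batch_size = 24
gradient_accumulation_steps = 4
num_epochs = 8
total_steps = 320                 # final Step in the training-results table
warmup_ratio = 0.06

total_train_batch_size = train_batch_size * gradient_accumulation_steps
steps_per_epoch = total_steps // num_epochs
warmup_steps = warmup_ratio * total_steps   # exact rounding is up to the Trainer

print(total_train_batch_size)   # 96, matching the reported total_train_batch_size
print(steps_per_epoch)          # 40, matching the Step column at epoch 1.0
```

By the same arithmetic, each epoch covered roughly 40 × 96 ≈ 3,840 training examples.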
 
+### Training results
 
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
+| No log        | 1.0   | 40   | 2.7021          | 0.3856   | 0.2986    | 0.3856 | 0.2870 |
+| 3.0953        | 2.0   | 80   | 1.9267          | 0.5958   | 0.5010    | 0.5958 | 0.5128 |
+| 2.1153        | 3.0   | 120  | 1.5299          | 0.6978   | 0.6806    | 0.6978 | 0.6399 |
+| 1.5767        | 4.0   | 160  | 1.3317          | 0.7376   | 0.7340    | 0.7376 | 0.7022 |
+| 1.2985        | 5.0   | 200  | 1.2154          | 0.7674   | 0.7460    | 0.7674 | 0.7407 |
+| 1.2985        | 6.0   | 240  | 1.1614          | 0.7749   | 0.7545    | 0.7749 | 0.7515 |
+| 1.1262        | 7.0   | 280  | 1.1227          | 0.7799   | 0.7575    | 0.7799 | 0.7605 |
+| 1.0373        | 8.0   | 320  | 1.1079          | 0.7786   | 0.7540    | 0.7786 | 0.7588 |
 
+### Framework versions
 
+- Transformers 4.57.1
+- Pytorch 2.8.0+cu126
+- Datasets 4.0.0
+- Tokenizers 0.22.1
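One detail worth noting in the metrics above: Recall (0.7838) is identical to Accuracy. That is not a coincidence; with support-weighted averaging, multiclass recall reduces algebraically to top-1 accuracy. A minimal self-contained check on toy labels (not the model's real predictions; label names are merely borrowed from this repo's label set):

```python
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    # Per-class recall weighted by class support: sum_c (n_c/N) * (hits_c/n_c)
    # = sum_c hits_c / N, which is exactly the accuracy.
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        score += (n / total) * (hits / n)
    return score

# Toy data reusing a few label names from this classifier.
y_true = ["REKLAMACJA", "PYT", "ZM", "PYT", "SK", "ZM"]
y_pred = ["REKLAMACJA", "ZM", "ZM", "PYT", "SK", "PYT"]
assert abs(accuracy(y_true, y_pred) - weighted_recall(y_true, y_pred)) < 1e-12
```

The same identity explains why weighted-average recall equals accuracy in every classification report in this commit.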
all_results.json ADDED
@@ -0,0 +1,16 @@
+{
+    "epoch": 8.0,
+    "eval_accuracy": 0.7838150289017342,
+    "eval_f1": 0.760134834840537,
+    "eval_loss": 1.1006990671157837,
+    "eval_precision": 0.7630015333272686,
+    "eval_recall": 0.7838150289017342,
+    "eval_runtime": 3.2334,
+    "eval_samples_per_second": 267.523,
+    "eval_steps_per_second": 5.876,
+    "total_flos": 2.6699656498043904e+16,
+    "train_loss": 1.6651537001132966,
+    "train_runtime": 443.5681,
+    "train_samples_per_second": 69.257,
+    "train_steps_per_second": 0.721
+}
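The throughput figures in this file are internally consistent: the implied sample count (≈865) matches the support of the final evaluation split in classification_report.txt, and 19 steps is what an eval_batch_size of 48 needs to cover it. A quick cross-check (all inputs copied from the JSON above; eval_batch_size comes from the README hyperparameters):

```python
# Cross-check the reported throughput numbers in all_results.json.
eval_runtime = 3.2334              # seconds
eval_samples_per_second = 267.523
eval_steps_per_second = 5.876
eval_batch_size = 48               # from the training hyperparameters

n_samples = round(eval_runtime * eval_samples_per_second)
n_steps = round(eval_runtime * eval_steps_per_second)

print(n_samples)                         # 865
print(n_steps)                           # 19
print(-(-n_samples // eval_batch_size))  # 19 batches needed for 865 samples
```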
classification_report.txt ADDED
@@ -0,0 +1,156 @@
+
+================================================================================
+
+DETAILED CLASSIFICATION REPORT (Top-1)
+================================================================================
+               precision    recall    f1-score    support
+
+DIERZ_ST_HYD      1.0000    0.6364      0.7778         11
+INFO_DW           0.0000    0.0000      0.0000          7
+NEGOC_DESZCZ      0.9091    0.9091      0.9091         11
+OP_SIEC_WK        0.5385    1.0000      0.7000         14
+OP_UM             1.0000    0.8571      0.9231          7
+POZ_SPR_WIND      0.6538    0.9444      0.7727         18
+PRZE              1.0000    1.0000      1.0000          8
+REKLAMACJA        0.8667    1.0000      0.9286         13
+UM_PARTYCY        1.0000    0.1667      0.2857          6
+WOD_OGR_PRZY      1.0000    0.8571      0.9231          7
+ZG_ODCZ           0.8000    0.8571      0.8276         14
+ZW_NADP           0.0000    0.0000      0.0000          5
+
+accuracy                                0.7769        121
+macro avg         0.7307    0.6857      0.6706        121
+weighted avg      0.7502    0.7769      0.7319        121
+
+================================================================================
+
+DETAILED CLASSIFICATION REPORT (Top-1)
+================================================================================
+               precision    recall    f1-score    support
+
+BINFO             0.6957    0.9412      0.8000         17
+DANE_ARCH         1.0000    0.1538      0.2667         13
+DIERZ_ST_HYD      0.8056    0.8529      0.8286         34
+INFO_DW           0.9333    0.6087      0.7368         23
+INSP              0.9286    0.6500      0.7647         20
+INTERW_AW_K       0.6000    0.9273      0.7286         55
+INTERW_AW_W       0.6133    0.8070      0.6970         57
+INTERW_ODTW       0.6000    0.8889      0.7164         27
+INTERW_ZAP        0.8667    0.8667      0.8667         15
+NEGOC_DESZCZ      0.7381    0.9394      0.8267         33
+ODWOD_KS          1.0000    0.8333      0.9091          6
+OP_PRZY_WK        0.0000    0.0000      0.0000         16
+OP_SIEC_WK        0.5366    0.5116      0.5238         43
+OP_UM             0.7000    0.9545      0.8077         22
+POZYTYW           0.0000    0.0000      0.0000         17
+POZ_SPR_WIND      0.7746    0.9016      0.8333         61
+PRZE              0.7541    0.9020      0.8214         51
+PYT               1.0000    0.2258      0.3684         31
+REKLAMACJA        0.7798    0.8947      0.8333         95
+ROW_EKSP          0.8750    0.5385      0.6667         13
+SK                0.5882    0.5556      0.5714         36
+UDOST_WN          0.9091    0.8333      0.8696         12
+UM                0.0000    0.0000      0.0000         14
+UM_PARTYCY        0.8750    1.0000      0.9333         21
+UZN_SCIEKI        0.0000    0.0000      0.0000          7
+UZ_SIEC_WK        1.0000    0.2308      0.3750         13
+WAR_PRZY_SIE      0.0000    0.0000      0.0000          7
+WAR_WK            0.4000    0.6000      0.4800         10
+WAR_WKKD          0.5556    0.4167      0.4762         12
+WOD_OGR_PRZY      0.9091    0.9524      0.9302         21
+WPIN_SIEC         1.0000    0.8182      0.9000         11
+WYM_PRZY_WK       0.0000    0.0000      0.0000          7
+ZASW_KONC         0.8000    0.6667      0.7273         12
+ZG_ODCZ           0.8511    0.9091      0.8791         44
+ZM                0.7000    0.9655      0.8116         29
+ZW_NADP           0.7059    0.8000      0.7500         15
+
+accuracy                                0.7272        920
+macro avg         0.6526    0.6152      0.6028        920
+weighted avg      0.6984    0.7272      0.6872        920
+
+================================================================================
+
+DETAILED CLASSIFICATION REPORT (Top-1)
+================================================================================
+               precision    recall    f1-score    support
+
+BINFO             0.7368    0.8235      0.7778         17
+DANE_ARCH         0.8571    0.4615      0.6000         13
+DIERZ_ST_HYD      0.8889    0.9412      0.9143         34
+INFO_DW           0.7826    0.7826      0.7826         23
+INSP              1.0000    1.0000      1.0000         13
+INTERW_AW_K       0.6184    0.8545      0.7176         55
+INTERW_AW_W       0.7101    0.8596      0.7778         57
+INTERW_ODTW       0.7143    0.9259      0.8065         27
+INTERW_ZAP        0.8125    0.8667      0.8387         15
+NEGOC_DESZCZ      0.9118    0.9394      0.9254         33
+ODWOD_KS          0.8000    0.6667      0.7273          6
+OP_PRZY_WK        0.0000    0.0000      0.0000         16
+OP_SIEC_WK        0.6765    0.5349      0.5974         43
+OP_UM             0.7692    0.9091      0.8333         22
+POZYTYW           0.0000    0.0000      0.0000         17
+POZ_SPR_WIND      0.9032    0.9180      0.9106         61
+PRZE              0.7538    0.9608      0.8448         51
+PYT               0.4286    0.2903      0.3462         31
+REKLAMACJA        0.8469    0.9326      0.8877         89
+ROW_EKSP          0.8182    0.6923      0.7500         13
+SK                0.5517    0.5517      0.5517         29
+UDOST_WN          0.9091    0.8333      0.8696         12
+UM_PARTYCY        0.9130    1.0000      0.9545         21
+UZ_SIEC_WK        0.8889    0.6154      0.7273         13
+WAR_WK            0.5000    0.4000      0.4444         10
+WAR_WKKD          0.5556    0.4167      0.4762         12
+WOD_OGR_PRZY      0.9091    0.9524      0.9302         21
+WPIN_SIEC         0.8750    0.6364      0.7368         11
+ZASW_KONC         0.8889    0.6667      0.7619         12
+ZG_ODCZ           0.9535    0.9318      0.9425         44
+ZM                0.8667    0.8966      0.8814         29
+ZW_NADP           0.9286    0.8667      0.8966         15
+
+accuracy                                0.7861        865
+macro avg         0.7428    0.7227      0.7253        865
+weighted avg      0.7564    0.7861      0.7649        865
+
+================================================================================
+
+DETAILED CLASSIFICATION REPORT (Top-1)
+================================================================================
+               precision    recall    f1-score    support
+
+BINFO             0.8421    0.9412      0.8889         17
+DANE_ARCH         1.0000    0.3846      0.5556         13
+DIERZ_ST_HYD      0.9143    0.9412      0.9275         34
+INFO_DW           0.8095    0.7391      0.7727         23
+INSP              1.0000    1.0000      1.0000         13
+INTERW_AW_K       0.6184    0.8545      0.7176         55
+INTERW_AW_W       0.6618    0.7895      0.7200         57
+INTERW_ODTW       0.5952    0.9259      0.7246         27
+INTERW_ZAP        0.8333    0.6667      0.7407         15
+NEGOC_DESZCZ      0.8857    0.9394      0.9118         33
+ODWOD_KS          1.0000    0.8333      0.9091          6
+OP_PRZY_WK        0.0000    0.0000      0.0000         16
+OP_SIEC_WK        0.6667    0.5116      0.5789         43
+OP_UM             0.8077    0.9545      0.8750         22
+POZYTYW           0.0000    0.0000      0.0000         17
+POZ_SPR_WIND      0.9048    0.9344      0.9194         61
+PRZE              0.7656    0.9608      0.8522         51
+PYT               0.4000    0.1935      0.2609         31
+REKLAMACJA        0.8400    0.9438      0.8889         89
+ROW_EKSP          0.6875    0.8462      0.7586         13
+SK                0.5600    0.4828      0.5185         29
+UDOST_WN          1.0000    0.8333      0.9091         12
+UM_PARTYCY        0.8750    1.0000      0.9333         21
+UZ_SIEC_WK        1.0000    0.6154      0.7619         13
+WAR_WK            0.3889    0.7000      0.5000         10
+WAR_WKKD          1.0000    0.2500      0.4000         12
+WOD_OGR_PRZY      0.9500    0.9048      0.9268         21
+WPIN_SIEC         0.9091    0.9091      0.9091         11
+ZASW_KONC         1.0000    0.6667      0.8000         12
+ZG_ODCZ           0.8913    0.9318      0.9111         44
+ZM                0.9310    0.9310      0.9310         29
+ZW_NADP           0.9333    0.9333      0.9333         15
+
+accuracy                                0.7838        865
+macro avg         0.7710    0.7350      0.7324        865
+weighted avg      0.7630    0.7838      0.7601        865
config.json CHANGED
@@ -23,80 +23,72 @@
   "id2label": {
     "0": "BINFO",
     "1": "DANE_ARCH",
-    "2": "INFO_DW",
-    "3": "INSP",
-    "4": "INTERW_AW_K",
-    "5": "INTERW_AW_W",
-    "6": "INTERW_ODTW",
-    "7": "INTERW_ZAP",
-    "8": "NEGOC_DESZCZ",
-    "9": "ODWOD_KS",
-    "10": "OKR_WŁ_PRZEW",
-    "11": "OP_SIEC_WK",
-    "12": "OP_UM",
-    "13": "POZYTYW",
-    "14": "POZ_SPR_WIND",
-    "15": "PRZE",
-    "16": "PRZEK_SIEĆ",
-    "17": "PRZEN_WOD",
-    "18": "PYT",
-    "19": "REKLAMACJA",
-    "20": "ROW_EKSP",
-    "21": "SK",
-    "22": "UDOST_WN",
-    "23": "UM",
-    "24": "UM_PARTYCY",
-    "25": "UZN_ŚCIEKI",
-    "26": "UZ_SIEĆ_WK",
-    "27": "WAR_PRZY_SIE",
-    "28": "WAR_W+K",
-    "29": "WAR_W+K+KD",
-    "30": "WOD_OGR_PRZY",
-    "31": "WYM_PRZYŁ_WK",
-    "32": "ZAŚW_KOŃC",
-    "33": "ZGŁ_ODCZ",
-    "34": "ZM",
-    "35": "ZW_NADPŁ"
   },
   "initializer_range": 0.02,
   "intermediate_size": 4096,
   "label2id": {
     "BINFO": 0,
     "DANE_ARCH": 1,
-    "INFO_DW": 2,
-    "INSP": 3,
-    "INTERW_AW_K": 4,
-    "INTERW_AW_W": 5,
-    "INTERW_ODTW": 6,
-    "INTERW_ZAP": 7,
-    "NEGOC_DESZCZ": 8,
-    "ODWOD_KS": 9,
-    "OKR_WŁ_PRZEW": 10,
-    "OP_SIEC_WK": 11,
-    "OP_UM": 12,
-    "POZYTYW": 13,
-    "POZ_SPR_WIND": 14,
-    "PRZE": 15,
-    "PRZEK_SIEĆ": 16,
-    "PRZEN_WOD": 17,
-    "PYT": 18,
-    "REKLAMACJA": 19,
-    "ROW_EKSP": 20,
-    "SK": 21,
-    "UDOST_WN": 22,
-    "UM": 23,
-    "UM_PARTYCY": 24,
-    "UZN_ŚCIEKI": 25,
-    "UZ_SIEĆ_WK": 26,
-    "WAR_PRZY_SIE": 27,
-    "WAR_W+K": 28,
-    "WAR_W+K+KD": 29,
-    "WOD_OGR_PRZY": 30,
-    "WYM_PRZYŁ_WK": 31,
-    "ZAŚW_KOŃC": 32,
-    "ZGŁ_ODCZ": 33,
-    "ZM": 34,
-    "ZW_NADPŁ": 35
   },
   "layer_norm_eps": 1e-05,
   "max_position_embeddings": 8194,
 
   "id2label": {
     "0": "BINFO",
     "1": "DANE_ARCH",
+    "2": "DIERZ_ST_HYD",
+    "3": "INFO_DW",
+    "4": "INSP",
+    "5": "INTERW_AW_K",
+    "6": "INTERW_AW_W",
+    "7": "INTERW_ODTW",
+    "8": "INTERW_ZAP",
+    "9": "NEGOC_DESZCZ",
+    "10": "ODWOD_KS",
+    "11": "OP_PRZY_WK",
+    "12": "OP_SIEC_WK",
+    "13": "OP_UM",
+    "14": "POZYTYW",
+    "15": "POZ_SPR_WIND",
+    "16": "PRZE",
+    "17": "PYT",
+    "18": "REKLAMACJA",
+    "19": "ROW_EKSP",
+    "20": "SK",
+    "21": "UDOST_WN",
+    "22": "UM_PARTYCY",
+    "23": "UZ_SIEC_WK",
+    "24": "WAR_WK",
+    "25": "WAR_WKKD",
+    "26": "WOD_OGR_PRZY",
+    "27": "WPIN_SIEC",
+    "28": "ZASW_KONC",
+    "29": "ZG_ODCZ",
+    "30": "ZM",
+    "31": "ZW_NADP"
   },
   "initializer_range": 0.02,
   "intermediate_size": 4096,
   "label2id": {
     "BINFO": 0,
     "DANE_ARCH": 1,
+    "DIERZ_ST_HYD": 2,
+    "INFO_DW": 3,
+    "INSP": 4,
+    "INTERW_AW_K": 5,
+    "INTERW_AW_W": 6,
+    "INTERW_ODTW": 7,
+    "INTERW_ZAP": 8,
+    "NEGOC_DESZCZ": 9,
+    "ODWOD_KS": 10,
+    "OP_PRZY_WK": 11,
+    "OP_SIEC_WK": 12,
+    "OP_UM": 13,
+    "POZYTYW": 14,
+    "POZ_SPR_WIND": 15,
+    "PRZE": 16,
+    "PYT": 17,
+    "REKLAMACJA": 18,
+    "ROW_EKSP": 19,
+    "SK": 20,
+    "UDOST_WN": 21,
+    "UM_PARTYCY": 22,
+    "UZ_SIEC_WK": 23,
+    "WAR_WK": 24,
+    "WAR_WKKD": 25,
+    "WOD_OGR_PRZY": 26,
+    "WPIN_SIEC": 27,
+    "ZASW_KONC": 28,
+    "ZG_ODCZ": 29,
+    "ZM": 30,
+    "ZW_NADP": 31
   },
   "layer_norm_eps": 1e-05,
   "max_position_embeddings": 8194,
configuration_roberta.py ADDED
@@ -0,0 +1,151 @@
+# coding=utf-8
+# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
+# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""RoBERTa configuration"""
+from collections import OrderedDict
+from typing import Mapping
+
+from transformers import PretrainedConfig
+from transformers.onnx import OnnxConfig
+from transformers.utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+
+class RobertaConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`RobertaModel`] or a [`TFRobertaModel`]. It is
+    used to instantiate a RoBERTa model according to the specified arguments, defining the model architecture.
+    Instantiating a configuration with the defaults will yield a similar configuration to that of the RoBERTa
+    [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) architecture.
+
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 50265):
+            Vocabulary size of the RoBERTa model. Defines the number of different tokens that can be represented by the
+            `inputs_ids` passed when calling [`RobertaModel`] or [`TFRobertaModel`].
+        hidden_size (`int`, *optional*, defaults to 768):
+            Dimensionality of the encoder layers and the pooler layer.
+        num_hidden_layers (`int`, *optional*, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        intermediate_size (`int`, *optional*, defaults to 3072):
+            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+        hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+            `"relu"`, `"silu"` and `"gelu_new"` are supported.
+        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (`int`, *optional*, defaults to 512):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (`int`, *optional*, defaults to 2):
+            The vocabulary size of the `token_type_ids` passed when calling [`RobertaModel`] or [`TFRobertaModel`].
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
+            Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
+            positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
+            [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
+            For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
+            with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+        is_decoder (`bool`, *optional*, defaults to `False`):
+            Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        classifier_dropout (`float`, *optional*):
+            The dropout ratio for the classification head.
+
+    Examples:
+
+    ```python
+    >>> from transformers import RobertaConfig, RobertaModel
+
+    >>> # Initializing a RoBERTa configuration
+    >>> configuration = RobertaConfig()
+
+    >>> # Initializing a model (with random weights) from the configuration
+    >>> model = RobertaModel(configuration)
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+
+    model_type = "roberta"
+
+    def __init__(
+        self,
+        vocab_size=50265,
+        hidden_size=768,
+        num_hidden_layers=12,
+        num_attention_heads=12,
+        intermediate_size=3072,
+        hidden_act="gelu",
+        hidden_dropout_prob=0.1,
+        attention_probs_dropout_prob=0.1,
+        max_position_embeddings=512,
+        type_vocab_size=2,
+        initializer_range=0.02,
+        layer_norm_eps=1e-12,
+        pad_token_id=1,
+        bos_token_id=0,
+        eos_token_id=2,
+        position_embedding_type="absolute",
+        use_cache=True,
+        classifier_dropout=None,
+        **kwargs,
+    ):
+        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
+
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.intermediate_size = intermediate_size
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
+        self.initializer_range = initializer_range
+        self.layer_norm_eps = layer_norm_eps
+        self.position_embedding_type = position_embedding_type
+        self.use_cache = use_cache
+        self.classifier_dropout = classifier_dropout
+
+
+class RobertaOnnxConfig(OnnxConfig):
+    @property
+    def inputs(self) -> Mapping[str, Mapping[int, str]]:
+        if self.task == "multiple-choice":
+            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
+        else:
+            dynamic_axis = {0: "batch", 1: "sequence"}
+        return OrderedDict(
+            [
+                ("input_ids", dynamic_axis),
+                ("attention_mask", dynamic_axis),
+            ]
+        )
confusion_matrix.png ADDED

Git LFS Details

  • SHA256: 36f81342a3a04b624b7617fd94926436b4d06efd26eeddbabbd6d2df17b9183b
  • Pointer size: 131 Bytes
  • Size of remote file: 385 kB
label_info.json ADDED
@@ -0,0 +1,71 @@
+{
+    "label2id": {
+        "BINFO": 0,
+        "DANE_ARCH": 1,
+        "DIERZ_ST_HYD": 2,
+        "INFO_DW": 3,
+        "INSP": 4,
+        "INTERW_AW_K": 5,
+        "INTERW_AW_W": 6,
+        "INTERW_ODTW": 7,
+        "INTERW_ZAP": 8,
+        "NEGOC_DESZCZ": 9,
+        "ODWOD_KS": 10,
+        "OP_PRZY_WK": 11,
+        "OP_SIEC_WK": 12,
+        "OP_UM": 13,
+        "POZYTYW": 14,
+        "POZ_SPR_WIND": 15,
+        "PRZE": 16,
+        "PYT": 17,
+        "REKLAMACJA": 18,
+        "ROW_EKSP": 19,
+        "SK": 20,
+        "UDOST_WN": 21,
+        "UM_PARTYCY": 22,
+        "UZ_SIEC_WK": 23,
+        "WAR_WK": 24,
+        "WAR_WKKD": 25,
+        "WOD_OGR_PRZY": 26,
+        "WPIN_SIEC": 27,
+        "ZASW_KONC": 28,
+        "ZG_ODCZ": 29,
+        "ZM": 30,
+        "ZW_NADP": 31
+    },
+    "id2label": {
+        "0": "BINFO",
+        "1": "DANE_ARCH",
+        "2": "DIERZ_ST_HYD",
+        "3": "INFO_DW",
+        "4": "INSP",
+        "5": "INTERW_AW_K",
+        "6": "INTERW_AW_W",
+        "7": "INTERW_ODTW",
+        "8": "INTERW_ZAP",
+        "9": "NEGOC_DESZCZ",
+        "10": "ODWOD_KS",
+        "11": "OP_PRZY_WK",
+        "12": "OP_SIEC_WK",
+        "13": "OP_UM",
+        "14": "POZYTYW",
+        "15": "POZ_SPR_WIND",
+        "16": "PRZE",
+        "17": "PYT",
+        "18": "REKLAMACJA",
+        "19": "ROW_EKSP",
+        "20": "SK",
+        "21": "UDOST_WN",
+        "22": "UM_PARTYCY",
+        "23": "UZ_SIEC_WK",
+        "24": "WAR_WK",
+        "25": "WAR_WKKD",
+        "26": "WOD_OGR_PRZY",
+        "27": "WPIN_SIEC",
+        "28": "ZASW_KONC",
+        "29": "ZG_ODCZ",
+        "30": "ZM",
+        "31": "ZW_NADP"
+    },
+    "num_labels": 32
+}
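label_info.json duplicates the label2id/id2label mapping from config.json, and the two dictionaries should be exact inverses of each other (JSON forces the id2label keys to be strings, so they need an int() conversion). A minimal consistency check, shown here on a small excerpt of the file rather than the full 32-entry mapping:

```python
import json

# Excerpt of label_info.json; the real file has 32 entries.
label_info = json.loads("""
{
  "label2id": {"BINFO": 0, "DANE_ARCH": 1, "DIERZ_ST_HYD": 2, "ZW_NADP": 31},
  "id2label": {"0": "BINFO", "1": "DANE_ARCH", "2": "DIERZ_ST_HYD", "31": "ZW_NADP"}
}
""")

label2id = label_info["label2id"]
id2label = {int(k): v for k, v in label_info["id2label"].items()}

# Every label must round-trip through both mappings.
assert all(id2label[i] == lab for lab, i in label2id.items())
assert all(label2id[lab] == i for i, lab in id2label.items())
```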
logs/events.out.tfevents.1760994777.a5b7e37e7852.6366.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4dd3ea45a35343426d035ae1baad1b59a01961d0b26d24e58528cf54a6eef493
+size 11381
logs/events.out.tfevents.1760994974.a5b7e37e7852.6366.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bece3587ff7bfcfc5aa0d72203679228ae56a13e5d48d1155c164d7b756537f2
+size 551
logs/events.out.tfevents.1760995186.a5b7e37e7852.6366.2 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e40331f5bf5d23a204a6aa240dce982db039918788ac436fb9d944fbb36044b
+size 12208
logs/events.out.tfevents.1760995679.a5b7e37e7852.6366.3 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc6c58a8ce7d749156302bab297e681aa3ae4b5d8850d3ac41cbe57b1cea1dea
+size 4184
logs/events.out.tfevents.1760995793.a5b7e37e7852.20606.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0a67e939f7e6f36138f7944886531c2009e73a4b3007e24974ebd40a39ca30bb
+size 8324
logs/events.out.tfevents.1760995936.a5b7e37e7852.21329.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:655b571215a50a6c23d29456b7c43e5c55eb7828355e1dac6b3ca912d0de1e9a
+size 11406
logs/events.out.tfevents.1760996283.a5b7e37e7852.21329.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:37317c2622f3532fea3fd026c622bce0013406d92306354d932b52fa6d5ccd98
+size 560
logs/events.out.tfevents.1760996482.a5b7e37e7852.21329.2 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6e122ba20aba7bc7a35a48f3edc91f2468f0980b83fbfd46b5f233c4e675cdd4
+size 12378
logs/events.out.tfevents.1760996930.a5b7e37e7852.21329.3 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:01d1e94efab1dc7959ed8eaba516302225bfd6b7a7b3d996c36124c39bbab3f4
+size 560
logs/events.out.tfevents.1761000162.a5b7e37e7852.21329.4 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:89222f87ad44c65ad4e1d534bd8a8a3aa655471d0b2a047a000ca29d5acf4ea2
+size 4184
logs/events.out.tfevents.1761000278.a5b7e37e7852.39741.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:89f68038a0b7ed689e090ffef495d6a5e7dbcafed5ce19161dfaa37213e85462
+size 12378
logs/events.out.tfevents.1761000725.a5b7e37e7852.39741.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f5dbbbd28fedb8a3595d0fe7b21c31e3e3c106dcddff644cd6a4831ffcda0add
+size 560
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:76be2eae12e9d636194d8cd326570e32f11ba555b8b2d91fce90738b0666ee2a
-size 1771757024
+oid sha256:62db5e122f9f41163da858efd8df098e45b4837daa5e352df66f01195be5b08e
+size 1771740624
test_results.json ADDED
@@ -0,0 +1,11 @@
+{
+    "epoch": 8.0,
+    "eval_accuracy": 0.7838150289017342,
+    "eval_f1": 0.760134834840537,
+    "eval_loss": 1.1006990671157837,
+    "eval_precision": 0.7630015333272686,
+    "eval_recall": 0.7838150289017342,
+    "eval_runtime": 3.2334,
+    "eval_samples_per_second": 267.523,
+    "eval_steps_per_second": 5.876
+}
tokenizer.json CHANGED
@@ -1,11 +1,6 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 1024,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
+  "truncation": null,
   "padding": null,
   "added_tokens": [
     {
tokenizer_config.json CHANGED
@@ -553,14 +553,10 @@
   "errors": "replace",
   "extra_special_tokens": {},
   "mask_token": "<mask>",
-  "max_length": 1024,
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "<pad>",
   "sep_token": "</s>",
-  "stride": 0,
   "tokenizer_class": "RobertaTokenizer",
   "trim_offsets": true,
-  "truncation_side": "right",
-  "truncation_strategy": "longest_first",
   "unk_token": "<unk>"
 }
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 8.0,
+    "total_flos": 2.6699656498043904e+16,
+    "train_loss": 1.6651537001132966,
+    "train_runtime": 443.5681,
+    "train_samples_per_second": 69.257,
+    "train_steps_per_second": 0.721
+}
training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4f7548759bf7384197093ccf91ca07874e132e26bdb5f51419e2b0b1c407d859
+size 5905