Update README.md
README.md
```diff
@@ -26,27 +26,25 @@ Evaluation of the GPT model has shown that its annotation performance is
 comparable to those of human annotators.
 
 The fine-tuned ParlaCAP model achieves 0.723 in macro-F1 on an English test set,
-0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and
-(880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
-
-An additional evaluation on smaller samples from Czech ParlaMint-CZ, Bulgarian ParlaMint-BG and Ukrainian ParlaMint-UA datasets shows
-that the model achieves macro-F1 scores of 0.736, 0.75 and 0.805 on these three test datasets, respectively.
+0.686 in macro-F1 on a Croatian test set, 0.710 in macro-F1 on a Serbian test set and 0.646 in macro-F1 on a Bosnian test set
+(app. 880 instances from ParlaMint-GB 4.1, ParlaMint-HR 4.1, ParlaMint-RS 4.1 and ParlaMint-BA 4.1, respectively, balanced by labels).
 
 For end use scenarios, we recommend filtering out predictions based on the model's prediction confidence.
 
 When the model was applied to the ParlaMint datasets, we annotated instances that were predicted with confidence below 0.60 as "Mix".
 
-With this approach, we annotate as Mix 8.
-11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian test
+With this approach, we annotate as Mix 8.9% of instances in the English test set,
+11.4% of instances in the Croatian test set, 11.1% of instances in the Serbian and Bosnian test sets.
 
 Performance of the model on the remaining instances (all instances not annotated as "Mix"):
 
 | | micro-F1 | macro-F1 | accuracy |
 |:---|-----------:|-----------:|-----------:|
+| en | 0.761 | 0.758 | 0.761 |
+| sr | 0.749 | 0.743 | 0.749 |
+| hr | 0.724 | 0.726 | 0.724 |
+| bs | 0.686 | 0.680 | 0.686 |
```
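The confidence-based "Mix" relabeling described in the README can be sketched as below. This is a minimal illustration, not the actual ParlaCAP inference code: it assumes the classifier exposes per-class logits, and the label names and numbers are hypothetical placeholders, not the model's real CAP categories.

```python
import numpy as np

# Threshold from the README: predictions with confidence below 0.60 become "Mix".
CONFIDENCE_THRESHOLD = 0.60

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def apply_mix_filter(logits, labels, threshold=CONFIDENCE_THRESHOLD):
    """Return predicted labels, replacing low-confidence predictions with 'Mix'."""
    probs = softmax(np.asarray(logits, dtype=float))
    preds = probs.argmax(axis=-1)       # index of the top class per instance
    confidences = probs.max(axis=-1)    # probability of the top class
    return ["Mix" if c < threshold else labels[p]
            for p, c in zip(preds, confidences)]

# Toy example with hypothetical label names.
labels = ["Macroeconomics", "Health", "Education"]
logits = np.array([
    [4.0, 0.5, 0.2],    # one class dominates -> high confidence, label kept
    [0.4, 0.5, 0.45],   # near-uniform logits -> low confidence, relabeled "Mix"
])
print(apply_mix_filter(logits, labels))
```

The same thresholding applied to a labeled test set yields the "remaining instances" subset whose micro-F1, macro-F1, and accuracy are reported in the table above.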