latofat/uzpostagger-cyrillic-3

Token Classification

Generated from Trainer

latofat committed on Feb 10, 2025

Commit 728554d · verified · 1 Parent(s): dcafb61

Update README.md

Files changed (1)

README.md +28 -1

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 # uzpostagger-cyrillic-3
-This model is a fine-tuned version of [coppercitylabs/uzbert-base-uncased](https://huggingface.co/coppercitylabs/uzbert-base-uncased) on an unknown dataset.
+This model is a fine-tuned version of [coppercitylabs/uzbert-base-uncased](https://huggingface.co/coppercitylabs/uzbert-base-uncased) on [uzbekpos](https://huggingface.co/datasets/latofat/uzbekpos) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.2715
 - Precision: 0.8763
@@ -68,3 +68,30 @@ The following hyperparameters were used during training:
 - Pytorch 2.2.0
 - Datasets 2.17.1
 - Tokenizers 0.13.3
+## Citation Information
+```
+@inproceedings{bobojonova-etal-2025-bbpos,
+    title = "{BBPOS}: {BERT}-based Part-of-Speech Tagging for {U}zbek",
+    author = "Bobojonova, Latofat  and
+      Akhundjanova, Arofat  and
+      Ostheimer, Phil Sidney  and
+      Fellenz, Sophie",
+    editor = "Hettiarachchi, Hansi  and
+      Ranasinghe, Tharindu  and
+      Rayson, Paul  and
+      Mitkov, Ruslan  and
+      Gaber, Mohamed  and
+      Premasiri, Damith  and
+      Tan, Fiona Anting  and
+      Uyangodage, Lasitha",
+    booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
+    month = jan,
+    year = "2025",
+    address = "Abu Dhabi, United Arab Emirates",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.loreslm-1.23/",
+    pages = "287--293",
+    abstract = "This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91{\%} average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers."
+}
+```