---
license: apache-2.0
base_model: tahrirchi/tahrirchi-bert-base
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: uzpostagger-latin-3
  results: []
---
uzpostagger-latin-3
This model is a fine-tuned version of tahrirchi/tahrirchi-bert-base on the uzbekpos dataset. It achieves the following results on the evaluation set:
- Loss: 0.3122
- Precision: 0.8858
- Recall: 0.8625
- F1: 0.8740
- Accuracy: 0.9227
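F1 is the harmonic mean of precision and recall, so the three figures above can be cross-checked against each other. The following is a minimal sketch of that check; the helper name `f1_score` is introduced here for illustration and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Final evaluation metrics reported above.
precision, recall = 0.8858, 0.8625
f1 = f1_score(precision, recall)
print(f"{f1:.4f}")  # 0.8740
```

The result agrees with the reported F1 to four decimal places. Small discrepancies are possible for other checkpoints, because the published precision and recall are themselves rounded.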
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
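As a rough illustration, the values above can be written as keyword arguments using the parameter names of `transformers.TrainingArguments`. This is a sketch under stated assumptions, not the original training script: mapping `train_batch_size` to `per_device_train_batch_size` assumes a single device, the schedule assumes zero warmup steps (none are listed), and the total of 125 optimizer steps is taken from the training results table.

```python
# Hyperparameters from the list above, keyed by the corresponding
# transformers.TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes single-device training
    "per_device_eval_batch_size": 16,   # assumes single-device evaluation
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

TOTAL_STEPS = 125  # assumption: final step count in the training results table


def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Learning rate at the start of training and at each epoch boundary.
for step in (0, 25, 50, 75, 100, 125):
    print(step, linear_lr(step))
```

With these assumptions the learning rate starts at 2e-05 and falls by one fifth of that value per epoch, reaching zero at step 125. The dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left for the reader to choose.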
Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 25 | 1.0019 | 0.5689 | 0.3641 | 0.4440 | 0.6773 |
| No log | 2.0 | 50 | 0.5215 | 0.7924 | 0.7209 | 0.7549 | 0.8575 |
| No log | 3.0 | 75 | 0.3725 | 0.8629 | 0.8321 | 0.8472 | 0.9073 |
| No log | 4.0 | 100 | 0.3285 | 0.8751 | 0.8458 | 0.8602 | 0.9150 |
| No log | 5.0 | 125 | 0.3122 | 0.8858 | 0.8625 | 0.8740 | 0.9227 |
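The step counts in the table give a rough bound on the size of the training split, which the card does not state. The sketch below assumes single-device training, no gradient accumulation, and that the final partial batch of each epoch is kept; the function name `train_size_bounds` is hypothetical.

```python
def train_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest dataset sizes that yield the given steps per epoch.

    Assumes the last, possibly partial, batch is kept rather than dropped.
    """
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


# 25 optimizer steps per epoch (from the table) at train_batch_size 16.
print(train_size_bounds(25, 16))  # (385, 400)
```

Under those assumptions the training split holds between 385 and 400 examples. If several devices or gradient accumulation were used, the true size would be a multiple of this estimate.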
Framework versions
- Transformers 4.32.1
- PyTorch 2.2.0
- Datasets 2.17.1
- Tokenizers 0.13.3
Citation Information
@inproceedings{bobojonova-etal-2025-bbpos,
title = "{BBPOS}: {BERT}-based Part-of-Speech Tagging for {U}zbek",
author = "Bobojonova, Latofat and
Akhundjanova, Arofat and
Ostheimer, Phil Sidney and
Fellenz, Sophie",
editor = "Hettiarachchi, Hansi and
Ranasinghe, Tharindu and
Rayson, Paul and
Mitkov, Ruslan and
Gaber, Mohamed and
Premasiri, Damith and
Tan, Fiona Anting and
Uyangodage, Lasitha",
booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
month = jan,
year = "2025",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.loreslm-1.23/",
pages = "287--293",
abstract = "This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91{\%} average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers."
}