Turkish BERT fine-tuned for Tatar Morphological Analysis

This model is a fine-tuned version of dbmdz/bert-base-turkish-cased for morphological analysis of the Tatar language. It was trained on a subset of 80,000 sentences from the Tatar Morphological Corpus. The model predicts fine-grained morphological tags (e.g., N+Sg+Nom, V+PRES(Й)+3SG).
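
The tags are composite strings, so downstream code often needs to split them. The sketch below is one possible way to do that. It assumes that the first `+`-separated component is the part of speech and that a parenthesized suffix such as `(Й)` names the affix attached to a feature. Neither assumption is confirmed by the corpus documentation, and `split_tag` is a hypothetical helper, not part of the model.

```python
import re

def split_tag(tag: str) -> dict:
    """Split a tag such as 'V+PRES(Й)+3SG' into a POS and a feature list."""
    pos, *features = tag.split("+")
    parsed = []
    for feat in features:
        # Assumption: 'NAME(AFFIX)' pairs a feature label with its affix.
        match = re.fullmatch(r"([^()]+)\((.+)\)", feat)
        if match:
            parsed.append((match.group(1), match.group(2)))
        else:
            parsed.append((feat, None))
    return {"pos": pos, "features": parsed}

print(split_tag("N+Sg+Nom"))
# {'pos': 'N', 'features': [('Sg', None), ('Nom', None)]}
print(split_tag("V+PRES(Й)+3SG"))
# {'pos': 'V', 'features': [('PRES', 'Й'), ('3SG', None)]}
```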

Performance on Test Set

| Metric | Value | 95% CI |
|---|---|---|
| Token Accuracy | 0.8769 | [0.8742, 0.8795] |
| Micro F1 | 0.8770 | [0.8744, 0.8797] |
| Macro F1 | 0.4098 | [0.3945, 0.4254] |
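
Macro F1 averages the per-tag F1 scores with equal weight, so a tag that occurs once counts as much as a tag that occurs thousands of times. That is one common reason for a macro score far below the micro score in a large tag set. The sketch below illustrates the two averages on toy data. It is not the evaluation script used for this model, and `f1_scores` is a hypothetical helper.

```python
from collections import Counter

def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for two parallel lists of tags."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong
            fn[g] += 1  # gold tag was missed
    labels = set(gold) | set(pred)
    per_label = {}
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        per_label[lab] = 2 * tp[lab] / denom if denom else 0.0
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    macro = sum(per_label.values()) / len(per_label)
    return micro, macro

# Toy data: one frequent tag predicted well, one rare tag always missed.
gold = ["N+Sg+Nom"] * 8 + ["Adv", "V+PRES(Й)+3SG"]
pred = ["N+Sg+Nom"] * 8 + ["Adv", "N+Sg+Nom"]
micro, macro = f1_scores(gold, pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
# micro=0.900 macro=0.647
```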

Accuracy by Part of Speech (Top 10)

| POS | Accuracy |
|---|---|
| PUNCT | 1.0000 |
| NOUN | 0.8297 |
| VERB | 0.7904 |
| ADJ | 0.7770 |
| PRON | 0.7488 |
| PART | 0.9438 |
| PROPN | 0.7899 |
| ADP | 0.8400 |
| CCONJ | 0.9628 |
| ADV | 0.8307 |
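
A per-POS breakdown like the one above can be computed by grouping tokens by a coarse POS label and checking whether the full morphological tag matches. The sketch below assumes a coarse POS label is available for every token alongside its gold tag. How the POS categories were derived for this model is not stated, and `accuracy_by_pos` and the toy data are hypothetical.

```python
from collections import defaultdict

def accuracy_by_pos(gold_tags, pred_tags, pos_labels):
    """Full-tag accuracy grouped by the coarse POS of each token."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, pred, pos in zip(gold_tags, pred_tags, pos_labels):
        total[pos] += 1
        correct[pos] += int(gold == pred)
    return {pos: correct[pos] / total[pos] for pos in total}

# Toy data: the second noun gets the wrong full tag.
gold = ["N+Sg+Nom", "N+Sg+POSS_3(СЫ)+Nom", "Adv", "Adj", "PUNCT"]
pred = ["N+Sg+Nom", "N+Sg+Nom", "Adv", "Adj", "PUNCT"]
pos = ["NOUN", "NOUN", "ADV", "ADJ", "PUNCT"]
by_pos = accuracy_by_pos(gold, pred, pos)
print(by_pos)
# {'NOUN': 0.5, 'ADV': 1.0, 'ADJ': 1.0, 'PUNCT': 1.0}
```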

Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "TatarNLPWorld/turkish-bert-tatar-morph"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Input is pre-split into words; the tokenizer breaks them into subwords
tokens = ["Татар", "теле", "бик", "бай", "."]
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt", truncation=True)

with torch.no_grad():  # no gradients are needed for prediction
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Get tag mapping from model config
id2tag = model.config.id2label

# Report one tag per word, taken from the word's first subword
word_ids = inputs.word_ids()
prev_word = None
for idx, word_idx in enumerate(word_ids):
    if word_idx is not None and word_idx != prev_word:
        tag_id = predictions[0][idx].item()
        # Keys are normally ints; fall back to string keys just in case
        tag = id2tag.get(tag_id, id2tag.get(str(tag_id), "UNK"))
        print(tokens[word_idx], "->", tag)
    prev_word = word_idx
```

Expected output (approximately):

Татар -> N+Sg+Nom
теле -> N+Sg+POSS_3(СЫ)+Nom
бик -> Adv
бай -> Adj
. -> PUNCT
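
The alignment step in the usage code (keep the prediction of each word's first subword, skip special tokens) can be isolated into a small function that is easy to test without loading the model. This is a minimal sketch: `first_subword_tags` is a hypothetical helper, and the subword split and label ids in the example are invented for illustration.

```python
def first_subword_tags(words, word_ids, pred_ids, id2label):
    """Pair each word with the tag predicted for its first subword."""
    tagged = []
    prev_word = None
    for position, word_idx in enumerate(word_ids):
        # None marks special tokens; a repeated index marks a continuation subword.
        if word_idx is not None and word_idx != prev_word:
            tag = id2label.get(pred_ids[position], "UNK")
            tagged.append((words[word_idx], tag))
        prev_word = word_idx
    return tagged

# Invented example: "Татар" is split into two subwords, "теле" into one.
words = ["Татар", "теле"]
word_ids = [None, 0, 0, 1, None]
pred_ids = [0, 1, 2, 2, 0]
id2label = {0: "PUNCT", 1: "N+Sg+Nom", 2: "N+Sg+POSS_3(СЫ)+Nom"}
result = first_subword_tags(words, word_ids, pred_ids, id2label)
print(result)
# [('Татар', 'N+Sg+Nom'), ('теле', 'N+Sg+POSS_3(СЫ)+Nom')]
```

With the real model, `word_ids` would come from `inputs.word_ids()` and `pred_ids` from `predictions[0].tolist()`.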

Citation

If you use this model, please cite it as:

@misc{arabov-turkish-bert-tatar-morph-2026,
  title = {Turkish BERT fine-tuned for Tatar Morphological Analysis},
  author = {Arabov Mullosharaf Kurbonovich},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TatarNLPWorld/turkish-bert-tatar-morph}
}

License

Apache 2.0
