# Towards Robust Uzbek Neural Dependency Parsing — Model Weights

This repository hosts trained model checkpoints from the paper
"Towards Robust Uzbek Neural Dependency Parsing" (Matlatipov, 2026).

The models are Stanza-style neural pipelines for Uzbek morphosyntactic tagging (UPOS/XPOS/UFeats) and UD dependency parsing (evaluated with UAS/LAS). The experiments compare a static FastText baseline against TahrirchiBERT contextual embeddings across two Uzbek UD treebanks.

Source code & training scripts: https://github.com/Sanatbek/robust-parsing-uzbek


## Model Files

### Tokenizer

| File | Description |
|---|---|
| `saved_models/tokenize/uz_uzudt_tokenizer.pt` | Uzbek tokenizer trained on UzUDT |

### spaCy Pipelines (`saved_models/spacy/`)

These are full spaCy pipeline models (directory format) trained with TahrirchiBERT (`tahrirchi/tahrirchi-bert-base`). Each pipeline jointly performs UPOS tagging, morphological analysis, and dependency parsing.

| Directory | Experiment | Data | Embeddings |
|---|---|---|---|
| `saved_models/spacy/transformer_uzudt/model-best/` | S1.1 | UzUDT | TahrirchiBERT |
| `saved_models/spacy/transformer_combined/model-best/` | S1.2 | UzUDT+UT | TahrirchiBERT |

`model-best` is the checkpoint with the highest combined dev score during training.

### POS Taggers (`saved_models/pos/`)

| File | Experiment | Data | Embeddings | Fusion |
|---|---|---|---|---|
| `uz_uzudt_E1_tagger.pt` | E1 (baseline) | UzUDT | FastText | — |
| `uz_uzudt_E2.1_tagger.pt` | E2.1 | UzUDT | TahrirchiBERT | last-subword |
| `uz_uzudt_E3.1_tagger.pt` | E3.1 | UzUDT | TahrirchiBERT | mean pooling |
| `uz_uzudt_E5.1_tagger.pt` | E5.1 | UzUDT | TahrirchiBERT + charlm | last-subword |
| `uz_uzudt_E5.1.1_tagger.pt` | E5.1.1 (ablation) | UzUDT | TahrirchiBERT + charlm | last-subword |
| `uz_uzudt-base_tagger.pt` | Base | UzUDT | TahrirchiBERT | last-subword |
| `uz_combined_E1.2_tagger.pt` | E1.2 | UzUDT+UT | FastText | — |
| `uz_combined_E2.2_tagger.pt` | E2.2 | UzUDT+UT | TahrirchiBERT | last-subword |
| `uz_combined_E3.2_tagger.pt` | E3.2 | UzUDT+UT | TahrirchiBERT | mean pooling |

### Dependency Parsers (`saved_models/depparse/`)

| File | Experiment | Data | Embeddings | Fusion |
|---|---|---|---|---|
| `uz_uzudt_E1.1_parser.pt` | E1.1 | UzUDT | FastText | — |
| `uz_uzudt_E2.1_parser.pt` | E2.1 | UzUDT | TahrirchiBERT | last-subword |
| `uz_uzudt_E3.1_parser.pt` | E3.1 | UzUDT | TahrirchiBERT | mean pooling |
| `uz_uzudt_E5.1_parser.pt` | E5.1 | UzUDT | TahrirchiBERT + charlm | last-subword |
| `uz_uzudt_nocharlm_parser.pt` | Ablation | UzUDT | TahrirchiBERT (no charlm) | last-subword |
| `uz_combined_E1.2_parser.pt` | E1.2 | UzUDT+UT | FastText | — |
| `uz_combined_E2.2_parser.pt` | E2.2 | UzUDT+UT | TahrirchiBERT | last-subword |
| `uz_combined_E3.2_parser.pt` | E3.2 | UzUDT+UT | TahrirchiBERT | mean pooling |

## Evaluation Results (Test Set)

| Exp | Data | Embeddings | Fusion | UPOS | XPOS | UFeats | UAS | LAS |
|---|---|---|---|---|---|---|---|---|
| E1.1 | UzUDT | FastText | — | 79.19 | 79.81 | 66.61 | 69.57 | 51.24 |
| E1.2 | UzUDT+UT | FastText | — | 80.26 | 83.20 | 66.98 | 72.27 | 62.40 |
| E2.1 | UzUDT | TahrirchiBERT | last-sub | 82.45 | 80.90 | 65.37 | 72.05 | 54.19 |
| E2.2 | UzUDT+UT | TahrirchiBERT | last-sub | 85.08 | 84.72 | 71.09 | 72.39 | 63.81 |
| E3.1 | UzUDT | TahrirchiBERT | mean | 82.76 | 81.37 | 65.22 | 69.10 | 51.55 |
| E3.2 | UzUDT+UT | TahrirchiBERT | mean | 84.02 | 87.07 | 70.39 | 70.74 | 60.05 |

**Best overall system:** E2.2 (TahrirchiBERT + last-subword fusion + merged data).
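The two fusion strategies in the tables above differ only in how a word's BERT subword vectors are reduced to a single word-level vector. A minimal sketch of both, using made-up 2-dimensional vectors in place of real TahrirchiBERT embeddings:

```python
# Sketch of the two subword-fusion strategies compared above.
# A word may split into several BERT subwords; fusion reduces the
# per-subword vectors to one word vector before tagging/parsing.

def last_subword(subword_vecs):
    """Last-subword fusion: keep only the final subword's vector."""
    return subword_vecs[-1]

def mean_pooling(subword_vecs):
    """Mean pooling: average all subword vectors elementwise."""
    dim = len(subword_vecs[0])
    n = len(subword_vecs)
    return [sum(v[i] for v in subword_vecs) / n for i in range(dim)]

# A word tokenized into three hypothetical subwords (illustrative values):
subwords = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]

print(last_subword(subwords))   # [2.0, 3.0]
print(mean_pooling(subwords))   # [1.0, 1.3333333333333333]
```

Last-subword keeps the representation of the word-final subword (often informative for a suffixing language like Uzbek), while mean pooling spreads the signal evenly across all subwords.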

## spaCy Pipeline Results (Test Set)

These models use spaCy's transformer pipeline with TahrirchiBERT and jointly predict UPOS, morphological features, and dependency structure.

| Exp | Data | UPOS | XPOS | Morph Acc | UAS | LAS |
|---|---|---|---|---|---|---|
| S1.1 | UzUDT | 86.50 | 86.72 | 50.55 | 67.72 | 45.35 |
| S1.2 | UzUDT+UT | 89.18 | 88.24 | 65.48 | 66.81 | 47.11 |

Results are from `spacy evaluate` on the respective test sets. Morph Acc is the accuracy of the full morphological feature bundle: a token counts as correct only if all of its features match the gold annotation.
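This exact-match criterion makes Morph Acc stricter than per-feature accuracy. A minimal sketch of the metric (the feature bundles below are illustrative, not treebank output):

```python
def morph_bundle_accuracy(gold, pred):
    """Exact-match accuracy over full morphological feature bundles.

    A token is correct only if its entire bundle matches; feature
    order within a bundle is normalized before comparing.
    """
    def normalize(bundle):
        # "Case=Nom|Number=Sing" and "Number=Sing|Case=Nom" are the same bundle
        return frozenset(bundle.split("|")) if bundle else frozenset()

    correct = sum(normalize(g) == normalize(p) for g, p in zip(gold, pred))
    return correct / len(gold)

# Illustrative gold/predicted bundles for three tokens:
gold = ["Case=Nom|Number=Sing", "Number=Plur", ""]
pred = ["Number=Sing|Case=Nom", "Number=Sing", ""]
print(morph_bundle_accuracy(gold, pred))  # 2 of 3 bundles match -> 0.6666666666666666
```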


## How to Use

### 1. Clone the code repository

```bash
git clone https://github.com/Sanatbek/robust-parsing-uzbek.git
cd robust-parsing-uzbek
```

### 2. Set up the environment

```bash
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate

pip install -U pip
pip install -r requirements.txt
pip install -e stanza/
```

### 3. Download models from this Hugging Face repository

Install the Hugging Face Hub client if it is not already present:

```bash
pip install huggingface_hub
```

Download all models at once:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Sanatbek/uzudt",
    repo_type="model",
    local_dir=".",
    ignore_patterns=["*.md", ".gitattributes"],
)
```

Or download a specific model:

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Sanatbek/uzudt",
    filename="saved_models/pos/uz_combined_E2.2_tagger.pt",
    local_dir=".",
)
```

### 4. Run POS-only inference

```bash
python scripts/parse_test_pos_only.py \
  --tagger_model saved_models/pos/uz_combined_E2.2_tagger.pt \
  --input_file data/pos/uz_uzudt.test.in.conllu \
  --output_file output_pos.conllu
```

### 5. Run the full pipeline (POS + dependency parsing)

FastText baseline (E1.2):

```bash
python scripts/parse_test_with_depparse.py \
  --tagger_model saved_models/pos/uz_combined_E1.2_tagger.pt \
  --parser_model saved_models/depparse/uz_combined_E1.2_parser.pt \
  --wordvec_pretrain_file wordvec/uz/pretrain/fasttext_cc_uz_300.pt \
  --input_file data/depparse/uz_uzudt.test.in.conllu \
  --output_file output_e1.conllu
```

Best BERT model (E2.2, recommended):

```bash
python scripts/parse_test_with_depparse.py \
  --tagger_model saved_models/pos/uz_combined_E2.2_tagger.pt \
  --parser_model saved_models/depparse/uz_combined_E2.2_parser.pt \
  --bert_model tahrirchi/tahrirchi-bert-base \
  --input_file data/depparse/uz_uzudt.test.in.conllu \
  --output_file output_e2.conllu
```

### 6. Evaluate

UD metrics (UAS, LAS, CLAS, MLAS, BLEX):

```bash
python scripts/eval.py \
  data/depparse/uz_uzudt.test.in.conllu \
  output_e2.conllu
```

POS accuracy:

```bash
python scripts/eval_pos.py \
  --gold data/pos/uz_uzudt.test.in.conllu \
  --system output_pos.conllu
```
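For reference, UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. A self-contained sketch over (head, deprel) pairs, with a made-up four-token sentence:

```python
def uas_las(gold, pred):
    """Compute unlabeled/labeled attachment scores.

    gold, pred: lists of (head_index, deprel) tuples, one per token.
    UAS: fraction of tokens with the correct head.
    LAS: fraction with the correct head AND the correct relation label.
    """
    assert len(gold) == len(pred)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return uas_hits / n, las_hits / n

# Toy sentence: head index 0 marks the root (illustrative values).
gold = [(3, "nsubj"), (3, "obj"), (0, "root"), (3, "punct")]
pred = [(3, "nsubj"), (3, "nmod"), (0, "root"), (3, "punct")]
uas, las = uas_las(gold, pred)
print(f"UAS={uas:.2f}  LAS={las:.2f}")  # UAS=1.00  LAS=0.75
```

Here every head is attached correctly (UAS = 1.00), but one label is wrong (`nmod` instead of `obj`), so LAS drops to 0.75.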

## spaCy Pipeline Usage

The spaCy models are ready-to-use directory-based pipelines: no custom code is needed beyond installing spaCy and the Uzbek language module.

### Install dependencies

```bash
pip install spacy spacy-transformers
pip install -e spacy_uzbek/   # custom Uzbek language class
```

For GPU (recommended for the transformer pipelines):

```bash
pip install cupy-cuda12x==13.6.0
```

### Download spaCy models from Hugging Face

```python
from huggingface_hub import snapshot_download

# Download both spaCy models (preserves the directory structure)
snapshot_download(
    repo_id="Sanatbek/uzudt",
    repo_type="model",
    local_dir=".",
    allow_patterns=["saved_models/spacy/**"],
)
```

Or download a single file of a model:

```python
from huggingface_hub import hf_hub_download

# S1.2: best spaCy model (UzUDT+UT)
hf_hub_download(
    repo_id="Sanatbek/uzudt",
    filename="saved_models/spacy/transformer_combined/model-best/meta.json",
    local_dir=".",
)
# Repeat for all files in the directory, or use snapshot_download with allow_patterns
```

### Run inference

```python
import spacy

# Load the best spaCy model (S1.2, trained on UzUDT+UT merged data)
nlp = spacy.load("saved_models/spacy/transformer_combined/model-best")

# Process Uzbek text
doc = nlp("Men kitob o'qiyapman.")

for token in doc:
    print(f"{token.text:20s}  POS={token.pos_:8s}  MORPH={str(token.morph):40s}  DEP={token.dep_:12s}  HEAD={token.head.text}")
```

Example output:

```text
Men                   POS=PRON      MORPH=Case=Nom|Number=Sing|Person=1|PronType=Prs  DEP=nsubj       HEAD=o'qiyapman
kitob                 POS=NOUN      MORPH=POS=NOUN                                    DEP=obj         HEAD=o'qiyapman
o'qiyapman            POS=VERB      MORPH=Aspect=Prog|Mood=Ind|Number=Sing|Person=1   DEP=root        HEAD=o'qiyapman
.                     POS=PUNCT     MORPH=POS=PUNCT                                   DEP=punct       HEAD=o'qiyapman
```
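If you need CoNLL-U output rather than printed attributes, the per-token fields map directly onto CoNLL-U columns. A minimal sketch with the tokens hard-coded to mirror the example above (in practice a loaded pipeline would supply them via `token.pos_`, `token.morph`, `token.head.i`, `token.dep_`):

```python
# (form, upos, feats, head_id, deprel) tuples mirroring the example output;
# head_id 0 marks the root, per the CoNLL-U convention.
tokens = [
    ("Men", "PRON", "Case=Nom|Number=Sing|Person=1|PronType=Prs", 3, "nsubj"),
    ("kitob", "NOUN", "_", 3, "obj"),
    ("o'qiyapman", "VERB", "Aspect=Prog|Mood=Ind|Number=Sing|Person=1", 0, "root"),
    (".", "PUNCT", "_", 3, "punct"),
]

def to_conllu(tokens):
    """Render tokens as CoNLL-U rows: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC."""
    lines = []
    for i, (form, upos, feats, head, deprel) in enumerate(tokens, start=1):
        lines.append("\t".join(
            [str(i), form, "_", upos, "_", feats, str(head), deprel, "_", "_"]
        ))
    return "\n".join(lines) + "\n\n"  # sentences end with a blank line

print(to_conllu(tokens))
```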

### Visualize the dependency tree

```python
from spacy import displacy

doc = nlp("Men kitob o'qiyapman.")
displacy.serve(doc, style="dep")   # serves the visualization at http://localhost:5000
```

### Evaluate on the test set

This requires `spacy_uzbek/data/uz_uzudt.test.spacy`; convert from CoNLL-U first if needed:

```bash
python spacy_uzbek/convert_conllu.py \
  --input data/pos/uz_uzudt.test.in.conllu \
  --output spacy_uzbek/data/uz_uzudt.test.spacy
```

Evaluate (GPU recommended):

```bash
python -m spacy evaluate \
  saved_models/spacy/transformer_combined/model-best \
  spacy_uzbek/data/uz_combined.test.spacy \
  --output results/spacy_s1.2_test.json --gpu-id 0
```

## Dependencies

| Package | Version | Purpose |
|---|---|---|
| Python | >= 3.9 | Runtime |
| PyTorch | >= 2.0 | Model inference |
| transformers | >= 4.35 | TahrirchiBERT loading |
| stanza | local (editable) | Stanza NLP pipeline |
| spacy | >= 3.8 | spaCy NLP pipeline |
| spacy-transformers | >= 1.2 | spaCy BERT integration |
| huggingface_hub | >= 0.20 | Model download |
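Pinned as version floors in a requirements file, the table above might translate to the following (a hypothetical fragment; `torch` is the PyPI name for PyTorch, and stanza is omitted because it is installed editable via `pip install -e stanza/`):

```text
# hypothetical requirements.txt mirroring the version floors above
torch>=2.0
transformers>=4.35
spacy>=3.8
spacy-transformers>=1.2
huggingface_hub>=0.20
```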

## Citation

If you use these models, please cite:

```bibtex
@misc{matlatipov2026uzbek,
  title   = {Towards Robust Uzbek Neural Dependency Parsing},
  author  = {Matlatipov, Sanatbek},
  year    = {2026},
  url     = {https://huggingface.co/Sanatbek/uzudt}
}
```

## License

CC BY-SA 4.0; see LICENSE.
