Sanatbek/uzudt

@@ -1,57 +1,60 @@
----
-language:
-- uz
-tags:
-- dependency-parsing
-- pos-tagging
-- stanza
-- uzbek
-- universal-dependencies
-license: mit
-datasets:
-- UD_Uzbek-UzUDT
-metrics:
-- uas
-- las
-- upos
-base_model: elmurod1202/bertbek-news-big-cased
----
-# UzUDT: Robust Uzbek Neural Dependency Parsing
-This repository contains the trained **Stanza-style neural models** for Uzbek morphosyntactic tagging and dependency parsing, as described in the paper *Towards Robust Uzbek Neural Dependency Parsing*.
-## Model Description
-The system is designed to handle the agglutinative morphology and resource scarcity of Uzbek. It utilizes a **Stanza-like pipeline** augmented with:
-1.  **BERTbek Contextual Embeddings**: Utilizing the `elmurod1202/bertbek-news-big-cased` model with subword-to-word "super-token" fusion.
-2.  **Morphology-Aware Preprocessing**: An improved Apertium-based normalization layer to reduce sparsity.
-## Performance (UzUDT Test Set)
-Evaluated on the 3-star **UzUDT treebank** (681 sentences).
-| Metric | Score (%) |
-| :--- | :--- |
-| **UPOS** | 86.10 |
-| **XPOS** | 83.96 |
-| **UAS** | 74.21 |
-| **LAS** | 66.90 |
-| **UFeats** | 70.06 |
-## Usage
-Since the models are stored in custom directories (`pos/` and `depparse/`), you must specify the paths when loading the pipeline:
-```python
-import stanza
-# configuration to point to the specific model files
-config = {
-    'pos_model_path': './pos/uz_uzudt-base_tagger.pt',
-    'depparse_model_path': './depparse/uz_uzudt_nocharlm_parser.pt',
-    'use_gpu': True
-}
-# Initialize the pipeline
-nlp = stanza.Pipeline(lang='uz', processors='tokenize,pos,lemma,depparse', **config)
-doc = nlp("Oʻzbekistonning poytaxti Toshkent shahridir.")
 doc.sentences[0].print_dependencies()

+---
+language:
+- uz
+tags:
+- dependency-parsing
+- pos-tagging
+- tokenization
+- stanza
+- uzbek
+- universal-dependencies
+license: mit
+datasets:
+- UD_Uzbek-UzUDT
+metrics:
+- uas
+- las
+- upos
+base_model: elmurod1202/bertbek-news-big-cased
+---
+# UzUDT: Robust Uzbek Neural Dependency Parsing
+This repository contains the trained **Stanza-style neural models** for Uzbek tokenization, morphosyntactic tagging, and dependency parsing, as described in the paper *Towards Robust Uzbek Neural Dependency Parsing*.
+## Model Description
+The system is designed to handle the agglutinative morphology and resource scarcity of Uzbek. It uses a **Stanza-style pipeline** augmented with:
+1.  **BERTbek Contextual Embeddings**: The `elmurod1202/bertbek-news-big-cased` model, with subword-to-word "super-token" fusion.
+2.  **Morphology-Aware Preprocessing**: An improved Apertium-based normalization layer to reduce sparsity.
+## Performance (UzUDT Test Set)
+Evaluated on the 3-star **UzUDT treebank** (681 sentences).
+| Metric | Score (%) |
+| :--- | :--- |
+| **UPOS** | 86.10 |
+| **XPOS** | 83.96 |
+| **UAS** | 74.21 |
+| **LAS** | 66.90 |
+| **UFeats** | 70.06 |
+## Usage
+To use these models, download the `.pt` files to a local directory, then specify the path to each model component (tokenizer, POS tagger, dependency parser) in the pipeline configuration.
+```python
+import stanza
+# Configuration pointing to the local .pt files
+config = {
+    'tokenize_model_path': './uz_uzudt_tokenizer.pt',
+    'pos_model_path': './uz_uzudt-base_tagger.pt',
+    'depparse_model_path': './uz_uzudt_nocharlm_parser.pt',
+    'use_gpu': True
+}
+# Initialize the pipeline
+# Note: 'lemma' is excluded as it requires a separate model or external Apertium integration
+nlp = stanza.Pipeline(lang='uz', processors='tokenize,pos,depparse', **config)
+doc = nlp("Oʻzbekistonning poytaxti Toshkent shahridir.")
 doc.sentences[0].print_dependencies()
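
The updated usage snippet assumes that all three `.pt` files sit in the current working directory. A minimal sketch of a helper that checks for them before the pipeline is built could look like the following. `build_config` and its `model_dir` parameter are hypothetical names, not part of Stanza or this repository; the file names and configuration keys are copied from the snippet in the diff.

```python
from pathlib import Path

# Hypothetical helper (not part of Stanza or this repository).
# File names and config keys are taken from the README usage snippet.
MODEL_FILES = {
    "tokenize_model_path": "uz_uzudt_tokenizer.pt",
    "pos_model_path": "uz_uzudt-base_tagger.pt",
    "depparse_model_path": "uz_uzudt_nocharlm_parser.pt",
}


def build_config(model_dir=".", use_gpu=True):
    """Return keyword arguments for stanza.Pipeline, failing early if a file is missing."""
    model_dir = Path(model_dir)
    config = {}
    for key, file_name in MODEL_FILES.items():
        path = model_dir / file_name
        if not path.is_file():
            # Report the exact missing file instead of failing inside the pipeline.
            raise FileNotFoundError(f"Missing model file: {path}")
        config[key] = str(path)
    config["use_gpu"] = use_gpu
    return config
```

Under these assumptions, the pipeline call from the README would become `stanza.Pipeline(lang='uz', processors='tokenize,pos,depparse', **build_config('.'))`, so a missing download is reported with the exact file path before any model is loaded.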