File size: 1,847 Bytes
1ddcd4f
415a39b
 
 
 
 
 
 
 
 
 
 
 
1ddcd4f
 
415a39b
1ddcd4f
415a39b
1ddcd4f
415a39b
1ddcd4f
415a39b
 
1ddcd4f
415a39b
1ddcd4f
415a39b
 
 
1ddcd4f
415a39b
 
 
1ddcd4f
415a39b
 
 
1ddcd4f
415a39b
 
 
 
1ddcd4f
415a39b
 
 
1ddcd4f
415a39b
 
 
 
 
 
 
 
 
1ddcd4f
415a39b
 
 
 
1ddcd4f
415a39b
1ddcd4f
415a39b
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
---
language: uz
license: apache-2.0
tags:
- uzbek
- dependency-parsing
- universal-dependencies
- nlp
datasets:
- universal_dependencies
metrics:
- accuracy
- f1
---

# Uzbek Dependency Parser

This model predicts Universal Dependencies dependency relations for Uzbek text.

## Model details

The model was fine-tuned on a Universal Dependencies treebank containing approximately 600 annotated sentences.
It is based on the [XLM-RoBERTa base model](https://huggingface.co/xlm-roberta-base) and adapted for token classification.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Arofat/uzbek-dependency-parser")
model = AutoModelForTokenClassification.from_pretrained("Arofat/uzbek-dependency-parser")

# Prepare text
text = "Men O'zbekistonda yashayman."
tokens = text.split()

# Get predictions
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs
predictions = torch.argmax(outputs.logits, dim=2)
id2label = model.config.id2label

# Get dependency relations
dep_tags = []
word_ids = inputs.word_ids(batch_index=0)
prev_word_id = None
for idx, word_id in enumerate(word_ids):
    if word_id is None or word_id == prev_word_id:
        continue
    dep_tags.append(id2label[predictions[0, idx].item()])
    prev_word_id = word_id

# Print results
for token, tag in zip(tokens, dep_tags):
    print(f"{token}: {tag}")
```

## Limitations

This model was trained on a relatively small dataset and may not generalize well to all domains of Uzbek text.
Note that this model only predicts dependency relations (labels) and not the dependency tree structure (heads).
For a complete dependency parse, additional processing is needed.