---
license: apache-2.0
base_model: chuuhtetnaing/myanmar-text-segmentation-model
tags:
  - token-classification
  - myanmar
  - pos-tagging
language:
  - my
datasets:
  - chuuhtetnaing/myanmar-pos-dataset
metrics:
  - f1
---

# Myanmar POS Tagging Model

This model is [myanmar-text-segmentation-model](https://huggingface.co/chuuhtetnaing/myanmar-text-segmentation-model) fine-tuned for Myanmar part-of-speech (POS) tagging.

## Training Results

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|------|----------|
| 1 | 0.7611 | 0.5417 | 0.8000 | 0.8422 | 0.8205 | 0.8557 |
| 2 | 0.3736 | 0.3170 | 0.8879 | 0.9040 | 0.8959 | 0.9123 |
| 3 | 0.3015 | 0.2764 | 0.9000 | 0.9143 | 0.9071 | 0.9219 |
| 4 | 0.2589 | 0.2562 | 0.9067 | 0.9189 | 0.9127 | 0.9265 |
| 5 | 0.2504 | 0.2473 | 0.9104 | 0.9219 | 0.9161 | 0.9285 |
| 6 | 0.2209 | 0.2403 | 0.9141 | 0.9237 | 0.9189 | 0.9311 |
| 7 | 0.2253 | 0.2341 | 0.9168 | 0.9256 | 0.9212 | 0.9328 |
| 8 | 0.2361 | 0.2319 | 0.9183 | 0.9264 | 0.9223 | 0.9337 |
| 9 | 0.2140 | 0.2305 | 0.9180 | 0.9268 | 0.9224 | 0.9336 |
| 10 | 0.2199 | 0.2311 | 0.9184 | 0.9265 | 0.9224 | 0.9337 |
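
The F1 column agrees with the harmonic mean of the Precision and Recall columns. As a quick sanity check, the final-epoch row can be recomputed. This is a minimal sketch, assuming the reported scores are aggregated so that this identity applies; `f1_score` is a hypothetical helper, and small differences in other rows come from rounding to four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Epoch 10 precision and recall copied from the table above.
epoch_10_f1 = f1_score(0.9184, 0.9265)
print(round(epoch_10_f1, 4))  # prints 0.9224
```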

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | chuuhtetnaing/myanmar-text-segmentation-model |
| Total Epochs | 10 |
| Total Steps | 650 |
| Best F1 | 0.9224 |

## Usage

### Using Pipeline

```python
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
# and merges sub-word tokens that share a tag into one segment.
pos = pipeline(
    "token-classification",
    model="chuuhtetnaing/myanmar-pos-model",
    aggregation_strategy="simple",
)
segments = pos("မြန်မာနိုင်ငံသည်အရှေ့တောင်အာရှတွင်တည်ရှိသည်။")

# Render each segment as "word/tag" and join the pairs with spaces.
result = " ".join(
    f"{segment['word']}/{segment['entity_group']}" for segment in segments
)

print(result)

# မြန်မာနိုင်ငံ/n သည်/ppm အရှေ့တောင်/n အာရှ/n တွင်/ppm တည်/v ရှိသည်။/part
```
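
The formatting step above depends only on the `word` and `entity_group` keys of each pipeline result, so it can be factored into a small helper and checked without loading the model. This is a minimal sketch: `format_tagged` is a hypothetical name, and the sample list is hand-written to mirror the first three segments of the output shown above, not real model output.

```python
from typing import Iterable, Mapping


def format_tagged(segments: Iterable[Mapping[str, object]], sep: str = "/") -> str:
    """Render aggregated token-classification results as 'word<sep>tag' pairs."""
    return " ".join(f"{seg['word']}{sep}{seg['entity_group']}" for seg in segments)


# Hand-written sample mirroring the documented output (not real model output).
sample_segments = [
    {"word": "မြန်မာနိုင်ငံ", "entity_group": "n"},
    {"word": "သည်", "entity_group": "ppm"},
    {"word": "အရှေ့တောင်", "entity_group": "n"},
]

formatted = format_tagged(sample_segments)
print(formatted)
```

Keeping the separator as a parameter makes it easy to switch to another delimiter if a word itself may contain `/`.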

## Dataset

Trained on the [chuuhtetnaing/myanmar-pos-dataset](https://huggingface.co/datasets/chuuhtetnaing/myanmar-pos-dataset) dataset.

## Labels

| POS Tag | Description | Examples |
|---------|-------------|----------|
| abb | Abbreviation | အထက (Basic Education High School), လ.ဝ (Confidentiality) |
| adj | Adjective | ရဲရင့် (brave), လှပ (beautiful), မွန်မြတ် (noble) |
| adv | Adverb | ဖြေးဖြေး (slowly), နည်းနည်း (a little) |
| conj | Conjunction | နှင့် (and), ထို့ကြောင့် (therefore), သို့မဟုတ် (or) |
| fw | Foreign Word | 1, 2, 3, Myanmar, BBC, Google, ミャンマー, 缅甸 |
| int | Interjection | အမလေး (Oh My God!) |
| n | Noun | ကျောင်း (school), စာအုပ် (book), ဒေါ်အောင်ဆန်းစုကြည်, လွတ်လပ်ရေး (freedom) |
| num | Number | ၁ (1), ၂ (2), ၃ (3), ၁၀ (10), ၁၀၀ (100) |
| part | Particle | များ, ခဲ့, သင့်, လိမ့်, နိုင် |
| ppm | Post-positional Marker | သည်, က, ကို, အား, သို့, မှာ, တွင် |
| pron | Pronoun | ကျွန်တော် (I), ကျွန်မ (I), သင် (you), သူ (he), သူမ (she) |
| punc | Punctuation | ။, ၊, (, ), \, _, ', " |
| sb | Symbol | ?, #, &, %, $, £, ¥, 𝜆, π, ÷, +, ×, @ |
| tn | Text Number | တစ် (one), နှစ် (two), သုံး (three), တစ်ရာ, တစ်ထောင် |
| v | Verb | ကူညီ (help), လိုက်နာ (observe), အားပေး (encourage) |
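
For downstream use it can help to expand the short tags into the descriptions from the table above. This is a minimal sketch: `POS_TAG_DESCRIPTIONS` and `describe_tags` are hypothetical names, and the code assumes the space-separated `word/tag` format from the Usage section, taking the tag to be whatever follows the last `/` in each token.

```python
from typing import List, Tuple

# Tag-to-description mapping copied from the Labels table.
POS_TAG_DESCRIPTIONS = {
    "abb": "Abbreviation",
    "adj": "Adjective",
    "adv": "Adverb",
    "conj": "Conjunction",
    "fw": "Foreign Word",
    "int": "Interjection",
    "n": "Noun",
    "num": "Number",
    "part": "Particle",
    "ppm": "Post-positional Marker",
    "pron": "Pronoun",
    "punc": "Punctuation",
    "sb": "Symbol",
    "tn": "Text Number",
    "v": "Verb",
}


def describe_tags(tagged: str) -> List[Tuple[str, str]]:
    """Split 'word/tag' tokens and replace each tag with its description."""
    pairs = []
    for token in tagged.split():
        # rpartition splits on the last "/", so a "/" inside the word survives.
        word, _, tag = token.rpartition("/")
        pairs.append((word, POS_TAG_DESCRIPTIONS.get(tag, "Unknown")))
    return pairs


described = describe_tags("Google/fw ၁၀/num")
print(described)
```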