---
license: apache-2.0
base_model: chuuhtetnaing/myanmar-text-segmentation-model
tags:
- token-classification
- myanmar
- pos-tagging
language:
- my
datasets:
- chuuhtetnaing/myanmar-pos-dataset
metrics:
- f1
---

# Myanmar POS Tagging Model

A token-classification model for Myanmar (Burmese) part-of-speech (POS) tagging, fine-tuned from [chuuhtetnaing/myanmar-text-segmentation-model](https://huggingface.co/chuuhtetnaing/myanmar-text-segmentation-model).

## Training Results

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|------|----------|
| 1 | 0.7611 | 0.5417 | 0.8000 | 0.8422 | 0.8205 | 0.8557 |
| 2 | 0.3736 | 0.3170 | 0.8879 | 0.9040 | 0.8959 | 0.9123 |
| 3 | 0.3015 | 0.2764 | 0.9000 | 0.9143 | 0.9071 | 0.9219 |
| 4 | 0.2589 | 0.2562 | 0.9067 | 0.9189 | 0.9127 | 0.9265 |
| 5 | 0.2504 | 0.2473 | 0.9104 | 0.9219 | 0.9161 | 0.9285 |
| 6 | 0.2209 | 0.2403 | 0.9141 | 0.9237 | 0.9189 | 0.9311 |
| 7 | 0.2253 | 0.2341 | 0.9168 | 0.9256 | 0.9212 | 0.9328 |
| 8 | 0.2361 | 0.2319 | 0.9183 | 0.9264 | 0.9223 | 0.9337 |
| 9 | 0.2140 | 0.2305 | 0.9180 | 0.9268 | 0.9224 | 0.9336 |
| 10 | 0.2199 | 0.2311 | 0.9184 | 0.9265 | 0.9224 | 0.9337 |
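
The F1 column can be sanity-checked against the Precision and Recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the card does not state how the metric was computed, and `f1_score` is a hypothetical helper written for this illustration. Because the table values are rounded to four decimals, a recomputed score may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1) copied from the table above.
rows = [
    (1, 0.8000, 0.8422, 0.8205),
    (5, 0.9104, 0.9219, 0.9161),
    (10, 0.9184, 0.9265, 0.9224),
]

for epoch, precision, recall, reported in rows:
    recomputed = f1_score(precision, recall)
    print(f"epoch {epoch}: recomputed={recomputed:.4f} reported={reported:.4f}")
```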

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | chuuhtetnaing/myanmar-text-segmentation-model |
| Total Epochs | 10 |
| Total Steps | 650 |
| Best F1 | 0.9224 |

## Usage

### Using Pipeline

```python
from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into word-level segments;
# it replaces the deprecated grouped_entities=True argument.
pos = pipeline(
    "token-classification",
    model="chuuhtetnaing/myanmar-pos-model",
    aggregation_strategy="simple",
)
segments = pos("မြန်မာနိုင်ငံသည်အရှေ့တောင်အာရှတွင်တည်ရှိသည်။")

# Format each segment as word/tag.
result = []
for segment in segments:
    pos_tag = segment["entity_group"]
    word = segment["word"]
    result.append(word + "/" + pos_tag)

result = " ".join(result)
print(result)
# မြန်မာနိုင်ငံ/n သည်/ppm အရှေ့တောင်/n အာရှ/n တွင်/ppm တည်/v ရှိသည်။/part
```
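
Each segment returned by the pipeline is a dict with (among others) the `entity_group` and `word` keys used above, so the output is easy to post-process. As a minimal sketch, the snippet below groups words by tag. The `segments` list is written by hand to mirror the example output rather than produced by the model, and `group_by_tag` is a hypothetical helper, not part of `transformers`.

```python
from collections import defaultdict

# Hand-written stand-in for the pipeline output (assumption: it mirrors the
# example sentence above; a real call would also include scores and offsets).
segments = [
    {"entity_group": "n", "word": "မြန်မာနိုင်ငံ"},
    {"entity_group": "ppm", "word": "သည်"},
    {"entity_group": "n", "word": "အရှေ့တောင်"},
    {"entity_group": "n", "word": "အာရှ"},
    {"entity_group": "ppm", "word": "တွင်"},
    {"entity_group": "v", "word": "တည်"},
    {"entity_group": "part", "word": "ရှိသည်။"},
]


def group_by_tag(segments):
    """Collect words under their POS tag, preserving sentence order."""
    grouped = defaultdict(list)
    for segment in segments:
        grouped[segment["entity_group"]].append(segment["word"])
    return dict(grouped)


words_by_tag = group_by_tag(segments)
print(words_by_tag["n"])
```

The same function can be applied directly to the list returned by `pos(...)`, since it only reads the two keys shown here.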

## Dataset

The model was trained on the [chuuhtetnaing/myanmar-pos-dataset](https://huggingface.co/datasets/chuuhtetnaing/myanmar-pos-dataset) dataset.

## Labels

| POS Tag | Description | Examples |
|---------|-------------|----------|
| abb | Abbreviation | အထက (Basic Education High School), လ.ဝ (Confidentiality) |
| adj | Adjective | ရဲရင့် (brave), လှပ (beautiful), မွန်မြတ် (noble) |
| adv | Adverb | ဖြေးဖြေး (slowly), နည်းနည်း (a little) |
| conj | Conjunction | နှင့် (and), ထို့ကြောင့် (therefore), သို့မဟုတ် (or) |
| fw | Foreign Word | 1, 2, 3, Myanmar, BBC, Google, ミャンマー, 缅甸 |
| int | Interjection | အမလေး (Oh My God!) |
| n | Noun | ကျောင်း (school), စာအုပ် (book), ဒေါ်အောင်ဆန်းစုကြည်, လွတ်လပ်ရေး (freedom) |
| num | Number | ၁ (1), ၂ (2), ၃ (3), ၁၀ (10), ၁၀၀ (100) |
| part | Particle | များ, ခဲ့, သင့်, လိမ့်, နိုင် |
| ppm | Post-positional Marker | သည်, က, ကို, အား, သို့, မှာ, တွင် |
| pron | Pronoun | ကျွန်တော် (I), ကျွန်မ (I), သင် (you), သူ (he), သူမ (she) |
| punc | Punctuation | ။, ၊, (, ), \, _, ', " |
| sb | Symbol | ?, #, &, %, $, £, ¥, 𝜆, π, ÷, +, ×, @ |
| tn | Text Number | တစ် (one), နှစ် (two), သုံး (three), တစ်ရာ, တစ်ထောင် |
| v | Verb | ကူညီ (help), လိုက်နာ (observe), အားပေး (encourage) |
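
For downstream use it can help to turn the short tags into readable names. The sketch below copies the tag descriptions from the table above into a dict and expands a `word/tag` string such as the one printed in the usage example. `POS_LABELS` and `expand_tags` are hypothetical names introduced for this illustration; they are not shipped with the model.

```python
# Tag descriptions copied from the Labels table above.
POS_LABELS = {
    "abb": "Abbreviation",
    "adj": "Adjective",
    "adv": "Adverb",
    "conj": "Conjunction",
    "fw": "Foreign Word",
    "int": "Interjection",
    "n": "Noun",
    "num": "Number",
    "part": "Particle",
    "ppm": "Post-positional Marker",
    "pron": "Pronoun",
    "punc": "Punctuation",
    "sb": "Symbol",
    "tn": "Text Number",
    "v": "Verb",
}


def expand_tags(tagged: str) -> list[tuple[str, str]]:
    """Split 'word/tag word/tag ...' into (word, description) pairs."""
    pairs = []
    for token in tagged.split():
        # rpartition splits on the last "/", so a word containing "/" survives.
        word, _, tag = token.rpartition("/")
        pairs.append((word, POS_LABELS.get(tag, "Unknown")))
    return pairs


for word, description in expand_tags("မြန်မာနိုင်ငံ/n သည်/ppm တည်/v"):
    print(f"{word}\t{description}")
```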