---
license: apache-2.0
base_model: chuuhtetnaing/myanmar-pos-model
tags:
  - token-classification
  - myanmar
  - ner-tagging
language:
  - my
datasets:
  - chuuhtetnaing/myanmar-ner-dataset
metrics:
  - f1
---

# Myanmar NER Tagging Model

A token-classification model fine-tuned from [myanmar-pos-model](https://huggingface.co/chuuhtetnaing/myanmar-pos-model) for Myanmar named entity recognition (NER).

## Training Results

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|------|----------|
| 1 | 1.5385 | 0.3730 | 0.5397 | 0.5068 | 0.5227 | 0.9175 |
| 2 | 0.2673 | 0.1809 | 0.7271 | 0.7958 | 0.7599 | 0.9481 |
| 3 | 0.1623 | 0.1295 | 0.7815 | 0.8408 | 0.8101 | 0.9637 |
| 4 | 0.1291 | 0.1015 | 0.7836 | 0.8602 | 0.8201 | 0.9710 |
| 5 | 0.0992 | 0.0965 | 0.8200 | 0.8943 | 0.8555 | 0.9719 |
| 6 | 0.0801 | 0.0879 | 0.8299 | 0.9019 | 0.8644 | 0.9738 |
| 7 | 0.0706 | 0.0819 | 0.8580 | 0.9137 | 0.8849 | 0.9765 |
| 8 | 0.0636 | 0.0768 | 0.8660 | 0.9148 | 0.8897 | 0.9780 |
| 9 | 0.0577 | 0.0757 | 0.8784 | 0.9202 | 0.8988 | 0.9784 |
| 10 | 0.0527 | 0.0760 | 0.8737 | 0.9125 | 0.8927 | 0.9791 |
| 11 | 0.0506 | 0.0785 | 0.8710 | 0.9236 | 0.8965 | 0.9775 |
| 12 | 0.0470 | 0.0754 | 0.8830 | 0.9225 | 0.9023 | 0.9794 |
| 13 | 0.0459 | 0.0754 | 0.8896 | 0.9231 | 0.9061 | 0.9802 |
| 14 | 0.0441 | 0.0813 | 0.8742 | 0.9274 | 0.9000 | 0.9779 |
| 15 | 0.0398 | 0.0763 | 0.8952 | 0.9247 | 0.9097 | 0.9812 |
| 16 | 0.0387 | 0.0841 | 0.8713 | 0.9252 | 0.8974 | 0.9779 |
| 17 | 0.0344 | 0.0805 | 0.8924 | 0.9258 | 0.9088 | 0.9805 |
| 18 | 0.0356 | 0.0790 | 0.8854 | 0.9279 | 0.9061 | 0.9802 |
| 19 | 0.0333 | 0.0801 | 0.8864 | 0.9249 | 0.9052 | 0.9806 |
| 20 | 0.0326 | 0.0788 | 0.8939 | 0.9254 | 0.9094 | 0.9817 |
| 21 | 0.0314 | 0.0801 | 0.8863 | 0.9263 | 0.9059 | 0.9808 |
| 22 | 0.0309 | 0.0815 | 0.8866 | 0.9267 | 0.9062 | 0.9806 |
| 23 | 0.0310 | 0.0825 | 0.8854 | 0.9281 | 0.9062 | 0.9804 |
| 24 | 0.0280 | 0.0828 | 0.8874 | 0.9272 | 0.9068 | 0.9807 |
| 25 | 0.0271 | 0.0826 | 0.8884 | 0.9276 | 0.9076 | 0.9809 |
| 26 | 0.0290 | 0.0828 | 0.8887 | 0.9272 | 0.9075 | 0.9807 |
| 27 | 0.0318 | 0.0835 | 0.8855 | 0.9256 | 0.9051 | 0.9803 |
| 28 | 0.0287 | 0.0837 | 0.8871 | 0.9267 | 0.9065 | 0.9805 |
| 29 | 0.0274 | 0.0837 | 0.8855 | 0.9272 | 0.9058 | 0.9804 |
| 30 | 0.0271 | 0.0832 | 0.8875 | 0.9267 | 0.9067 | 0.9806 |
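
The F1 column is the harmonic mean of the Precision and Recall columns, so any row can be checked by hand. A minimal sketch using the epoch 15 row (the helper name `f1_score` is illustrative and not part of the training code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the epoch 15 row of the table above
epoch15_f1 = f1_score(0.8952, 0.9247)
print(round(epoch15_f1, 4))  # 0.9097
```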

## Test Set Evaluation

Evaluated on the test split of [myanmar-ner-dataset](https://huggingface.co/datasets/chuuhtetnaing/myanmar-ner-dataset) using seqeval's entity-level metrics:

| Entity | Precision | Recall | F1-Score | Support |
|--------|-----------|--------|----------|---------|
| DATE | 0.80 | 0.86 | 0.83 | 251 |
| LOC | 0.93 | 0.96 | 0.95 | 2712 |
| NUM | 0.89 | 0.92 | 0.90 | 789 |
| ORG | 0.44 | 0.62 | 0.52 | 94 |
| PER | 0.84 | 0.88 | 0.86 | 533 |
| TIME | 0.62 | 0.70 | 0.66 | 57 |
| **micro avg** | **0.89** | **0.93** | **0.91** | 4436 |
| **macro avg** | 0.75 | 0.82 | 0.78 | 4436 |
| **weighted avg** | **0.89** | **0.93** | **0.91** | 4436 |
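
The gap between the macro and weighted averages comes from class imbalance: LOC accounts for most of the support, while ORG and TIME are rare and score lower. The sketch below recomputes both averages for F1 from the per-entity rows. It is only an illustration built on the rounded table values, not the seqeval implementation, so the last digit can differ from the reported figures.

```python
# Per-entity (F1, support) pairs copied from the table above
report = {
    "DATE": (0.83, 251),
    "LOC": (0.95, 2712),
    "NUM": (0.90, 789),
    "ORG": (0.52, 94),
    "PER": (0.86, 533),
    "TIME": (0.66, 57),
}

total_support = sum(support for _, support in report.values())

# Macro average: every entity type counts equally
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each entity type counts in proportion to its support
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
```

The weighted average stays close to the LOC score because LOC contributes 2712 of the 4436 entities, whereas the macro average is pulled down by ORG and TIME.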

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | chuuhtetnaing/myanmar-pos-model |
| Total Epochs | 30 |
| Total Steps | 510 |
| Best Checkpoint | checkpoint-255 |
| Best F1 | 0.9097 |
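
The best checkpoint can be mapped back to an epoch in the training results table. This sketch assumes the checkpoint suffix is the global optimizer step (the usual Hugging Face `Trainer` naming) and that every epoch has the same number of steps:

```python
total_steps = 510
total_epochs = 30
best_checkpoint_step = 255  # taken from "checkpoint-255"

# Assumption: steps are spread evenly across epochs
steps_per_epoch = total_steps // total_epochs
best_epoch = best_checkpoint_step // steps_per_epoch

print(steps_per_epoch, best_epoch)  # 17 15
```

Epoch 15 is the row with F1 0.9097, which matches the Best F1 value above.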

## Usage

```python
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("token-classification", model="chuuhtetnaing/myanmar-ner-model", aggregation_strategy="simple")
result = ner("ကိုမောင်သည်ရန်ကုန်မြို့သို့သွားသည်။")  # Ko Maung went to Yangon city
print(result)
```

## Evaluation Code

```python
# Requires seqeval: run `pip install seqeval` first (prefix with "!" in a notebook)

from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
from datasets import load_dataset
from tqdm import tqdm
from seqeval.metrics import classification_report

# Load model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("chuuhtetnaing/myanmar-ner-model")
tokenizer = AutoTokenizer.from_pretrained("chuuhtetnaing/myanmar-ner-model")

def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)
    tokenized_inputs["labels"] = labels
    return tokenized_inputs

# Build the inference pipeline from the already-loaded model, without entity aggregation
ner = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="none")

# Load and tokenize dataset
ds = load_dataset("chuuhtetnaing/myanmar-ner-dataset")
tokenized_ds = ds.map(tokenize_and_align_labels, batched=True)
test_ds = tokenized_ds["test"]

# Get label mapping
label_list = model.config.id2label

y_true = []
y_pred = []

for example in tqdm(test_ds):
    # Rebuild the text from the token IDs and run the pipeline on it
    text = tokenizer.decode(example["input_ids"], skip_special_tokens=True)
    preds = ner(text)

    # Default every position to "O", then fill in predictions by token index
    # (assumes the decoded text re-tokenizes to the same positions)
    pred_labels = ["O"] * len(example["labels"])
    for pred in preds:
        idx = pred["index"]
        if idx < len(pred_labels):
            pred_labels[idx] = pred["entity"]

    # Keep only positions with a real label (first sub-token of each word)
    y_true.append([label_list[l] for l in example["labels"] if l != -100])
    y_pred.append([p for p, l in zip(pred_labels, example["labels"]) if l != -100])

print(classification_report(y_true, y_pred))
```

## NER Labels

| Tag | Description |
|-----|-------------|
| B-DATE | Beginning of Date |
| I-DATE | Inside Date |
| B-LOC | Beginning of Location |
| I-LOC | Inside Location |
| B-NUM | Beginning of Number |
| I-NUM | Inside Number |
| B-ORG | Beginning of Organization |
| I-ORG | Inside Organization |
| B-PER | Beginning of Person |
| I-PER | Inside Person |
| B-TIME | Beginning of Time |
| I-TIME | Inside Time |
| O | Outside (Not an entity) |
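
The tags follow the BIO scheme: a `B-` tag opens an entity, consecutive `I-` tags of the same type extend it, and `O` closes it. The sketch below shows one way to turn a tag sequence into entity spans. The helper `bio_to_spans` is hypothetical and not part of this model or of seqeval, and treating a stray `I-` tag as the start of a new entity is a lenient choice made here for illustration.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # A span continues only on an I- tag of the same entity type
        continues = prefix == "I" and label == current
        if current is not None and not continues:
            spans.append((current, start, i))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            # Lenient: an I- tag with no open span starts a new one
            start, current = i, label
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 5)]
```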