---
license: apache-2.0
base_model: chuuhtetnaing/myanmar-pos-model
tags:
- token-classification
- myanmar
- ner-tagging
language:
- my
datasets:
- chuuhtetnaing/myanmar-ner-dataset
metrics:
- f1
---
# Myanmar NER Tagging Model

Fine-tuned from `chuuhtetnaing/myanmar-pos-model` for Myanmar named entity recognition (NER) tagging.
## Training Results

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-------|---------------|-----------------|-----------|--------|--------|----------|
| 1 | 1.5385 | 0.3730 | 0.5397 | 0.5068 | 0.5227 | 0.9175 |
| 2 | 0.2673 | 0.1809 | 0.7271 | 0.7958 | 0.7599 | 0.9481 |
| 3 | 0.1623 | 0.1295 | 0.7815 | 0.8408 | 0.8101 | 0.9637 |
| 4 | 0.1291 | 0.1015 | 0.7836 | 0.8602 | 0.8201 | 0.9710 |
| 5 | 0.0992 | 0.0965 | 0.8200 | 0.8943 | 0.8555 | 0.9719 |
| 6 | 0.0801 | 0.0879 | 0.8299 | 0.9019 | 0.8644 | 0.9738 |
| 7 | 0.0706 | 0.0819 | 0.8580 | 0.9137 | 0.8849 | 0.9765 |
| 8 | 0.0636 | 0.0768 | 0.8660 | 0.9148 | 0.8897 | 0.9780 |
| 9 | 0.0577 | 0.0757 | 0.8784 | 0.9202 | 0.8988 | 0.9784 |
| 10 | 0.0527 | 0.0760 | 0.8737 | 0.9125 | 0.8927 | 0.9791 |
| 11 | 0.0506 | 0.0785 | 0.8710 | 0.9236 | 0.8965 | 0.9775 |
| 12 | 0.0470 | 0.0754 | 0.8830 | 0.9225 | 0.9023 | 0.9794 |
| 13 | 0.0459 | 0.0754 | 0.8896 | 0.9231 | 0.9061 | 0.9802 |
| 14 | 0.0441 | 0.0813 | 0.8742 | 0.9274 | 0.9000 | 0.9779 |
| 15 | 0.0398 | 0.0763 | 0.8952 | 0.9247 | 0.9097 | 0.9812 |
| 16 | 0.0387 | 0.0841 | 0.8713 | 0.9252 | 0.8974 | 0.9779 |
| 17 | 0.0344 | 0.0805 | 0.8924 | 0.9258 | 0.9088 | 0.9805 |
| 18 | 0.0356 | 0.0790 | 0.8854 | 0.9279 | 0.9061 | 0.9802 |
| 19 | 0.0333 | 0.0801 | 0.8864 | 0.9249 | 0.9052 | 0.9806 |
| 20 | 0.0326 | 0.0788 | 0.8939 | 0.9254 | 0.9094 | 0.9817 |
| 21 | 0.0314 | 0.0801 | 0.8863 | 0.9263 | 0.9059 | 0.9808 |
| 22 | 0.0309 | 0.0815 | 0.8866 | 0.9267 | 0.9062 | 0.9806 |
| 23 | 0.0310 | 0.0825 | 0.8854 | 0.9281 | 0.9062 | 0.9804 |
| 24 | 0.0280 | 0.0828 | 0.8874 | 0.9272 | 0.9068 | 0.9807 |
| 25 | 0.0271 | 0.0826 | 0.8884 | 0.9276 | 0.9076 | 0.9809 |
| 26 | 0.0290 | 0.0828 | 0.8887 | 0.9272 | 0.9075 | 0.9807 |
| 27 | 0.0318 | 0.0835 | 0.8855 | 0.9256 | 0.9051 | 0.9803 |
| 28 | 0.0287 | 0.0837 | 0.8871 | 0.9267 | 0.9065 | 0.9805 |
| 29 | 0.0274 | 0.0837 | 0.8855 | 0.9272 | 0.9058 | 0.9804 |
| 30 | 0.0271 | 0.0832 | 0.8875 | 0.9267 | 0.9067 | 0.9806 |
## Test Set Evaluation

Evaluated on the myanmar-ner-dataset test split using seqeval metrics:

| Entity | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| DATE | 0.80 | 0.86 | 0.83 | 251 |
| LOC | 0.93 | 0.96 | 0.95 | 2712 |
| NUM | 0.89 | 0.92 | 0.90 | 789 |
| ORG | 0.44 | 0.62 | 0.52 | 94 |
| PER | 0.84 | 0.88 | 0.86 | 533 |
| TIME | 0.62 | 0.70 | 0.66 | 57 |
| micro avg | 0.89 | 0.93 | 0.91 | 4436 |
| macro avg | 0.75 | 0.82 | 0.78 | 4436 |
| weighted avg | 0.89 | 0.93 | 0.91 | 4436 |
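As a quick sanity check (not part of the original evaluation), the weighted-average row can be reproduced, up to rounding, from the per-entity F1 scores and supports in the table above. The weighted average weighs each entity type by its support, which is why it tracks the dominant LOC class far more than the rare ORG and TIME classes:

```python
# Per-entity F1 and support, copied from the test-set table above
f1 = {"DATE": 0.83, "LOC": 0.95, "NUM": 0.90, "ORG": 0.52, "PER": 0.86, "TIME": 0.66}
support = {"DATE": 251, "LOC": 2712, "NUM": 789, "ORG": 94, "PER": 533, "TIME": 57}

total = sum(support.values())  # 4436 entities in the test split

# Weighted average: each entity type contributes proportionally to its support
weighted_f1 = sum(f1[e] * support[e] for e in f1) / total

print(f"weighted F1 ≈ {weighted_f1:.2f}")  # → weighted F1 ≈ 0.91
```

The macro average, by contrast, is the unweighted mean over entity types; recomputing it from the two-decimal values above lands near the table's 0.78, with small drift because the table entries are themselves rounded.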
## Training Details

| Parameter | Value |
|-----------------|---------------------------------|
| Base Model | chuuhtetnaing/myanmar-pos-model |
| Total Epochs | 30 |
| Total Steps | 510 |
| Best Checkpoint | checkpoint-255 |
| Best F1 | 0.9097 |

checkpoint-255 corresponds to epoch 15 (510 steps / 30 epochs = 17 steps per epoch), the epoch with the best validation F1 of 0.9097 in the table above.
## Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="chuuhtetnaing/myanmar-ner-model",
    aggregation_strategy="simple",  # replaces the deprecated grouped_entities=True
)
result = ner("ကိုမောင်သည်ရန်ကုန်မြို့သို့သွားသည်။")
print(result)
```
## Evaluation Code

```python
# pip install seqeval
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
from datasets import load_dataset
from tqdm import tqdm
from seqeval.metrics import classification_report

model = AutoModelForTokenClassification.from_pretrained("chuuhtetnaing/myanmar-ner-model")
tokenizer = AutoTokenizer.from_pretrained("chuuhtetnaing/myanmar-ner-model")

def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:
                # Special tokens: mark with -100 so they are excluded from metrics
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                # Label only the first sub-token of each word
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)
    tokenized_inputs["labels"] = labels
    return tokenized_inputs

ner = pipeline("token-classification", model="chuuhtetnaing/myanmar-ner-model", aggregation_strategy=None)

ds = load_dataset("chuuhtetnaing/myanmar-ner-dataset")
tokenized_ds = ds.map(tokenize_and_align_labels, batched=True)
test_ds = tokenized_ds["test"]

label_list = model.config.id2label

y_true = []
y_pred = []
for example in tqdm(test_ds):
    true_labels = [label_list[l] if l != -100 else "O" for l in example["labels"]]
    text = tokenizer.decode(example["input_ids"], skip_special_tokens=True)
    preds = ner(text)
    pred_labels = ["O"] * len(true_labels)
    for pred in preds:
        idx = pred["index"]
        if idx < len(pred_labels):
            pred_labels[idx] = pred["entity"]
    # Keep only positions with real word labels (drop the -100 sub-token padding)
    y_true.append([label_list[l] for l in example["labels"] if l != -100])
    y_pred.append([p for p, l in zip(pred_labels, example["labels"]) if l != -100])

print(classification_report(y_true, y_pred))
```
## NER Labels

| Tag | Description |
|--------|---------------------------|
| B-DATE | Beginning of Date |
| I-DATE | Inside Date |
| B-LOC | Beginning of Location |
| I-LOC | Inside Location |
| B-NUM | Beginning of Number |
| I-NUM | Inside Number |
| B-ORG | Beginning of Organization |
| I-ORG | Inside Organization |
| B-PER | Beginning of Person |
| I-PER | Inside Person |
| B-TIME | Beginning of Time |
| I-TIME | Inside Time |
| O | Outside (Not an entity) |
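The tag set above follows the standard BIO scheme. As a minimal sketch (not part of the original card, and simpler than what seqeval or the pipeline's aggregation does internally), token-level BIO tags can be grouped into entity spans like this:

```python
def bio_to_spans(tags):
    """Group a BIO tag sequence into (entity_type, start, end) spans, end exclusive.

    Stray I- tags that do not continue a matching entity are treated as O.
    """
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((ent_type, start, i))  # close the previous entity
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and ent_type == tag[2:]:
            continue  # continuation of the current entity
        else:
            if start is not None:
                spans.append((ent_type, start, i))
            start, ent_type = None, None
    if start is not None:
        spans.append((ent_type, start, len(tags)))
    return spans

tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE", "I-DATE", "I-DATE"]
print(bio_to_spans(tags))  # → [('PER', 0, 2), ('LOC', 3, 4), ('DATE', 5, 8)]
```

The span indices refer to token positions, so they can be mapped back onto the word-segmented input to recover entity surface strings.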