license: mit
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
language:
  - my
tags:
  - sentiment-analysis
  - myanmar
  - burmese
  - roberta
library_name: transformers
metrics:
  - accuracy
  - f1

MM Sentiment Intensity Model v1

This is a fine-tuned XLM-RoBERTa model for Myanmar (Burmese) sentiment analysis. It classifies text into five intensity levels, from Very Negative to Very Positive.

Model Description

The model was trained on a custom Myanmar dataset curated for sentiment detection. Input text is split into syllables by a regex-based preprocessing step before it reaches the tokenizer, and the same step must be applied at inference time.

Developed by: Thuta Sann
Model Type: Text Classification
Language: Myanmar (Burmese)
Base Model: FacebookAI/xlm-roberta-base

Classification Labels

Label	Sentiment	Emoji
LABEL_0	Very Negative	🔴
LABEL_1	Negative	🟠
LABEL_2	Neutral	🟡
LABEL_3	Positive	🟢
LABEL_4	Very Positive	🔵
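
The pipeline returns the raw `LABEL_n` identifiers, so a small lookup is useful for turning them into the names in the table above. The following is a minimal sketch; `ID2SENTIMENT` and `to_sentiment` are hypothetical names introduced here for illustration, and the score in the example is made up.

```python
# Mapping copied from the label table in this model card.
ID2SENTIMENT = {
    "LABEL_0": "Very Negative",
    "LABEL_1": "Negative",
    "LABEL_2": "Neutral",
    "LABEL_3": "Positive",
    "LABEL_4": "Very Positive",
}


def to_sentiment(prediction):
    """Return a copy of one prediction dict with a readable sentiment name added."""
    return {**prediction, "sentiment": ID2SENTIMENT[prediction["label"]]}


# Hand-written prediction in the shape a text-classification pipeline returns.
example = {"label": "LABEL_3", "score": 0.87}
print(to_sentiment(example))
```

Keeping the original `label` and `score` keys in the returned dict means the helper can be applied to pipeline output without losing information.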

How to Use

Apply the syllable-breaking function below to every input before passing it to the model. Text that skips this step will not match the segmented form the model saw during training.

import re
from transformers import pipeline

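# Insert a space before each syllable-initial consonant and before every
# Latin letter, digit, independent vowel, punctuation mark, and whitespace character.
def myanmar_sylbreak(line):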
    pat = re.compile(r"((?<!္)[က-အ](?![်္])|[a-zA-Z0-9ဣဤဥဦဧဩဪဿ၌၍၏၀-၉၊။!-/:-@[-`{-~\s])")
    return pat.sub(r" \1", line).strip()

# The model is downloaded from the Hugging Face Hub on first use.
classifier = pipeline("text-classification", model="thutasann/mm_sentiment_model_v1")

raw_text = "ဒီနေ့ ရာသီဥတု အရမ်းသာယာတယ်"
segmented_text = myanmar_sylbreak(raw_text)
result = classifier(segmented_text)
print(result)  # a list with one dict containing "label" and "score"
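
Because the segmentation step is plain regex substitution, it can be checked on its own before the model is loaded. The sketch below repeats `myanmar_sylbreak` from the example above with the pattern compiled once, and adds a hypothetical helper, `segment_batch`, for preparing several strings at a time. The names `_SYL_PATTERN` and `segment_batch` are assumptions introduced here, and the pipeline call is left as a comment so the snippet runs without downloading weights.

```python
import re

# Same pattern as in the usage example above, compiled once at import time.
_SYL_PATTERN = re.compile(
    r"((?<!္)[က-အ](?![်္])|[a-zA-Z0-9ဣဤဥဦဧဩဪဿ၌၍၏၀-၉၊။!-/:-@[-`{-~\s])"
)


def myanmar_sylbreak(line):
    """Insert a space before each matched syllable start, then trim the ends."""
    return _SYL_PATTERN.sub(r" \1", line).strip()


def segment_batch(texts):
    """Hypothetical helper: segment every string in a list."""
    return [myanmar_sylbreak(text) for text in texts]


texts = ["ဒီနေ့ ရာသီဥတု အရမ်းသာယာတယ်", "မြန်မာ"]
segmented = segment_batch(texts)
for seg in segmented:
    # repr() makes the inserted spaces visible, including repeated ones.
    print(repr(seg))

# With `classifier` created as in the usage example above:
# results = classifier(segmented)  # one prediction dict per input string
```

The function only ever inserts spaces, so removing all whitespace from a segmented string gives back the original characters. Existing spaces are also matched by the pattern, which is why inputs that already contain spaces come out with runs of several spaces.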

Training Data

This model was trained on the Myanmar Sentiment Intensity Dataset v1, which is based on research from the Myanmar NLP community.

Acknowledgments

Base model by FacebookAI (XLM-RoBERTa).
Syllable breaking logic based on sylbreak.
Dataset inspiration from chuuhtetnaing/myanmar-text-segmentation-dataset.