Model Card for aref-j/emotion-classifier-bert-fa-v1

This is a fine-tuned BERT model for classifying emotions in Persian text, specifically detecting 6 emotion categories: ANGRY, FEAR, HAPPY, HATE, SAD, SURPRISE. It was developed using a merged dataset of Persian emotion corpora and is designed for applications like sentiment analysis on Persian tweets.

Model Details

Model Description

This model is a fine-tuned version of ParsBERT (HooshvareLab/bert-base-parsbert-uncased) for emotion classification in Persian text. It uses a BERT base architecture with a sequence classification head to predict one of six emotion labels from input text. The model addresses class imbalance through weighted cross-entropy loss and was trained on a combined dataset of Persian tweets and short texts.

  • Developed by: Aref Jafary
  • Model type: Text classification (fine-tuned BERT)
  • Language(s) (NLP): Persian (fa)
  • License: MIT
  • Finetuned from model: HooshvareLab/bert-base-parsbert-uncased

Model Sources

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "aref-j/emotion-classifier-bert-fa-v1"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create the classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example usage
result = classifier("چه هوای زیبایی امروز است")
print(result)  # e.g. [{'label': 'HAPPY', 'score': 0.99}]

Training Details

Training Data

The model was trained on a merged dataset from three Persian emotion corpora:

  • ArmanEmo: over 7,000 Persian sentences labeled for 7 emotions (GitHub)
  • EmoPars: 30,000 Persian tweets labeled with 6 basic emotions: Anger, Fear, Happiness, Sadness, Hatred, Wonder (GitHub)
  • ShortPersianEmo: 5,472 short Persian texts labeled for 5 emotions: angry, sad, fear, happy, neutral (GitHub)

Datasets were standardized, cleaned (normalization with Parsivar, removal of URLs, mentions, emojis, etc.), deduplicated, and split into 90% train / 10% validation, with ArmanEmo held out for testing.
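
The deduplication and 90/10 split described above can be sketched roughly as follows. The exact procedure, ordering, and random seed are not documented, so the function name and parameters here are illustrative:

```python
import random

def dedup_and_split(examples, val_frac=0.1, seed=42):
    """Drop duplicate texts, then shuffle and split into train/validation.

    `examples` is a list of (text, label) pairs. The seed and split order
    are assumptions; the model card does not specify them.
    """
    seen, unique = set(), []
    for text, label in examples:
        if text not in seen:          # keep only the first occurrence of each text
            seen.add(text)
            unique.append((text, label))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = int(len(unique) * val_frac)
    return unique[n_val:], unique[:n_val]  # (train, validation)
```

In practice this would run after per-corpus cleaning and label standardization, so that near-duplicate tweets shared across the merged corpora do not leak between splits.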

Training Procedure

Preprocessing

Text was normalized using Parsivar, with character mapping, diacritic removal, and stripping of URLs, mentions, hashtags, emojis, punctuation, digits, and extra spaces. Multi-label instances in EmoPars were converted to single-label via dominant label.
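
The regex-based stripping portion of this pipeline might look like the sketch below. Parsivar's `Normalizer` (character mapping, diacritic removal) would run before this step; the helper name, regex patterns, and their order are assumptions, not the documented implementation:

```python
import re

def clean_persian_text(text):
    """Hypothetical cleanup pass: strip URLs, mentions, hashtags, emojis,
    digits, punctuation, and extra spaces (Parsivar normalization assumed
    to have been applied already)."""
    text = re.sub(r"https?://\S+", " ", text)                           # URLs
    text = re.sub(r"[@#]\S+", " ", text)                                # mentions, hashtags
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", " ", text)   # common emoji ranges
    text = re.sub(r"[\d\u06F0-\u06F9]", " ", text)                      # Latin and Persian digits
    text = re.sub(r"[^\w\s]", " ", text)                                # punctuation
    return re.sub(r"\s+", " ", text).strip()                            # collapse spaces
```

Note that `\w` in Python 3 matches Persian letters by default, so the punctuation pattern leaves the actual text intact.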

Training Hyperparameters

  • Training regime: fp32 (assumed, not specified)
  • Batch size: 32
  • Epochs: 6
  • Learning rate: 1e-5
  • Optimizer: Not specified (Hugging Face Trainer default, AdamW)
  • Loss: Weighted cross-entropy to handle class imbalance
  • Early stopping: After 2 epochs without validation loss improvement
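
For the weighted cross-entropy loss, a common recipe is inverse-frequency class weights; the exact weighting scheme used here is not documented, so this helper is an assumption:

```python
from collections import Counter

def class_weights(labels, num_classes):
    """Inverse-frequency weights: rare classes get weights above 1.0.

    A standard formula (total / (num_classes * count_c)); the actual
    weights used in training are not specified in the model card.
    """
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts.get(c, 1)) for c in range(num_classes)]
```

The resulting list would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss` inside a custom `compute_loss` override of the Hugging Face `Trainer`.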

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out ArmanEmo test set.

Factors

Evaluation disaggregated by emotion classes (ANGRY, FEAR, HAPPY, HATE, SAD, SURPRISE).

Metrics

Accuracy (overall fraction of correct predictions), Macro F1-score (unweighted average of per-class F1 scores, treating all classes equally), Precision, Recall, and Confusion Matrix.
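
To make the macro F1 definition concrete, here is a minimal reference implementation (a didactic sketch; in practice one would use `sklearn.metrics.f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores over integer labels 0..num_classes-1."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / num_classes
```

Because every class contributes equally regardless of its frequency, macro F1 (66.35%) sits below accuracy (70.88%) when minority emotion classes are harder to predict.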

Results

  • Test Accuracy: 70.88%
  • Macro F1-Score: 66.35%

Detailed per-class metrics and the confusion matrix are available in the repository.

Citation

BibTeX:

@misc{jafary2023persianemotion,
  author = {Aref Jafary},
  title = {Persian Emotion Classification with BERT},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ArefJafary/Persian-Emotion-Classification-BERT}}
}

APA: Jafary, A. (2023). Persian Emotion Classification with BERT [Repository]. GitHub. https://github.com/ArefJafary/Persian-Emotion-Classification-BERT

Glossary

  • ParsBERT: A BERT model pre-trained on Persian text.
  • Weighted Cross-Entropy: Loss function that assigns higher weights to underrepresented classes.

Model Card Contact

Contact via GitHub: https://github.com/ArefJafary
