Model Card for aref-j/emotion-classifier-bert-fa-v1
This is a fine-tuned BERT model for classifying emotions in Persian text, specifically detecting 6 emotion categories: ANGRY, FEAR, HAPPY, HATE, SAD, SURPRISE. It was developed using a merged dataset of Persian emotion corpora and is designed for applications like sentiment analysis on Persian tweets.
Model Details
Model Description
This model is a fine-tuned version of ParsBERT (HooshvareLab/bert-base-parsbert-uncased) for emotion classification in Persian text. It uses a BERT base architecture with a sequence classification head to predict one of six emotion labels from input text. The model addresses class imbalance through weighted cross-entropy loss and was trained on a combined dataset of Persian tweets and short texts.
- Developed by: Aref Jafary
- Model type: Text classification (fine-tuned BERT)
- Language(s) (NLP): Persian (fa)
- License: MIT
- Finetuned from model: HooshvareLab/bert-base-parsbert-uncased
Model Sources
- Repository: https://github.com/ArefJafary/Persian-Emotion-Classification-BERT
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "aref-j/emotion-classifier-bert-fa-v1"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create the classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example usage
result = classifier("چه هوای زیبایی امروز است")
print(result)  # e.g. [{'label': 'HAPPY', 'score': 0.99}]
```
Training Details
Training Data
The model was trained on a merged dataset from three Persian emotion corpora:
- ArmanEmo: over 7,000 Persian sentences labeled for 7 emotions.
- EmoPars: 30,000 Persian tweets labeled with 6 basic emotions (Anger, Fear, Happiness, Sadness, Hatred, Wonder).
- ShortPersianEmo: 5,472 short Persian texts labeled for 5 emotions (angry, sad, fear, happy, neutral).
Datasets were standardized, cleaned (normalization with Parsivar, removal of URLs, mentions, emojis, etc.), deduplicated, and split into 90% train / 10% validation, with ArmanEmo held out for testing.
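The merge-then-split step above can be sketched as follows. This is an illustrative reconstruction, not the author's actual script: the field names and the exact deduplication key (raw text) are assumptions.

```python
import random

def dedup_and_split(examples, val_frac=0.1, seed=42):
    """Deduplicate (text, label) pairs by text, then split into
    train/validation. The 90/10 ratio matches the model card; the
    shuffling seed and schema are illustrative assumptions."""
    seen = set()
    unique = []
    for text, label in examples:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, int(len(unique) * val_frac))
    return unique[n_val:], unique[:n_val]

# Toy merged corpus with one duplicate entry
data = [(f"text {i}", i % 6) for i in range(100)] + [("text 0", 0)]
train, val = dedup_and_split(data)
print(len(train), len(val))  # 90 10
```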
Training Procedure
Preprocessing
Text was normalized using Parsivar, with character mapping, diacritic removal, and stripping of URLs, mentions, hashtags, emojis, punctuation, digits, and extra spaces. Multi-label instances in EmoPars were converted to single-label via dominant label.
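A minimal sketch of the cleaning steps described above, using only regular expressions. The actual pipeline also applies Parsivar normalization (character mapping, diacritic handling), which is omitted here to keep the example dependency-free; the exact patterns are assumptions, not the author's code.

```python
import re

def clean_persian_text(text):
    """Strip URLs, mentions/hashtags, non-Persian characters,
    diacritics, and digits, then collapse whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # URLs
    text = re.sub(r"[@#]\S+", " ", text)                  # mentions and hashtags
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)       # drop Latin letters, emojis, punctuation
    text = re.sub(r"[\u064B-\u065F]", "", text)           # Arabic/Persian diacritics
    text = re.sub(r"[\u06F0-\u06F9\u0660-\u0669]", " ", text)  # Persian/Arabic digits
    return re.sub(r"\s+", " ", text).strip()              # collapse extra spaces

print(clean_persian_text("سلام @user http://example.com ۱۲۳ دنیا!"))  # سلام دنیا
```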
Training Hyperparameters
- Training regime: fp32 (assumed, not specified)
- Batch size: 32
- Epochs: 6
- Learning rate: 1e-5
- Optimizer: Not specified (default Hugging Face Trainer)
- Loss: Weighted cross-entropy to handle class imbalance
- Early stopping: After 2 epochs without validation loss improvement
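The weighted cross-entropy loss mentioned above can be illustrated numerically. Inverse-frequency weighting is a common scheme for class imbalance; the exact weights used for this model are not documented, so this is a sketch, not the training code.

```python
import math
from collections import Counter

def class_weights(labels, num_classes):
    """Inverse-frequency weights: rarer classes get larger weights.
    (One common choice; the model's actual scheme is unspecified.)"""
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts.get(c, 1)) for c in range(num_classes)]

def weighted_cross_entropy(logits, target, weights):
    """Weighted CE for one example: -weights[y] * log softmax(logits)[y]."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    log_prob = (logits[target] - m) - math.log(sum(exps))
    return -weights[target] * log_prob

labels = [0] * 90 + [1] * 10             # imbalanced toy dataset, 2 classes
w = class_weights(labels, 2)
print(w)  # the rare class (1) receives the larger weight
```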
Evaluation
Testing Data, Factors & Metrics
Testing Data
Held-out ArmanEmo test set.
Factors
Evaluation disaggregated by emotion classes (ANGRY, FEAR, HAPPY, HATE, SAD, SURPRISE).
Metrics
Accuracy (overall correct predictions), Macro F1-score (average F1 across classes, treating all equally), Precision, Recall, and Confusion Matrix.
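Macro F1, the headline metric above, can be computed directly from a confusion matrix. This helper is an illustration of the definition (per-class F1 averaged with equal class weight), not code from the repository.

```python
def macro_f1(conf):
    """Macro F1 from a square confusion matrix conf[true][pred]."""
    n = len(conf)
    f1s = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, was not c
        fn = sum(conf[c]) - tp                        # was c, predicted other
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / n

print(macro_f1([[5, 0], [0, 5]]))  # 1.0 for a perfect classifier
```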
Results
- Test Accuracy: 70.88%
- Macro F1-Score: 66.35%
Detailed per-class metrics and confusion matrix available in the repository.
Citation
BibTeX:
@misc{jafary2023persianemotion,
author = {Aref Jafary},
title = {Persian Emotion Classification with BERT},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ArefJafary/Persian-Emotion-Classification-BERT}}
}
APA: Jafary, A. (2023). Persian Emotion Classification with BERT [Repository]. GitHub. https://github.com/ArefJafary/Persian-Emotion-Classification-BERT
Glossary
- ParsBERT: A BERT model pre-trained on Persian text.
- Weighted Cross-Entropy: Loss function that assigns higher weights to underrepresented classes.
Model Card Contact
Contact via GitHub: https://github.com/ArefJafary