---
license: mit
language:
- fa
metrics:
- accuracy
- f1
base_model:
- HooshvareLab/bert-base-parsbert-uncased
pipeline_tag: text-classification
library_name: transformers
---

# Model Card for aref-j/emotion-classifier-bert-fa-v1

This is a fine-tuned BERT model for classifying emotions in Persian text, detecting six emotion categories: ANGRY, FEAR, HAPPY, HATE, SAD, SURPRISE. It was developed on a merged dataset of Persian emotion corpora and is designed for applications such as sentiment analysis of Persian tweets.

## Model Details

### Model Description

This model is a fine-tuned version of ParsBERT (HooshvareLab/bert-base-parsbert-uncased) for emotion classification in Persian text. It uses a BERT base architecture with a sequence classification head to predict one of six emotion labels from input text. The model addresses class imbalance through a weighted cross-entropy loss and was trained on a combined dataset of Persian tweets and short texts.

- **Developed by:** Aref Jafary
- **Model type:** Text classification (fine-tuned BERT)
- **Language(s) (NLP):** Persian (fa)
- **License:** MIT
- **Finetuned from model:** HooshvareLab/bert-base-parsbert-uncased

### Model Sources

- **Repository:** https://github.com/ArefJafary/Persian-Emotion-Classification-BERT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "aref-j/emotion-classifier-bert-fa-v1"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create the classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Example usage ("What beautiful weather it is today")
result = classifier("چه هوای زیبایی امروز است")
print(result)  # e.g.
# [{'label': 'HAPPY', 'score': 0.99}]
```

## Training Details

### Training Data

The model was trained on a merged dataset from three Persian emotion corpora:

- **ArmanEmo**: over 7,000 Persian sentences labeled for 7 emotions. [GitHub](https://github.com/Arman-Rayan-Sharif/arman-text-emotion)
- **EmoPars**: 30,000 Persian tweets labeled with 6 basic emotions (Anger, Fear, Happiness, Sadness, Hatred, Wonder). [GitHub](https://github.com/nazaninsbr/Persian-Emotion-Detection)
- **ShortPersianEmo**: 5,472 short Persian texts labeled for 5 emotions (angry, sad, fear, happy, neutral). [GitHub](https://github.com/vkiani/ShortPersianEmo)

Datasets were standardized, cleaned (normalization with Parsivar; removal of URLs, mentions, emojis, etc.), deduplicated, and split into 90% train / 10% validation, with ArmanEmo held out for testing.

### Training Procedure

#### Preprocessing

Text was normalized using Parsivar, with character mapping, diacritic removal, and stripping of URLs, mentions, hashtags, emojis, punctuation, digits, and extra whitespace. Multi-label instances in EmoPars were converted to single-label by keeping the dominant label.

#### Training Hyperparameters

- **Training regime:** fp32 (assumed; not specified)
- Batch size: 32
- Epochs: 6
- Learning rate: 1e-5
- Optimizer: not specified (Hugging Face Trainer default, AdamW)
- Loss: weighted cross-entropy to handle class imbalance
- Early stopping: after 2 epochs without improvement in validation loss

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Held-out ArmanEmo test set.

#### Factors

Evaluation is disaggregated by emotion class (ANGRY, FEAR, HAPPY, HATE, SAD, SURPRISE).

#### Metrics

Accuracy (overall fraction of correct predictions), macro F1-score (average of per-class F1 scores, treating all classes equally), precision, recall, and a confusion matrix.

### Results

- Test accuracy: 70.88%
- Macro F1-score: 66.35%

Detailed per-class metrics and the confusion matrix are available in the repository.
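The exact weighting scheme used for the cross-entropy loss is not specified in this card; a common choice is inverse-frequency weights over the label distribution. The sketch below illustrates that scheme with hypothetical label counts (the real training-set distribution is in the repository):

```python
from collections import Counter

# Hypothetical label counts for illustration only; these are NOT the
# real dataset statistics.
labels = (
    ["HAPPY"] * 9000 + ["ANGRY"] * 6000 + ["SAD"] * 5000
    + ["HATE"] * 2000 + ["FEAR"] * 1500 + ["SURPRISE"] * 1000
)

counts = Counter(labels)
n_classes = len(counts)
total = sum(counts.values())

# Inverse-frequency weights: rare classes get proportionally larger weights,
# so misclassifying a minority class is penalized more heavily.
weights = {label: total / (n_classes * count) for label, count in counts.items()}

print(weights)
```

Weights computed this way, ordered by label id, can be passed to `torch.nn.CrossEntropyLoss(weight=...)` when fine-tuning with a custom loss.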
## Citation

**BibTeX:**

```
@misc{jafary2023persianemotion,
  author = {Aref Jafary},
  title = {Persian Emotion Classification with BERT},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ArefJafary/Persian-Emotion-Classification-BERT}}
}
```

**APA:**

Jafary, A. (2023). *Persian Emotion Classification with BERT* [Repository]. GitHub. https://github.com/ArefJafary/Persian-Emotion-Classification-BERT

## Glossary

- **ParsBERT**: a BERT model pre-trained on Persian text.
- **Weighted cross-entropy**: a loss function that assigns higher weights to underrepresented classes.

## Model Card Contact

Contact via GitHub: https://github.com/ArefJafary