---
library_name: transformers
tags:
- emotion-classification
- text-classification
- roberta
- goemotions
- sentiment-analysis
license: mit
datasets:
- google-research-datasets/go_emotions
language:
- en
metrics:
- accuracy
- f1
base_model: roberta-base
---

# RoBERTa Emotion Classifier (7-class)

A fine-tuned RoBERTa model for emotion classification over 7 emotions: **happy, sad, angry, fear, disgust, surprise, neutral**.

## Model Details

- **Developed by:** VanshajR
- **Base Model:** `roberta-base` (125M parameters)
- **Task:** Multi-class emotion classification
- **Dataset:** GoEmotions (27 emotions mapped to 7)
- **Training Samples:** ~58,000
- **Language:** English
- **License:** MIT

## Performance

Evaluated on the GoEmotions test set:

| Metric | Score |
|--------|-------|
| **Accuracy** | **57.77%** |
| **Macro F1** | **0.4787** |
| Precision | 0.5289 |
| Recall | 0.4958 |

### Per-Class Performance

| Emotion | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| Happy | 0.62 | 0.67 | 0.64 | 2,362 |
| Sad | 0.54 | 0.51 | 0.52 | 1,210 |
| Angry | 0.58 | 0.43 | 0.49 | 1,145 |
| Fear | 0.42 | 0.31 | 0.36 | 428 |
| Disgust | 0.48 | 0.26 | 0.34 | 361 |
| Surprise | 0.43 | 0.43 | 0.43 | 623 |
| Neutral | 0.64 | 0.86 | 0.73 | 8,711 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("VanshajR/roberta-emotion-7class")
model = AutoModelForSequenceClassification.from_pretrained("VanshajR/roberta-emotion-7class")
model.eval()  # disable dropout for inference

# Classify emotion
text = "I'm so excited about this project!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Emotion labels (index order matches the model's output classes)
emotions = ["happy", "sad", "angry", "fear", "disgust", "surprise", "neutral"]
print(f"Predicted emotion: {emotions[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class].item():.2%}")
```

## Training Details

### Training Data

- **Dataset:** GoEmotions (Google Research)
- **Emotion Mapping:** 27 fine-grained emotions → 7 basic emotions
- **Training Samples:** ~58,000 Reddit comments
- **Preprocessing:** Truncation to 128 tokens, lowercase normalization

### Training Procedure

- **Optimizer:** AdamW (lr=2e-5, weight_decay=0.01)
- **Batch Size:** 16 (train), 32 (eval)
- **Epochs:** 3
- **Max Length:** 128 tokens
- **Training Regime:** fp32

These settings map directly onto a Hugging Face `Trainer` configuration; see the sketch below.
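The following is a minimal, hypothetical reconstruction of that configuration, not the author's released training script: the two-example toy dataset is a stand-in for the remapped GoEmotions split, and the actual 27 → 7 label mapping is omitted.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

emotions = ["happy", "sad", "angry", "fear", "disgust", "surprise", "neutral"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(emotions),
    id2label=dict(enumerate(emotions)),
    label2id={e: i for i, e in enumerate(emotions)},
)

# Hypothetical stand-in for the remapped GoEmotions training split,
# just so the sketch runs end to end.
train_dataset = Dataset.from_dict({
    "text": ["i love this!", "this is awful."],  # lowercased, per the preprocessing above
    "label": [0, 4],                             # indices into `emotions`
}).map(lambda b: tokenizer(b["text"], truncation=True, max_length=128), batched=True)

args = TrainingArguments(
    output_dir="roberta-emotion-7class",
    learning_rate=2e-5,              # Trainer's default optimizer is AdamW
    weight_decay=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    # fp32 regime: leave fp16/bf16 at their defaults (disabled)
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```
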
### Compute Infrastructure

- **Hardware:** NVIDIA RTX 3070 (8GB VRAM)
- **Training Time:** ~2 hours
- **Framework:** PyTorch 2.1.0, Transformers 4.35.0

## Limitations and Bias

- **Language:** English only
- **Domain:** Primarily trained on Reddit comments (may not generalize to formal text)
- **Class Imbalance:** Better performance on frequent emotions (happy, neutral) vs. rare emotions (fear, disgust)
- **Subjective Task:** Human annotators often disagree on emotions (~25-30% disagreement rate)

## Intended Use

✅ **Recommended:**

- Emotion detection in conversational text
- Evaluating emotion-controlled text generation
- Research on emotion understanding in dialogue
- Sentiment analysis applications

❌ **Not Recommended:**

- Clinical diagnosis or mental health assessment
- High-stakes decision making
- Non-English languages

## Citation

```bibtex
@misc{vanshajr2024roberta,
  author    = {Vanshaj R},
  title     = {RoBERTa Emotion Classifier for 7-Class Emotion Detection},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/VanshajR/roberta-emotion-7class}
}
```

## Related Work

Part of the **Emotion-Controlled Response Generation** project:

- 🔗 [GitHub Repository](https://github.com/VanshajR/emotion-controlled-generation)
- 🔗 [GPT-2 Emotion-Conditioned Model](https://huggingface.co/VanshajR/gpt2-emotion-prefix)
- 📄 [Full Project Report](https://github.com/VanshajR/emotion-controlled-generation/blob/main/PROJECT_REPORT.md)