---
license: apache-2.0
datasets:
- alex-shvets/EmoPillars
language:
- en
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: transformers
tags:
- multi-label-classification
- fine-grained
- emotion-classification
model-index:
- name: roberta-base-emopillars-contextless
  results:
  - task:
      type: text-classification
      name: Multi-label Fine-Grained Emotion Classification
    dataset:
      type: alex-shvets/EmoPillars
      name: EmoPillars
      split: test
    metrics:
    - type: accuracy
      value: 0.95
      name: Accuracy(Hamming)
    - type: recall
      value: 0.68
      name: Recall-macro
    - type: f1
      value: 0.70
      name: F1-macro
---


## 🏷️ Model Details
This model is fine-tuned and optimized for fine-grained multi-label emotion classification from text.
The model employs a hybrid training objective that integrates similarity-based contrastive learning with a classification objective, instead of using the conventional binary cross-entropy (BCE) loss alone. 
This approach enables the model to capture both semantic alignment between text and emotion concepts and label-specific decision boundaries, resulting in improved performance on the EmoPillars dataset.
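
As a rough illustration, a hybrid objective of this kind can be sketched as a weighted sum of a BCE classification term and a SupCon-style contrastive term over in-batch examples that share labels. The weighting `alpha` and the positive-pair definition below are illustrative assumptions, not the exact formulation; see our paper for the actual objective.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(embeddings, logits, labels, temperature=0.05, alpha=0.5):
    # BCE classification term over the emotion labels.
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())

    # Similarity-based contrastive term: pull together in-batch examples
    # that share at least one emotion label (SupCon-style; illustrative).
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature

    # Positives: pairs sharing at least one label (self-pairs excluded).
    pos = (labels.float() @ labels.float().t() > 0).float()
    pos.fill_diagonal_(0)

    self_mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    exp_sim = torch.exp(sim) * self_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)

    contrastive = -(pos * log_prob).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return alpha * bce + (1 - alpha) * contrastive.mean()
```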

*This model is the Model II (Classifier-based) variant described in our paper, which achieved the best performance. Please see our paper for details of the model architecture and training objectives.*

- **Developed by:** Subinoy Bera and Arnab Karmakar
- **Model type:** Transformers | RoBERTa-base
- **Language (NLP):** English
- **License:** Apache-2.0
- **Repository:** [GitHub](https://github.com/Hidden-States-AI-Labs/EmoAxis)
- **Research Paper:** [Do We Need a Classifier? Dual Objectives Go Beyond Baselines in Fine-Grained Emotion Classification.](https://zenodo.org/records/18123882?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjhjNmQwMTYzLWFiYzEtNDBiZi05NTFkLTI2Mzg1YzhiYThhZSIsImRhdGEiOnt9LCJyYW5kb20iOiI5MDE1MDM1MTYxMTg1MzEyMTY3ZmY2YzNmY2NlYTM4OSJ9.JgOX4GlmZ8ad-PtjytzioPUPSJSGYp8wochqpTgMO78SE1oBq9R6yUor2_36oOaSUO04OPP0MJqBiYK0JK0NHA)


## βœ… Intended Usage
The model is specifically intended for **fine-grained multi-label emotion classification from text** in both practical and research settings.
It can be used to detect emotions in short to medium-length textual content such as social media posts, user comments, online discussions, reviews, and conversational text, where identifying fine-grained emotion categories gives deeper insight.

The model is suitable for **local and offline deployment** for tasks such as emotion-aware text analysis, affective computing research, and downstream NLP applications that benefit from fine-grained emotion signals.


## πŸ“Š Dataset Used
[**EmoPillars**](https://huggingface.co/datasets/alex-shvets/EmoPillars) (2025): A large-scale multi-label emotion classification dataset consisting of 300K English synthetic comments annotated with 27 emotion categories plus a neutral label. The dataset is diverse and representative of real-world emotional language, including informal grammar, sarcasm, and ambiguous or context-dependent cues. In this work, we adopt the full 28-label GoEmotions taxonomy for training and use a preprocessed subset of 100K examples.


## πŸ“Œ Model Performance (on Test)
The model is evaluated using standard multi-label metrics, with a focus on Macro-F1, which is widely regarded as the most informative metric for such imbalanced, multi-label emotion classification tasks.

- Macro-F1 : 0.70<br>
- Micro-F1: 0.78<br>
- Precision: 0.78<br>
- Recall: 0.68<br>
- Accuracy (Hamming): 0.95

| Training Objective | Macro-F1 |
|-------------------|----------|
| Binary Cross-Entropy (BCE) loss | 0.67 |
| Clipped Asymmetric Loss (CAL) | 0.69 |
| Our proposed Hybrid Objective | 0.70 |

πŸ† **Given the absence of existing competitive base model varients on this EmoPillars dataset, our model is currently <u>*state-of-the-art*</u> among open-source methods!** πŸ₯‡


## πŸš€ Get Started with the Model

```python
import torch
from transformers import AutoTokenizer, AutoModel
from transformers import logging as transformers_logging
import warnings
warnings.filterwarnings("ignore")
transformers_logging.set_verbosity_error()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "Hidden-States/roberta-base-emopillars-contextless"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.to(device).eval()

emotion_labels = [
        "admiration", "amusement", "anger", "annoyance", "approval", "caring", 
        "confusion", "curiosity", "desire", "disappointment", "disapproval", 
        "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief", 
        "joy", "love", "nervousness", "optimism", "pride", "realization", 
        "relief", "remorse", "sadness", "surprise", "neutral"
]

def predict_emotions(text):
    inputs = tokenizer(
        text, truncation=True, max_length=128, padding=True,
        return_attention_mask=True, return_tensors="pt"
    ).to(device)
    with torch.no_grad():
        _, logits = model(**inputs)

    probs = torch.sigmoid(logits)
    preds = (probs >= 0.5).int()[0]  # fixed 0.5 threshold per label

    predicted_emotions = [
        emotion_labels[i] for i, v in enumerate(preds) if v.item() == 1
    ]
    print(predicted_emotions)

text = "Honestly, same. I was miserable at my admin asst job."
predict_emotions(text)

#output: ['annoyance', 'disappointment', 'sadness']
```

## πŸ› οΈ Training Hyperparameters and Details

| Parameter | Value |
|-----------|-------|
| encoder learning rate | 2.5e-5 |
| classifier learning rate | 1.5e-4 |
| optimizer | AdamW |
| lr-scheduler | cosine with warmup |
| weight decay | 0.001 |
| warmup ratio | 0.1 |
| temperature | 0.05 |
| clipping constant | 0.05 |
| batch size | 64 |
| epochs | 8 |
| threshold | 0.5 (fixed) |
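
A minimal sketch of how these hyperparameters translate into an optimizer and scheduler setup; the stand-in model and the `classifier`-prefix parameter split below are illustrative assumptions, as the actual attribute names depend on the model code.

```python
import torch
import torch.nn as nn
from transformers import get_cosine_schedule_with_warmup

# Stand-in model; in practice this is the RoBERTa encoder plus the head.
model = nn.ModuleDict({
    "encoder": nn.Linear(768, 768),
    "classifier": nn.Linear(768, 28),
})

# Two parameter groups: lower lr for the encoder, higher for the head.
encoder_params, head_params = [], []
for name, p in model.named_parameters():
    (head_params if name.startswith("classifier") else encoder_params).append(p)

optimizer = torch.optim.AdamW(
    [{"params": encoder_params, "lr": 2.5e-5},
     {"params": head_params, "lr": 1.5e-4}],
    weight_decay=0.001,
)

num_training_steps = (100_000 // 64) * 8  # 100K examples, batch 64, 8 epochs
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # warmup ratio 0.1
    num_training_steps=num_training_steps,
)
```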

Check out our paper for complete training details and objectives used: [Visit ↗️](https://zenodo.org/records/18123882?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjhjNmQwMTYzLWFiYzEtNDBiZi05NTFkLTI2Mzg1YzhiYThhZSIsImRhdGEiOnt9LCJyYW5kb20iOiI5MDE1MDM1MTYxMTg1MzEyMTY3ZmY2YzNmY2NlYTM4OSJ9.JgOX4GlmZ8ad-PtjytzioPUPSJSGYp8wochqpTgMO78SE1oBq9R6yUor2_36oOaSUO04OPP0MJqBiYK0JK0NHA)


## πŸ’» Compute Infrastructure
- **Inference**: Any modern x86 CPU with at least 8 GB RAM; a GPU is optional, not required for inference.

- **Training/Fine-Tuning**: Requires a GPU with at least 12 GB of VRAM. This model was trained in a Google Colab environment on a single T4 GPU.

- **Libraries/Modules**
  1. Transformers: 4.57.3
  2. PyTorch: 2.8.0+cu129
  3. Datasets: 4.4.1
  4. scikit-learn: 1.8.0
  5. NumPy: 2.3.5


## ⚠️ Out-of-Scope Use

The model cannot be used directly to detect emotions in multilingual or multimodal data, and cannot predict emotions beyond the 28-label GoEmotions taxonomy.
While the proposed approach demonstrates strong empirical performance on benchmark datasets, it is not designed, evaluated, or validated for deployment in high-stakes or safety-critical applications.
The model may reflect dataset-specific biases, annotation subjectivity, and cultural limitations inherent in emotion datasets. Predictions should therefore be interpreted as approximate signals rather than definitive emotional states.

Users are responsible for ensuring that any downstream application complies with relevant ethical guidelines, legal regulations, and domain-specific standards.
<br>

## πŸŽ—οΈ Community Support & Citation

**If you find this model useful, please consider liking this repository and giving a star to our GitHub repository.
Your support helps us improve and maintain this work!** ⭐

πŸ“ **If you use our work in academic or research settings, please cite our work accordingly.** πŸ™πŸ˜ƒ <br>
<br>

THANK YOU!! πŸ§‘πŸ€πŸ’š<br>
*- with regards*: Hidden States AI Labs