---
library_name: transformers
tags:
- emotion
- classification
- roberta
- multi-label
- sentiment-analysis
license: mit
language:
- en
pipeline_tag: text-classification
---

### Model Description

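This is a fine-tuned `roberta-base` model that estimates the strength of each emotion expressed in an input comment. It is a multi-label classifier, so a single comment can score highly on several emotions at once.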

### Downstream Use

Embeddings for comments can be extracted from the model for downstream analyses.
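The card does not say how embeddings should be extracted. A minimal sketch, assuming mean pooling over the final encoder layer (a common choice, not necessarily the author's method), is shown here. `mean_pool` and `embed_comments` are hypothetical helper names, `model_name` is whatever Hub ID you pass to `from_pretrained`, and `max_length=250` simply mirrors the getting-started example in this card.

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per comment, ignoring padding positions.

    last_hidden_state: array of shape (batch, seq_len, hidden_size)
    attention_mask:    array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    mask = np.asarray(attention_mask, dtype=np.float32)[..., None]  # (batch, seq_len, 1)
    summed = (np.asarray(last_hidden_state, dtype=np.float32) * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts  # (batch, hidden_size)

def embed_comments(texts, model_name):
    """Hypothetical helper: one pooled embedding per comment in `texts`."""
    # Imported lazily so mean_pool can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=250)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states[-1] is the last encoder layer: (batch, seq_len, hidden_size)
    last_hidden = outputs.hidden_states[-1].numpy()
    return mean_pool(last_hidden, inputs["attention_mask"].numpy())
```

Masking before averaging keeps padding tokens from diluting the embeddings of shorter comments when several comments are batched together.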

## Bias, Risks, and Limitations
Risks: if you are genuinely unsure of a paragraph's or comment's sentiment, seek human judgment rather than relying on this model alone. The model shows some bias toward classes that were more widely represented in the training data.

"Caring" is a somewhat confusing category. In the training data, comments were annotated as "caring" if they expressed sympathy or indignance on behalf of others. This category will need to be split further into separate categories such as "indignance" and "caring".

Sarcasm is treated as the combination of "amusement" and "disapproval". "Amusement" can also apply to irony and a humorous tone, but in practice it largely captures sarcasm. Adding a dedicated sarcasm class is a much-needed improvement that will be pursued later.
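Because there is no sarcasm class, a downstream user might approximate one from the two emotions above. The following is a hypothetical heuristic, not part of the model: `flag_possible_sarcasm` is an illustrative name, and it assumes the labels are exactly `"amusement"` and `"disapproval"` and that you reuse the same probability threshold as in the getting-started example.

```python
def flag_possible_sarcasm(probabilities, threshold=0.35):
    """Hypothetical heuristic: flag a comment as possibly sarcastic when both
    "amusement" and "disapproval" reach the threshold.

    probabilities: dict mapping emotion label -> sigmoid probability,
    e.g. the "all_probabilities" dict returned by predict_emotions.
    """
    # Missing labels count as 0.0 so the check never raises a KeyError.
    amusement = probabilities.get("amusement", 0.0)
    disapproval = probabilities.get("disapproval", 0.0)
    return amusement >= threshold and disapproval >= threshold
```

Given the limitation described above, treat a `True` result as a prompt for human review rather than a reliable sarcasm label.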

There are not many risks, but there are many limitations. The training dataset was initially imbalanced; this was addressed with data augmentation and a weighted loss function. Nonetheless, the model struggles with sarcasm and sometimes makes unpredictable predictions because of dominant classes.
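The card does not specify how the loss was weighted. One common scheme for multi-label training, shown here purely as an assumption, up-weights each emotion's positive examples by that emotion's ratio of negatives to positives. This is the quantity PyTorch's `torch.nn.BCEWithLogitsLoss` accepts through its `pos_weight` argument. The NumPy sketch below makes the arithmetic explicit; `pos_weights` and `weighted_bce_with_logits` are hypothetical names.

```python
import numpy as np

def pos_weights(labels):
    """Per-class positive weight = (#negatives / #positives) for each emotion column.

    labels: 0/1 array of shape (num_examples, num_emotions)
    """
    labels = np.asarray(labels, dtype=np.float64)
    pos = labels.sum(axis=0)
    neg = labels.shape[0] - pos
    # Classes with no positives get a denominator of 1 to avoid division by zero.
    return neg / np.clip(pos, 1.0, None)

def weighted_bce_with_logits(logits, labels, pos_weight):
    """Mean binary cross-entropy over all (example, emotion) pairs,
    with positive targets scaled by pos_weight."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # Numerically stable log(sigmoid(x)) and log(1 - sigmoid(x)).
    log_p = -np.logaddexp(0.0, -logits)
    log_not_p = -np.logaddexp(0.0, logits)
    loss = -(pos_weight * labels * log_p + (1.0 - labels) * log_not_p)
    return loss.mean()
```

In a PyTorch training loop the same weights would typically be passed as `torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(weights, dtype=torch.float32))`, so rare emotions contribute more to the gradient when they are present.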

Ultimately, I hope a struggling graduate or undergraduate student finds this model useful for whatever project they want to pursue.

## Example Project

My own use of this model in a project can be found on GitHub:

https://github.com/AnnaMarieHo/sentiment-analysis/tree/main

## How to Get Started with the Model


```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

def predict_emotions(text, model_name, threshold=0.35):
    # Load model and tokenizer (from_pretrained returns the model in eval mode)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Tokenize and predict
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=250)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label: an independent sigmoid per emotion, not a softmax across emotions
    probabilities = torch.sigmoid(logits)[0].tolist()

    # Map each output index to its emotion label via the model config
    id2label = model.config.id2label
    emotions = {id2label[i]: float(prob) for i, prob in enumerate(probabilities)}

    # Keep emotions at or above the threshold, sorted by probability (highest first)
    predicted_emotions = [(emotion, prob) for emotion, prob in emotions.items() if prob >= threshold]
    predicted_emotions.sort(key=lambda x: x[1], reverse=True)

    return {
        "text": text,
        "predicted_emotions": predicted_emotions,
        "all_probabilities": dict(sorted(emotions.items(), key=lambda x: x[1], reverse=True)),
        "threshold_used": threshold,
    }

# Example usage
result = predict_emotions(
    "I'm feeling really excited and happy about this news!",
    "model-name",  # replace with this model's Hugging Face Hub ID
    threshold=0.35,  # Customize threshold here
)

# Print results
print(f"Text: {result['text']}")
print("\nDetected emotions (sorted by probability):")
for emotion, prob in result["predicted_emotions"]:
    print(f"  - {emotion.upper()} ({prob:.4f})")

print("\nAll emotion probabilities (sorted):")
for emotion, prob in result["all_probabilities"].items():
    # '*' marks emotions at or above the threshold
    print(f"  {'*' if prob >= result['threshold_used'] else ' '} {emotion}: {prob:.4f}")
```

#### Training Hyperparameters

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

#### Metrics

### Results

#### Summary


### Model Architecture and Objective