---
library_name: transformers
tags:
- korean
- emotion
- emotion-classification
- nlp
- electra
- koelectra
- sentiment
- sequence-classification
license: mit
datasets:
- LimYeri/kor-diary-emotion_v2
- qowlsdud/CounselGPT
language:
- ko
metrics:
- accuracy
- f1
base_model:
- monologg/koelectra-base-v3-discriminator
pipeline_tag: text-classification
---

# HowRU-KoELECTRA-Emotion-Classifier

## Model Description
A KoELECTRA-based emotion classification model for Korean text, particularly diary entries and psychological journals.<br>
It recognizes eight emotions in text: joy, excitement, neutral, surprise, disgust, fear, sadness, and anger.

- **Model type:** Text Classification (Emotion Recognition)
- **Language:** Korean (ko)
- **License:** MIT
- **Finetuned from model:** [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)

## Emotion Classes
This model classifies the dominant emotion of an input Korean sentence into one of the eight classes below.
| Emotion (Korean) | Emotion (EN) |
|------------------|--------------|
| 기쁨             | Joy          |
| 설렘             | Excitement   |
| 평범함           | Neutral      |
| 놀라움           | Surprise     |
| 불쾌함           | Disgust      |
| 두려움           | Fear         |
| 슬픔             | Sadness      |
| 분노             | Anger        |
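
The id-to-label mapping ships with the model config, so you can inspect the Korean label names without downloading the full weights (assuming the hosted config stores them, as the usage example below suggests):

```python
from transformers import AutoConfig

# Fetch only the config to see how class ids map to the Korean emotion labels
config = AutoConfig.from_pretrained("LimYeri/HowRU-KoELECTRA-Emotion-Classifier")
print(config.id2label)
```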

---

## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# 1) Load Model & Tokenizer
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Automatically use the GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Emotion label mapping (id2label)
id2label = model.config.id2label


# 2) Inference Function
def predict_emotion(text: str):
    """
    Returns:
        - top1_pred: ์˜ˆ์ธก๋œ ๊ฐ์ • ๋ผ๋ฒจ
        - probs_sorted: ๊ฐ์ •๋ณ„ ํ™•๋ฅ (๋‚ด๋ฆผ์ฐจ์ˆœ)
        - top2_pred: ์ƒ์œ„ ๋‘ ๊ฐœ์˜ ๊ฐ์ •
    """

    # Tokenize the input
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    # Run inference without gradient tracking
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1)[0]

    # Per-emotion probabilities, sorted in descending order
    probs_sorted = sorted(
        [(id2label[i], float(probs[i])) for i in range(len(probs))],
        key=lambda x: x[1],
        reverse=True
    )

    top1_pred = probs_sorted[0]
    top2_pred = probs_sorted[:2]

    return {
        "text": text,
        "top1_emotion": top1_pred,
        "top2_emotions": top2_pred,
        "all_probabilities": probs_sorted,
    }


# 3) Example
result = predict_emotion("오늘 정말 기분이 좋고 행복한 하루였어!")  # "Today was such a good, happy day!"
print(result)
```

### pipeline
```python
from transformers import pipeline

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

classifier = pipeline(
    "text-classification",
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    top_k=None   # return probabilities for all emotions
)

# Run prediction
text = "오늘 정말 기분이 좋고 행복한 하루였어!"
result = classifier(text)

# With top_k=None the pipeline returns one list of score dicts per input,
# sorted by score; take the list for our single input
result = result[0]

print("Input sentence:", text)
print("\nTop-1 emotion:", result[0]['label'], f"({result[0]['score']:.4f})")
print("\nFull emotion distribution:")
for r in result:
    print(f"  {r['label']}: {r['score']:.4f}")
```

---

## Training Details

### Training Data
1. [LimYeri/kor-diary-emotion_v2](https://huggingface.co/datasets/LimYeri/kor-diary-emotion_v2)
2. [qowlsdud/CounselGPT](https://huggingface.co/datasets/qowlsdud/CounselGPT)

- **Total (split 8:2):** 50,000 rows
- **Train:** 40,000 rows
- **Validation:** 10,000 rows
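
A minimal sketch of how the two sources could be merged and split 8:2 with the `datasets` library; the split names, seed, column handling, and any per-dataset preprocessing are assumptions rather than the author's published pipeline:

```python
from datasets import load_dataset, concatenate_datasets

# Load both sources (assumes compatible text/label columns after preprocessing)
diary = load_dataset("LimYeri/kor-diary-emotion_v2", split="train")
counsel = load_dataset("qowlsdud/CounselGPT", split="train")

combined = concatenate_datasets([diary, counsel]).shuffle(seed=42)

# 8:2 train/validation split, matching the 40,000 / 10,000 row counts above
splits = combined.train_test_split(test_size=0.2, seed=42)
train_ds, val_ds = splits["train"], splits["test"]
```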

### Training Procedure
- **Base Model**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Objective**: Single-label classification
- **Max Length**: 512
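
Fine-tuning starts from the discriminator checkpoint with a fresh 8-way classification head. A sketch of the setup, assuming the standard `transformers` sequence-classification path:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BASE = "monologg/koelectra-base-v3-discriminator"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=8,  # the eight emotion classes listed above
    problem_type="single_label_classification",
)
```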

### Training Hyperparameters
- **num_train_epochs**: 3
- **learning_rate**: 3e-5
- **weight_decay**: 0.02
- **warmup_ratio**: 0.15
- **per_device_train_batch_size**: 32
- **per_device_eval_batch_size**: 64
- **max_grad_norm**: 1.0
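
These values map directly onto `transformers.TrainingArguments`; a minimal sketch (the output path and any unlisted arguments are placeholders, not the author's exact configuration):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="howru-koelectra-emotion",  # placeholder path
    num_train_epochs=3,
    learning_rate=3e-5,
    weight_decay=0.02,
    warmup_ratio=0.15,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    max_grad_norm=1.0,
)
```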

---

## Performance
| Metric          | Score  |
|-----------------|--------|
| **Eval Accuracy** | 0.95  |
| **Eval F1 Macro** | 0.95  |
| **Eval Loss**     | 0.16  |
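
For reference, a typical `compute_metrics` function that yields these two metrics with the `Trainer` API; this is an illustrative sketch, not necessarily the author's exact evaluation code:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }
```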

---
## Model Architecture

### 1) ELECTRA Encoder (Base-size)
- **Hidden size:** 768
- **Layers:** 12 Transformer blocks
- **Attention heads:** 12
- **MLP intermediate size:** 3072
- **Activation:** GELU
- **Dropout:** 0.1

### 2) Classification Head
An additional classification head on top of the encoder predicts the eight emotion classes:

- **Dense Layer**: 768 โ†’ 768
- **Activation**: GELU
- **Dropout**: 0.1
- **Output Projection**: 768 โ†’ 8
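
A minimal PyTorch sketch of this head, mirroring the standard `ElectraClassificationHead` that `ElectraForSequenceClassification` attaches to the encoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationHead(nn.Module):
    """Sentence-level head: dropout -> dense -> GELU -> dropout -> projection."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 8, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        x = hidden_states[:, 0, :]        # take the [CLS] token representation
        x = self.dropout(x)
        x = F.gelu(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)           # logits over the 8 emotion classes
```

In practice this head is created automatically by `AutoModelForSequenceClassification`, so there is no need to instantiate it yourself.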

---

## Citation
```bibtex
@misc{HowRUEmotion2025,
  title={HowRU KoELECTRA Emotion Classifier},
  author={Lim, Yeri},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Classifier}}
}
```