---
library_name: transformers
tags:
- korean
- emotion
- emotion-classification
- nlp
- electra
- koelectra
- sentiment
- sequence-classification
license: mit
datasets:
- LimYeri/kor-diary-emotion_v2
- qowlsdud/CounselGPT
language:
- ko
metrics:
- accuracy
- f1
base_model:
- monologg/koelectra-base-v3-discriminator
pipeline_tag: text-classification
---
# HowRU-KoELECTRA-Emotion-Classifier
## Model Description
A Korean emotion-classification model based on KoELECTRA, built for diary and psychological-journal text.<br>
It recognizes eight emotions in text: joy, excitement, neutral, surprise, disgust, fear, sadness, and anger.
- **Model type:** Text Classification (Emotion Recognition)
- **Language:** Korean (한국어, ko)
- **License:** MIT
- **Finetuned from model:** [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
## Emotion Classes
์ด ๋ชจ๋ธ์€ ์ž…๋ ฅ๋œ ํ•œ๊ตญ์–ด ๋ฌธ์žฅ์˜ ์ฃผ์š” ๊ฐ์ •์„ ์•„๋ž˜ 8๊ฐœ ํด๋ž˜์Šค ์ค‘ ํ•˜๋‚˜๋กœ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค.
| Emotion (Korean) | Emotion (EN) |
|------------------|--------------|
| ๊ธฐ์จ | Joy |
| ์„ค๋ ˜ | Excitement |
| ํ‰๋ฒ”ํ•จ | Neutral |
| ๋†€๋ผ์›€ | Surprise |
| ๋ถˆ์พŒํ•จ | Disgust |
| ๋‘๋ ค์›€ | Fear |
| ์Šฌํ”” | Sadness |
| ๋ถ„๋…ธ | Anger |
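
The checkpoint's labels are the Korean strings above, so English-facing applications need a small display mapping. A minimal sketch (the dict simply restates the table; it assumes the model emits exactly these Korean label strings):

```python
# English display names for the model's Korean emotion labels.
# Assumption: the checkpoint's id2label values are exactly these strings.
LABEL_EN = {
    "기쁨": "Joy",
    "설렘": "Excitement",
    "평범함": "Neutral",
    "놀라움": "Surprise",
    "불쾌함": "Disgust",
    "두려움": "Fear",
    "슬픔": "Sadness",
    "분노": "Anger",
}

def to_english(korean_label: str) -> str:
    """Map a predicted Korean label to its English name (unknown labels pass through)."""
    return LABEL_EN.get(korean_label, korean_label)

print(to_english("기쁨"))  # Joy
```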
---
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
# 1) Load model & tokenizer
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Move to GPU automatically when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Emotion label mapping (id2label)
id2label = model.config.id2label

# 2) Inference function
def predict_emotion(text: str):
    """
    Returns a dict with:
        - top1_emotion: the predicted label and its probability
        - top2_emotions: the two most likely emotions
        - all_probabilities: per-emotion probabilities, sorted descending
    """
    # Tokenize
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    # Inference
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1)[0]

    # Probabilities sorted in descending order
    probs_sorted = sorted(
        [(id2label[i], float(probs[i])) for i in range(len(probs))],
        key=lambda x: x[1],
        reverse=True
    )
    top1_pred = probs_sorted[0]
    top2_pred = probs_sorted[:2]

    return {
        "text": text,
        "top1_emotion": top1_pred,
        "top2_emotions": top2_pred,
        "all_probabilities": probs_sorted,
    }

# 3) Example
result = predict_emotion("오늘 정말 기분이 좋고 행복한 하루였어!")
print(result)
```
### pipeline
```python
from transformers import pipeline
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"
classifier = pipeline(
    "text-classification",
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    top_k=None  # return probabilities for every emotion class
)

# Prediction
text = "오늘 정말 기분이 좋고 행복한 하루였어!"
result = classifier(text)[0]

print("Input:", text)
print("\nTop-1 emotion:", result[0]['label'], f"({result[0]['score']:.4f})")
print("\nFull emotion distribution:")
for r in result:
    print(f"  {r['label']}: {r['score']:.4f}")
```
---
## Training Details
### Training Data
1. [LimYeri/kor-diary-emotion_v2](https://huggingface.co/datasets/LimYeri/kor-diary-emotion_v2)
2. [qowlsdud/CounselGPT](https://huggingface.co/datasets/qowlsdud/CounselGPT)
- **Total (split 8:2):** 50,000 rows
- **Train:** 40,000 rows
- **Validation:** 10,000 rows
### Training Procedure
- **Base Model**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Objective**: Single-label classification
- **Max Length**: 512
### Training Hyperparameters
- **num_train_epochs**: 3
- **learning_rate**: 3e-5
- **weight_decay**: 0.02
- **warmup_ratio**: 0.15
- **per_device_train_batch_size**: 32
- **per_device_eval_batch_size**: 64
- **max_grad_norm**: 1.0
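
The hyperparameters above map directly onto `transformers.TrainingArguments`. A hedged reconstruction of the run's configuration; only the values listed above come from this card, and `output_dir` is a hypothetical placeholder:

```python
from transformers import TrainingArguments

# Illustrative sketch, not the original training script.
training_args = TrainingArguments(
    output_dir="howru-koelectra-emotion",  # hypothetical path
    num_train_epochs=3,
    learning_rate=3e-5,
    weight_decay=0.02,
    warmup_ratio=0.15,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    max_grad_norm=1.0,
)
```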
---
## Performance
| Metric | Score |
|-----------------|--------|
| **Eval Accuracy** | 0.95 |
| **Eval F1 Macro** | 0.95 |
| **Eval Loss** | 0.16 |
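
Accuracy and macro-averaged F1 are standard scikit-learn metrics; a minimal sketch of how such a metrics hook is typically written (the toy logits below are illustrative data, not the model's actual outputs):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(logits: np.ndarray, labels: np.ndarray) -> dict:
    """Accuracy and macro-averaged F1, as reported in the table above."""
    preds = logits.argmax(axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }

# Toy example with 3 classes (illustrative only)
logits = np.array([[2.0, 0.1, 0.1],   # -> class 0
                   [0.1, 2.0, 0.1],   # -> class 1
                   [0.1, 0.1, 2.0],   # -> class 2
                   [0.1, 0.1, 2.0]])  # -> class 2
labels = np.array([0, 1, 2, 1])
print(compute_metrics(logits, labels))  # accuracy 0.75
```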
---
## Model Architecture
### 1) ELECTRA Encoder (Base-size)
- **Hidden size:** 768
- **Layers:** 12 Transformer blocks
- **Attention heads:** 12
- **MLP intermediate size:** 3072
- **Activation:** GELU
- **Dropout:** 0.1
### 2) Classification Head
๊ฐ์ • 8๊ฐœ ํด๋ž˜์Šค๋ฅผ ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•œ ์ถ”๊ฐ€ ๋ถ„๋ฅ˜ ํ—ค๋“œ:
- **Dense Layer**: 768 โ†’ 768
- **Activation**: GELU
- **Dropout**: 0.1
- **Output Projection**: 768 โ†’ 8
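
The head described above can be sketched in PyTorch as follows. This is a hypothetical re-implementation for illustration (the layer names and the use of the `[CLS]` token are assumptions in the style of ELECTRA sequence-classification heads, not the checkpoint's own code):

```python
import torch
import torch.nn as nn

class EmotionClassificationHead(nn.Module):
    """Sketch of the head above: dense 768->768, GELU, dropout 0.1, projection 768->8."""
    def __init__(self, hidden_size: int = 768, num_labels: int = 8, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Take the first ([CLS]) token's representation from (batch, seq_len, hidden)
        x = hidden_states[:, 0, :]
        x = self.dropout(x)
        x = self.activation(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)

head = EmotionClassificationHead()
logits = head(torch.randn(2, 16, 768))  # fake encoder output: batch of 2, 16 tokens
print(logits.shape)  # torch.Size([2, 8])
```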
---
## Citation
```bibtex
@misc{HowRUEmotion2025,
title={HowRU KoELECTRA Emotion Classifier},
author={Lim, Yeri},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Classifier}}
}
```