# KoELECTRA Emotion Classification

A Korean emotion classification model covering six emotion classes.
## Model Description
This model is a Korean emotion classifier fine-tuned on the AI Hub Empathetic Dialogue (공감형 대화) dataset.
- Base Model: monologg/koelectra-base-v3-discriminator
- Task: 6-class emotion classification
- Language: Korean
- Dataset: AI Hub Empathetic Dialogue (공감형 대화)
## Labels
| Label | Korean | English |
|---|---|---|
| 0 | 기쁨 | Joy |
| 1 | 슬픔 | Sadness |
| 2 | 분노 | Anger |
| 3 | 불안 | Anxiety |
| 4 | 당황 | Embarrassment |
| 5 | 상처 | Hurt |
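For programmatic use, the same mapping can be written as a plain dictionary. A minimal sketch (the `ID2EMOTION` name is illustrative, not part of the model config):

```python
# Label ID -> Korean emotion name, mirroring the table above.
ID2EMOTION = {
    0: "기쁨",  # Joy
    1: "슬픔",  # Sadness
    2: "분노",  # Anger
    3: "불안",  # Anxiety
    4: "당황",  # Embarrassment
    5: "상처",  # Hurt
}
```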
## Training Data
- Dataset: AI Hub Empathetic Dialogue (공감형 대화)
- Train samples: 22,758
- Validation samples: 1,591
- Test samples: 1,591
## Performance
| Metric | Score |
|---|---|
| Accuracy | 98.24% |
| F1 Score (weighted) | 98.24% |
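A minimal sketch of how accuracy and weighted F1 can be computed with scikit-learn; the `y_true` and `y_pred` lists below are illustrative placeholders, not the actual test-set predictions:

```python
from sklearn.metrics import accuracy_score, f1_score

# Replace with the gold and predicted label IDs for the full test set.
y_true = [0, 1, 2, 2, 5]  # illustrative placeholder
y_pred = [0, 1, 2, 3, 5]  # illustrative placeholder

print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"Weighted F1: {f1_score(y_true, y_pred, average='weighted'):.4f}")
```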
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "jeongyoonhuh/koelectra-emotion-6class"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "오늘 정말 기분이 좋아요!"  # "I feel really good today!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit back to its emotion name.
prediction = torch.argmax(outputs.logits, dim=1).item()
emotions = ['기쁨', '슬픔', '분노', '불안', '당황', '상처']  # Joy, Sadness, Anger, Anxiety, Embarrassment, Hurt
print(f"Predicted emotion: {emotions[prediction]}")
```
## Training Hyperparameters
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 5
- Weight decay: 0.01
- Warmup steps: 500
- Max sequence length: 256
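For reference, a minimal sketch of how these settings map onto Hugging Face `TrainingArguments`; the `output_dir` value is an assumption, not a detail from the original training run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./koelectra-emotion-6class",  # assumption: illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    warmup_steps=500,
)
# The max sequence length (256) is applied at tokenization time,
# as in the usage example above.
```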