---
library_name: transformers
tags:
- korean
- emotion
- emotion-classification
- nlp
- electra
- koelectra
- sentiment
- sequence-classification
license: mit
datasets:
- LimYeri/kor-diary-emotion_v2
- qowlsdud/CounselGPT
language:
- ko
metrics:
- accuracy
- f1
base_model:
- monologg/koelectra-base-v3-discriminator
pipeline_tag: text-classification
---

# HowRU-KoELECTRA-Emotion-Classifier

## Model Description

A KoELECTRA-based emotion classification model for Korean text, particularly diary entries and psychological records. It recognizes eight emotions in text: joy, excitement, neutral, surprise, disgust, fear, sadness, and anger.

- **Model type:** Text Classification (Emotion Recognition)
- **Language:** Korean (한국어, ko)
- **License:** MIT
- **Finetuned from model:** [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)

## Emotion Classes

This model classifies the dominant emotion of an input Korean sentence into one of the eight classes below.

| Emotion (Korean) | Emotion (EN) |
|------------------|--------------|
| 기쁨 | Joy |
| 설렘 | Excitement |
| 평범함 | Neutral |
| 놀라움 | Surprise |
| 불쾌함 | Disgust |
| 두려움 | Fear |
| 슬픔 | Sadness |
| 분노 | Anger |
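### Label Lookup (Sketch)

For readers who do not read Korean, the table above can be turned into a small lookup. This is a minimal sketch that assumes the model emits the Korean labels exactly as they appear in the table; `KO_TO_EN` and `to_english` are hypothetical helper names, not part of the model or of `transformers`.

```python
# Lookup built from the Emotion Classes table (Korean label -> English gloss).
# Assumption: the model's labels match the Korean strings in the table exactly.
KO_TO_EN = {
    "기쁨": "Joy",
    "설렘": "Excitement",
    "평범함": "Neutral",
    "놀라움": "Surprise",
    "불쾌함": "Disgust",
    "두려움": "Fear",
    "슬픔": "Sadness",
    "분노": "Anger",
}


def to_english(label: str) -> str:
    """Return the English gloss for a Korean label, or the label unchanged if unknown."""
    return KO_TO_EN.get(label, label)


print(to_english("기쁨"))  # Joy
```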

---

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# 1) Load the model and tokenizer
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Use the GPU automatically when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Emotion label mapping (id2label)
id2label = model.config.id2label


# 2) Inference function
def predict_emotion(text: str):
    """
    Returns a dict with:
        - text: the input text
        - top1_emotion: the predicted (label, probability) pair
        - top2_emotions: the two most likely (label, probability) pairs
        - all_probabilities: every (label, probability) pair, in descending order
    """

    # Tokenize
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    # Run inference
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1)[0]

    # Sort the probabilities in descending order
    probs_sorted = sorted(
        [(id2label[i], float(probs[i])) for i in range(len(probs))],
        key=lambda x: x[1],
        reverse=True
    )

    top1_pred = probs_sorted[0]
    top2_pred = probs_sorted[:2]

    return {
        "text": text,
        "top1_emotion": top1_pred,
        "top2_emotions": top2_pred,
        "all_probabilities": probs_sorted,
    }


# 3) Example ("Today was a really great, happy day!")
result = predict_emotion("오늘 정말 기분이 좋고 행복한 하루였어!")
print(result)
```
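The post-processing inside `predict_emotion` (a softmax over the logits, then a descending sort) can be tried without downloading the model. The sketch below is illustrative only: the logits and the `id2label` mapping are made-up placeholders, and the real label order comes from `model.config.id2label`.

```python
import math

# Hypothetical values for illustration; not real model output.
logits = [2.0, 0.5, -1.0]
id2label = {0: "label_a", 1: "label_b", 2: "label_c"}


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)

# Pair each label with its probability and sort from most to least likely.
probs_sorted = sorted(
    ((id2label[i], p) for i, p in enumerate(probs)),
    key=lambda pair: pair[1],
    reverse=True,
)

top1_pred = probs_sorted[0]
top2_pred = probs_sorted[:2]
print(top1_pred[0], [label for label, _ in top2_pred])
```

Because the probabilities sum to 1, the second entry of `top2_emotions` gives a quick sense of how close the runner-up emotion was.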

### Using `pipeline`

```python
from transformers import pipeline

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

classifier = pipeline(
    "text-classification",
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    top_k=None  # return probabilities for all emotions
)

# Predict ("Today was a really great, happy day!")
text = "오늘 정말 기분이 좋고 행복한 하루였어!"
result = classifier(text)

# Unwrap the outer list to get the per-emotion scores for this text
result = result[0]

print("Input text:", text)
print("\nTop-1 emotion:", result[0]['label'], f"({result[0]['score']:.4f})")
print("\nFull emotion distribution:")
for r in result:
    print(f"  {r['label']}: {r['score']:.4f}")
```

---

## Training Details

### Training Data

1. [LimYeri/kor-diary-emotion_v2](https://huggingface.co/datasets/LimYeri/kor-diary-emotion_v2)
2. [qowlsdud/CounselGPT](https://huggingface.co/datasets/qowlsdud/CounselGPT)

- **Total (split 8:2):** 50,000 rows
- **Train:** 40,000 rows
- **Validation:** 10,000 rows

### Training Procedure

- **Base Model**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Objective**: Single-label classification
- **Max Length**: 512

### Training Hyperparameters

- **num_train_epochs**: 3
- **learning_rate**: 3e-5
- **weight_decay**: 0.02
- **warmup_ratio**: 0.15
- **per_device_train_batch_size**: 32
- **per_device_eval_batch_size**: 64
- **max_grad_norm**: 1.0
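### Step Count Sketch

The hyperparameters above imply a rough schedule length. The arithmetic below is a sketch under assumptions this card does not state: a single device, no gradient accumulation, a partial final batch that is kept, and a warmup step count rounded up from `warmup_ratio`.

```python
import math
from fractions import Fraction

train_rows = 40_000                # from Training Data
per_device_train_batch_size = 32   # from Training Hyperparameters
num_train_epochs = 3
warmup_ratio = Fraction("0.15")    # exact fraction avoids float rounding

# Assumptions (not from the model card): one device, no gradient accumulation.
steps_per_epoch = math.ceil(train_rows / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 1250 3750 563
```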

---

## Performance

| Metric            | Score |
|-------------------|-------|
| **Eval Accuracy** | 0.95  |
| **Eval F1 Macro** | 0.95  |
| **Eval Loss**     | 0.16  |
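### Metric Sketch

For reference, accuracy and macro F1 can be computed as shown here. This is a generic sketch on made-up toy labels, not the evaluation script used for this model; `accuracy` and `macro_f1` are hypothetical helper names. Macro F1 averages the per-class F1 scores with equal weight, so rare emotions count as much as common ones.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (three classes, six examples).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"macro_f1={macro_f1(y_true, y_pred):.4f}")
```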

---

## Model Architecture

### 1) ELECTRA Encoder (Base-size)

- **Hidden size:** 768
- **Layers:** 12 Transformer blocks
- **Attention heads:** 12
- **MLP intermediate size:** 3072
- **Activation:** GELU
- **Dropout:** 0.1

### 2) Classification Head

An additional classification head that predicts the eight emotion classes:

- **Dense Layer**: 768 → 768
- **Activation**: GELU
- **Dropout**: 0.1
- **Output Projection**: 768 → 8
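### Head Size Sketch

The layer sizes listed above determine how many trainable parameters the head adds on top of the encoder. The count below assumes both linear layers carry a bias term, which this card does not state; GELU and dropout have no parameters. `linear_params` is a hypothetical helper.

```python
hidden_size = 768  # encoder hidden size
num_labels = 8     # emotion classes


def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Assumption: both layers include a bias vector.
dense = linear_params(hidden_size, hidden_size)    # 768 -> 768
out_proj = linear_params(hidden_size, num_labels)  # 768 -> 8
head_total = dense + out_proj

print(dense, out_proj, head_total)  # 590592 6152 596744
```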

---

## Citation

```bibtex
@misc{HowRUEmotion2025,
  title={HowRU KoELECTRA Emotion Classifier},
  author={Lim, Yeri},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Classifier}}
}
```