KoELECTRA Emotion Classification (Korean)

ํ•œ๊ตญ์–ด ๊ฐ์ • ๋ถ„๋ฅ˜ ๋ชจ๋ธ (6๊ฐ€์ง€ ๊ฐ์ •)

Model Description

์ด ๋ชจ๋ธ์€ AI Hub์˜ ๊ณต๊ฐํ˜• ๋Œ€ํ™” ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ธํŠœ๋‹๋œ ํ•œ๊ตญ์–ด ๊ฐ์ • ๋ถ„๋ฅ˜ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

  • Base Model: monologg/koelectra-base-v3-discriminator
  • Task: 6-class emotion classification
  • Language: Korean
  • Dataset: AI Hub ๊ณต๊ฐํ˜• ๋Œ€ํ™” (Empathetic Dialogue)
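
For a quick smoke test, the model can also be run through the transformers pipeline API (a minimal sketch; the label names it prints depend on the id2label mapping stored in the hosted config, which this card does not document):

from transformers import pipeline

classifier = pipeline("text-classification", model="jeongyoonhuh/koelectra-emotion-6class")
print(classifier("오늘 정말 기분이 좋아요!"))  # e.g. [{'label': '...', 'score': 0.99}]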

Labels

| Label | Emotion | English |
|-------|---------|---------------|
| 0 | 기쁨 | Joy |
| 1 | 슬픔 | Sadness |
| 2 | 분노 | Anger |
| 3 | 불안 | Anxiety |
| 4 | 당황 | Embarrassment |
| 5 | 상처 | Hurt |

Training Data

  • Dataset: AI Hub ๊ณต๊ฐํ˜• ๋Œ€ํ™” (Empathetic Dialogue)
  • Train samples: 22,758
  • Validation samples: 1,591
  • Test samples: 1,591

Performance

| Metric | Score |
|----------------------|--------|
| Accuracy | 98.24% |
| F1 Score (weighted) | 98.24% |
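
Both metrics can be recomputed from the model's predictions on the 1,591 test samples, for example with scikit-learn (a sketch; preds and refs are placeholder names for the predicted and gold label ids, shown here with illustrative values):

from sklearn.metrics import accuracy_score, f1_score

preds = [0, 1, 1, 3]  # illustrative model predictions
refs = [0, 1, 2, 3]   # illustrative gold labels
print("Accuracy:", accuracy_score(refs, preds))
print("F1 (weighted):", f1_score(refs, preds, average="weighted"))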

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "jeongyoonhuh/koelectra-emotion-6class"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

text = "오늘 정말 기분이 좋아요!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Forward pass without gradient tracking, then take the highest-scoring class
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()

# Indices 0..5 correspond to the label table above
emotions = ['기쁨', '슬픔', '분노', '불안', '당황', '상처']
print(f"Predicted emotion: {emotions[prediction]}")

Training Hyperparameters

  • Learning rate: 2e-5
  • Batch size: 16
  • Epochs: 5
  • Weight decay: 0.01
  • Warmup steps: 500
  • Max sequence length: 256
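
The card does not say which training framework was used; expressed with the Hugging Face Trainer, the settings above would look roughly like this (a sketch, not the authors' actual script):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="koelectra-emotion-6class",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    warmup_steps=500,
    # the max sequence length (256) is applied at tokenization time, not here
)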