metadata
library_name: transformers
tags:
- text-regression
- emotion-regression
- sentiment-regression
- ko
- korean
- koelectra
- emotion-analysis
- nlp
license: mit
language:
- ko
base_model:
- monologg/koelectra-base-v3-discriminator
HowRU-KoELECTRA-Emotion-Regression
Model Description
A KoELECTRA-based emotion scoring (regression) model for Korean text, particularly diary entries and psychological records.
It predicts the intensity and polarity (positive ↔ negative) of the emotion in a text as a real value between -1.0 and 1.0.
- Model type: Regression (Emotion Intensity / Sentiment Strength)
- Output Range: -1.0 ~ 1.0
- Language: Korean (한국어, ko)
- License: MIT
- Finetuned from model: monologg/koelectra-base-v3-discriminator
Emotion Score Interpretation
The model scores the emotional intensity of an input Korean sentence; the score falls into one of the ranges below.
| Score Range | Meaning |
|---|---|
| +0.6 ~ +1.0 | Strong positive emotion |
| +0.2 ~ +0.6 | Weak positive emotion |
| -0.2 ~ +0.2 | Neutral, or only weak emotional expression |
| -0.6 ~ -0.2 | Weak negative emotion |
| -1.0 ~ -0.6 | Strong negative emotion |
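The table can be turned into a small lookup helper. The sketch below is illustrative only: the function name `label_score`, the English label strings, the clipping step, and the handling of boundary values (adjacent ranges in the table share endpoints, so the table itself does not decide them) are all assumptions, not part of the released model.

```python
def label_score(score: float) -> str:
    """Map a raw emotion score to the coarse bucket from the table above."""
    # Assumption: the regression output is unbounded, so clip stray values to [-1.0, 1.0].
    score = max(-1.0, min(1.0, score))
    # Assumption: a shared endpoint (e.g. exactly 0.6 or -0.2) goes to the stronger bucket.
    if score >= 0.6:
        return "strong positive"
    if score >= 0.2:
        return "weak positive"
    if score > -0.2:
        return "neutral or weak emotional expression"
    if score > -0.6:
        return "weak negative"
    return "strong negative"
```

A different boundary convention is equally defensible; what matters is applying one convention consistently.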
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
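def predict_score(text: str) -> dict:
    """
    Score a single Korean sentence.

    Returns:
        A dict with the input text and its emotion_score
        (emotion intensity, -1.0 ~ 1.0).
    """
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512,
    ).to(device)

    with torch.no_grad():
        logits = model(**inputs).logits

    # The head has a single output, so the logits tensor holds exactly one value.
    score = logits.item()
    return {"text": text, "emotion_score": score}

# Example (input means "Today was a really fun, happy, perfect day!")
result = predict_score("오늘은 정말 즐겁고 행복한 최고의 하루였어!")
print(result)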
pipeline
from transformers import pipeline
MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"
regressor = pipeline(
    "text-classification",     # regression runs under the same task
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    function_to_apply="none",  # no activation is applied; the raw value is used as-is
)

# Input means "Today was a really fun, happy, perfect day!"
text = "오늘은 정말 즐겁고 행복한 최고의 하루였어!"
result = regressor(text)[0]
print("Input sentence:", text)
print("Emotion score:", result["score"])
Training Details
Training Data
- Total (split 9:1): 42,000 rows
- Train: 37,800 rows
- Validation: 4,200 rows
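The card states only the 9:1 ratio and the resulting row counts, not how rows were assigned. As a rough sketch, a seeded random split that reproduces those counts could look like the following; the seed value and the use of a uniform shuffle are assumptions.

```python
import random

TOTAL_ROWS = 42_000
VAL_ROWS = TOTAL_ROWS // 10   # one part in ten goes to validation
SEED = 42                     # hypothetical; the card does not state a seed

# Shuffle row indices reproducibly, then cut off the validation slice.
indices = list(range(TOTAL_ROWS))
random.Random(SEED).shuffle(indices)
val_idx = indices[:VAL_ROWS]
train_idx = indices[VAL_ROWS:]

print(len(train_idx), len(val_idx))  # 37800 4200
```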
Training Procedure
- Base Model: monologg/koelectra-base-v3-discriminator
- Max Length: 512
Training Hyperparameters
- num_train_epochs: 4
- learning_rate: 1.8e-5
- weight_decay: 0.01
- warmup_ratio: 0.12
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- loss_function: Huber Loss (δ = 1.0)
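For reference, the standard per-sample Huber loss is quadratic for small errors and linear for large ones. The plain-Python sketch below follows that textbook definition; it is not the training code, and the function name `huber_loss` is chosen here for illustration.

```python
def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Standard Huber loss for one prediction/target pair."""
    error = abs(pred - target)
    if error <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * error ** 2
    # Linear region: grows like absolute error, which limits the pull of outliers.
    return delta * (error - 0.5 * delta)
```

With δ = 1.0, the quadratic branch applies whenever the absolute error is at most 1.0.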
Performance
| Metric | Score |
|---|---|
| Eval MAE | 0.0461 |
| Eval Pearson Correlation | 0.9951 |
| Eval Loss | 0.00199 |
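The two headline metrics can be computed from paired predictions and labels as sketched below. This is a dependency-free illustration of the standard definitions, not the evaluation script behind the table; the function names and the toy values are assumptions.

```python
import math

def mean_absolute_error(preds, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def pearson_correlation(preds, targets):
    """Pearson correlation coefficient between predictions and targets."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_p * var_t)

# Toy values for illustration only; not taken from the evaluation set.
preds = [0.9, 0.1, -0.7]
targets = [1.0, 0.0, -0.8]
print(mean_absolute_error(preds, targets), pearson_correlation(preds, targets))
```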
Model Architecture
1) ELECTRA Encoder (Base-size)
- Hidden size: 768
- Layers: 12 Transformer blocks
- Attention heads: 12
- MLP intermediate size: 3072
- Activation: GELU
- Dropout: 0.1
2) Classification Head
A single regression head that outputs the emotion intensity (-1.0 ~ 1.0):
- Dense Layer: 768 → 768
- Activation: GELU
- Dropout: 0.1
- Output Projection: 768 → 1
The final output is the raw logit, with no softmax applied, and is interpreted as an emotion intensity in the range -1.0 ~ 1.0.
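As a quick sanity check on the head dimensions listed above, the number of trainable parameters in the head can be worked out by hand. The sketch assumes each linear layer carries a bias term, which the card does not state; `linear_params` is a helper defined here for illustration.

```python
HIDDEN_SIZE = 768

def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights plus optional bias of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

dense = linear_params(HIDDEN_SIZE, HIDDEN_SIZE)   # Dense Layer: 768 -> 768
out_proj = linear_params(HIDDEN_SIZE, 1)          # Output Projection: 768 -> 1
head_total = dense + out_proj

print(dense, out_proj, head_total)  # 590592 769 591361
```

GELU and dropout add no parameters, so under these assumptions the head accounts for roughly 0.59M parameters on top of the encoder.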
Citation
@misc{HowRUEmotionRegression2025,
title={HowRU KoELECTRA Emotion Regression},
author={Lim, Yeri},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Regression}}
}