---
library_name: transformers
tags:
- text-regression
- emotion-regression
- sentiment-regression
- ko
- korean
- koelectra
- emotion-analysis
- nlp
license: mit
language:
- ko
base_model:
- monologg/koelectra-base-v3-discriminator
---

# HowRU-KoELECTRA-Emotion-Regression

## Model Description

A KoELECTRA-based emotion-scoring (regression) model for Korean text, particularly diary and journal-style writing.<br>
It predicts the *emotional valence (positive ↔ negative)* of a text as a **real value between -1.0 and 1.0**.

- **Model type:** Regression (Emotion Intensity / Sentiment Strength)
- **Output Range:** -1.0 ~ 1.0
- **Language:** Korean (ko)
- **License:** MIT
- **Finetuned from model:** [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)

---

## Emotion Score Interpretation

The model maps the emotional intensity of an input Korean sentence to one of the ranges below.

| Score Range | Meaning |
|-----------------|-----------------------------------------|
| **+0.6 ~ +1.0** | Strong positive emotion |
| **+0.2 ~ +0.6** | Mild positive emotion |
| **-0.2 ~ +0.2** | Neutral, or little emotional expression |
| **-0.6 ~ -0.2** | Mild negative emotion |
| **-1.0 ~ -0.6** | Strong negative emotion |
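
A small helper can turn a raw score into one of the categories above. This is an illustrative sketch only; the label strings and the handling of boundary values are not part of the model.

```python
def interpret_score(score: float) -> str:
    """Map a raw emotion score (-1.0 ~ 1.0) to a coarse label.

    Boundaries follow the interpretation table; each boundary value
    is assigned to the stronger of the two adjacent buckets.
    """
    if score >= 0.6:
        return "strong positive"
    if score >= 0.2:
        return "mild positive"
    if score > -0.2:
        return "neutral"
    if score > -0.6:
        return "mild negative"
    return "strong negative"

print(interpret_score(0.83))   # strong positive
print(interpret_score(-0.35))  # mild negative
```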

---

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def predict_score(text: str):
    """
    Returns:
    - emotion_score: emotion intensity (-1.0 ~ 1.0)
    """
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs).logits
    score = outputs.item()

    return {"text": text, "emotion_score": score}


# Example: "Today was truly a joyful, happy, wonderful day!"
result = predict_score("오늘은 정말 즐겁고 행복한 최고의 하루였어!")
print(result)
```

### pipeline

```python
from transformers import pipeline

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"

regressor = pipeline(
    "text-classification",    # regression models run under the same task
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    function_to_apply="none"  # no softmax: use the raw value as-is
)

# "Today was truly a joyful, happy, wonderful day!"
text = "오늘은 정말 즐겁고 행복한 최고의 하루였어!"
result = regressor(text)[0]

print("Input text:", text)
print("Emotion score:", result["score"])
```

---

## Training Details

### Training Data

- **Total (split 9:1):** 42,000 rows
- **Train:** 37,800 rows
- **Validation:** 4,200 rows
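
The row counts follow directly from the 9:1 split:

```python
total = 42_000
train = total * 9 // 10     # 37,800 rows for training
validation = total - train  # 4,200 rows for validation
print(train, validation)
```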

### Training Procedure

- **Base Model**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Max Length**: 512

### Training Hyperparameters

- **num_train_epochs**: 4
- **learning_rate**: 1.8e-5
- **weight_decay**: 0.01
- **warmup_ratio**: 0.12
- **per_device_train_batch_size**: 32
- **per_device_eval_batch_size**: 32
- **loss_function**: Huber Loss (δ = 1.0)
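
For reference, Huber loss with δ = 1.0 is quadratic for errors up to δ and linear beyond it, which damps the influence of outlier labels. A minimal pure-Python sketch of the formula (the actual training used a framework implementation):

```python
def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic inside |error| <= delta, linear outside."""
    error = abs(pred - target)
    if error <= delta:
        return 0.5 * error ** 2
    return delta * (error - 0.5 * delta)

print(huber_loss(0.3, 0.5))   # small error, quadratic region: ~0.02
print(huber_loss(-1.0, 2.0))  # large error, linear region: 2.5
```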

---

## Performance

| Metric | Score |
|------------------------------|-------------|
| **Eval MAE** | **0.0461** |
| **Eval Pearson Correlation** | **0.9951** |
| **Eval Loss** | **0.00199** |
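
For reference, the two evaluation metrics can be computed as below; the inputs here are toy values for illustration, not the model's evaluation data:

```python
import math

def mae(preds, targets):
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def pearson(preds, targets):
    """Pearson correlation coefficient between predictions and targets."""
    n = len(preds)
    mp = sum(preds) / n
    mt = sum(targets) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(preds, targets))
    sp = math.sqrt(sum((p - mp) ** 2 for p in preds))
    st = math.sqrt(sum((t - mt) ** 2 for t in targets))
    return cov / (sp * st)

preds = [0.91, -0.42, 0.05, -0.88]    # toy predictions
targets = [0.95, -0.40, 0.00, -0.90]  # toy gold scores
print(round(mae(preds, targets), 4))
print(round(pearson(preds, targets), 4))  # near 1.0: predictions track targets closely
```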

---

## Model Architecture

### 1) ELECTRA Encoder (Base-size)

- **Hidden size:** 768
- **Layers:** 12 Transformer blocks
- **Attention heads:** 12
- **MLP intermediate size:** 3072
- **Activation:** GELU
- **Dropout:** 0.1

### 2) Classification Head

A single regression head that outputs emotion intensity (-1.0 ~ 1.0):

- **Dense Layer**: 768 → 768
- **Activation**: GELU
- **Dropout**: 0.1
- **Output Projection**: 768 → 1

The final output is the **raw logit** (no softmax), interpreted as emotion intensity in the -1.0 ~ 1.0 range.
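
A quick back-of-the-envelope check of the head's size (weights plus biases, using the dimensions listed above) shows it is tiny next to the base-size encoder:

```python
hidden = 768

dense_params = hidden * hidden + hidden  # 768 -> 768 dense: weight + bias
proj_params = hidden + 1                 # 768 -> 1 projection: weight + bias
head_total = dense_params + proj_params

print(head_total)  # 591,361 parameters in the regression head
```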

---

## Citation

```bibtex
@misc{HowRUEmotionRegression2025,
  title={HowRU KoELECTRA Emotion Regression},
  author={Lim, Yeri},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Regression}}
}
```