---
library_name: transformers
tags:
- text-regression
- emotion-regression
- sentiment-regression
- ko
- korean
- koelectra
- emotion-analysis
- nlp
license: mit
language:
- ko
base_model:
- monologg/koelectra-base-v3-discriminator
---

# HowRU-KoELECTRA-Emotion-Regression

## Model Description
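A KoELECTRA-based emotion scoring (regression) model for Korean text, especially diary entries and psychological journals.<br>
It predicts the *intensity and polarity (positive ↔ negative)* of the emotion in a text as a **real value between -1.0 and 1.0**.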

- **Model type:** Regression (Emotion Intensity / Sentiment Strength)
- **Output Range:** -1.0 ~ 1.0
- **Language:** Korean (한국어, ko)
- **License:** MIT
- **Finetuned from model:** [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)

---

## Emotion Score Interpretation
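The model outputs an emotion score for the input Korean sentence. The score falls into one of the ranges below.

| Score Range      | Meaning                                   |
|------------------|-------------------------------------------|
| **+0.6 ~ +1.0**  | Strong positive emotion                   |
| **+0.2 ~ +0.6**  | Weak positive emotion                     |
| **-0.2 ~ +0.2**  | Neutral, or little emotional expression   |
| **-0.6 ~ -0.2**  | Weak negative emotion                     |
| **-1.0 ~ -0.6**  | Strong negative emotion                   |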
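
The table can be applied in code with a small lookup. The sketch below is illustrative only: `interpret_score` is a hypothetical helper, not part of the model repository. The table's ranges share their endpoints, so the sketch assumes that a boundary value (±0.2, ±0.6) belongs to the stronger bucket, and it clips the score to [-1.0, 1.0] first because a raw regression output is not guaranteed to stay in range.

```python
def interpret_score(score: float) -> str:
    """Map an emotion score to the bucket names used in the table above.

    Assumption: boundary values (±0.2, ±0.6) go to the stronger bucket.
    """
    # Clip first: the raw model output is not guaranteed to stay in [-1, 1].
    s = max(-1.0, min(1.0, score))
    if s >= 0.6:
        return "strong positive"
    if s >= 0.2:
        return "weak positive"
    if s > -0.2:
        return "neutral"
    if s > -0.6:
        return "weak negative"
    return "strong negative"


for value in (0.95, 0.3, 0.0, -0.45, -1.3):
    print(value, "->", interpret_score(value))
```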

---

## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def predict_score(text: str) -> dict:
    """
    Returns a dict with:
        - text: the input text
        - emotion_score: emotion intensity (-1.0 ~ 1.0)
    """
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs).logits
        score = outputs.item()

    return {"text": text, "emotion_score": score}


# Example (input means: "Today was the best day, truly fun and happy!")
result = predict_score("오늘은 정말 즐겁고 행복한 최고의 하루였어!")
print(result)
```

### Using `pipeline`
```python
from transformers import pipeline

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"

regressor = pipeline(
    "text-classification",   # regression models run under the same task
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    function_to_apply="none"  # apply no softmax/sigmoid → use the raw value as-is
)

# Input means: "Today was the best day, truly fun and happy!"
text = "오늘은 정말 즐겁고 행복한 최고의 하루였어!"
result = regressor(text)[0]

print("Input text:", text)
print("Emotion score:", result["score"])
```

---

## Training Details

### Training Data
- **Total (split 9:1):** 42,000 rows
- **Train:** 37,800 rows
- **Validation:** 4,200 rows

### Training Procedure
- **Base Model**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Max Length**: 512

### Training Hyperparameters
- **num_train_epochs**: 4
- **learning_rate**: 1.8e-5
- **weight_decay**: 0.01
- **warmup_ratio**: 0.12
- **per_device_train_batch_size**: 32
- **per_device_eval_batch_size**: 32
- **loss_function**: Huber Loss (δ = 1.0)
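
For reference, the Huber loss named above is quadratic for small errors and linear for large ones. The following is a minimal pure-Python sketch of the standard per-example definition with δ = 1.0. The function name `huber_loss` is hypothetical and this is not the author's training code. In PyTorch, `torch.nn.HuberLoss(delta=1.0)` computes the same quantity, averaged over a batch by default.

```python
def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Standard Huber loss for a single prediction/target pair."""
    err = abs(pred - target)
    if err <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * err ** 2
    # Linear region: grows like absolute error, limiting the pull of outliers.
    return delta * (err - 0.5 * delta)


print(huber_loss(0.5, 0.0))  # small error, quadratic branch
print(huber_loss(2.0, 0.0))  # large error, linear branch
```

Since the targets lie in [-1.0, 1.0], most errors fall below δ = 1.0, so in practice the loss mostly acts like half the squared error.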

---

## Performance
| Metric                     | Score      |
|---------------------------|------------|
| **Eval MAE**              | **0.0461** |
| **Eval Pearson Correlation** | **0.9951** |
| **Eval Loss**             | **0.00199** |
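
As a reminder of what the two headline metrics measure, here is a small sketch that computes MAE and Pearson correlation from paired predictions and labels. The helper names and the sample values are made up for illustration and are not taken from the evaluation set.

```python
import math
from typing import Sequence


def mean_absolute_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def pearson_correlation(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Linear correlation between predictions and targets."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_p * var_t)


# Hypothetical values, for illustration only.
preds = [0.9, -0.7, 0.1, 0.4]
targets = [1.0, -0.8, 0.0, 0.5]

mae = mean_absolute_error(preds, targets)
r = pearson_correlation(preds, targets)
print(f"MAE: {mae:.4f}, Pearson r: {r:.4f}")
```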

---

## Model Architecture

### 1) ELECTRA Encoder (Base-size)
- **Hidden size:** 768
- **Layers:** 12 Transformer blocks
- **Attention heads:** 12
- **MLP intermediate size:** 3072
- **Activation:** GELU
- **Dropout:** 0.1

### 2) Classification Head
A single regression head that outputs the emotion intensity (-1.0 to 1.0):

- **Dense Layer**: 768 → 768
- **Activation**: GELU
- **Dropout**: 0.1
- **Output Projection**: 768 → 1
  
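The final output is used **as the raw logit**, with no softmax applied, and is interpreted as an emotion intensity in the -1.0 to 1.0 range. Because the output projection is linear, the value is not mathematically bounded; clip it if a strict range is required.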
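
The head layout listed above can be written out in PyTorch roughly as follows. This is a sketch for illustration, not the model's actual source: the class name `RegressionHead` is hypothetical, and it assumes the head reads the first-token (`[CLS]`) representation of the encoder output.

```python
import torch
from torch import nn


class RegressionHead(nn.Module):
    """Hypothetical re-creation of the head described above."""

    def __init__(self, hidden_size: int = 768, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)  # 768 -> 768
        self.activation = nn.GELU()
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, 1)          # 768 -> 1

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Assumption: the first token ([CLS]) summarizes the sequence.
        cls = hidden_states[:, 0, :]
        x = self.dropout(self.activation(self.dense(cls)))
        # Raw logit per example; no softmax or sigmoid is applied.
        return self.out_proj(x)


head = RegressionHead().eval()
dummy_encoder_output = torch.randn(2, 16, 768)  # (batch, seq_len, hidden)
with torch.no_grad():
    scores = head(dummy_encoder_output)
print(scores.shape)  # one value per input sequence
```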

---

## Citation
```bibtex
@misc{HowRUEmotionRegression2025,
  title={HowRU KoELECTRA Emotion Regression},
  author={Lim, Yeri},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Regression}}
}
```