---
library_name: transformers
tags:
- text-regression
- emotion-regression
- sentiment-regression
- ko
- korean
- koelectra
- emotion-analysis
- nlp
license: mit
language:
- ko
base_model:
- monologg/koelectra-base-v3-discriminator
---

# HowRU-KoELECTRA-Emotion-Regression

## Model Description

A KoELECTRA-based emotion-scoring (regression) model for Korean text, particularly diary and journal-style writing.<br>
It predicts the *emotional valence (positive ↔ negative)* of a text as a **real value between -1.0 and 1.0**.

- **Model type:** Regression (Emotion Intensity / Sentiment Strength)
- **Output Range:** -1.0 ~ 1.0
- **Language:** Korean (ko)
- **License:** MIT
- **Finetuned from model:** [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)

---

## Emotion Score Interpretation

The model maps the emotional intensity of an input Korean sentence to one of the ranges below.

| Score Range | Meaning |
|-----------------|-----------------------------------------|
| **+0.6 ~ +1.0** | Strong positive emotion |
| **+0.2 ~ +0.6** | Mild positive emotion |
| **-0.2 ~ +0.2** | Neutral, or little emotional expression |
| **-0.6 ~ -0.2** | Mild negative emotion |
| **-1.0 ~ -0.6** | Strong negative emotion |
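
A small helper can turn a raw score into one of the categories above. This is an illustrative sketch only; the label strings and the handling of boundary values are not part of the model.

```python
def interpret_score(score: float) -> str:
    """Map a raw emotion score (-1.0 ~ 1.0) to a coarse label.

    Boundaries follow the interpretation table; each boundary value
    is assigned to the stronger of the two adjacent buckets.
    """
    if score >= 0.6:
        return "strong positive"
    if score >= 0.2:
        return "mild positive"
    if score > -0.2:
        return "neutral"
    if score > -0.6:
        return "mild negative"
    return "strong negative"

print(interpret_score(0.83))   # strong positive
print(interpret_score(-0.35))  # mild negative
```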

---

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def predict_score(text: str):
    """
    Returns:
    - emotion_score: emotion intensity (-1.0 ~ 1.0)
    """
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs).logits
    score = outputs.item()

    return {"text": text, "emotion_score": score}


# Example: "Today was truly a joyful, happy, wonderful day!"
result = predict_score("오늘은 정말 즐겁고 행복한 최고의 하루였어!")
print(result)
```

### pipeline

```python
from transformers import pipeline

MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Regression"

regressor = pipeline(
    "text-classification",    # regression models run under the same task
    model=MODEL_NAME,
    tokenizer=MODEL_NAME,
    function_to_apply="none"  # no softmax: use the raw value as-is
)

# "Today was truly a joyful, happy, wonderful day!"
text = "오늘은 정말 즐겁고 행복한 최고의 하루였어!"
result = regressor(text)[0]

print("Input text:", text)
print("Emotion score:", result["score"])
```

---

## Training Details

### Training Data

- **Total (split 9:1):** 42,000 rows
- **Train:** 37,800 rows
- **Validation:** 4,200 rows
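
The row counts follow directly from the 9:1 split:

```python
total = 42_000
train = total * 9 // 10     # 37,800 rows for training
validation = total - train  # 4,200 rows for validation
print(train, validation)
```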

### Training Procedure

- **Base Model**: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
- **Max Length**: 512

### Training Hyperparameters

- **num_train_epochs**: 4
- **learning_rate**: 1.8e-5
- **weight_decay**: 0.01
- **warmup_ratio**: 0.12
- **per_device_train_batch_size**: 32
- **per_device_eval_batch_size**: 32
- **loss_function**: Huber Loss (δ = 1.0)
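
For reference, Huber loss with δ = 1.0 is quadratic for errors up to δ and linear beyond it, which damps the influence of outlier labels. A minimal pure-Python sketch of the formula (the actual training used a framework implementation):

```python
def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic inside |error| <= delta, linear outside."""
    error = abs(pred - target)
    if error <= delta:
        return 0.5 * error ** 2
    return delta * (error - 0.5 * delta)

print(huber_loss(0.3, 0.5))   # small error, quadratic region: ~0.02
print(huber_loss(-1.0, 2.0))  # large error, linear region: 2.5
```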

---

## Performance

| Metric | Score |
|------------------------------|-------------|
| **Eval MAE** | **0.0461** |
| **Eval Pearson Correlation** | **0.9951** |
| **Eval Loss** | **0.00199** |
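
For reference, the two evaluation metrics can be computed as below; the inputs here are toy values for illustration, not the model's evaluation data:

```python
import math

def mae(preds, targets):
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def pearson(preds, targets):
    """Pearson correlation coefficient between predictions and targets."""
    n = len(preds)
    mp = sum(preds) / n
    mt = sum(targets) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(preds, targets))
    sp = math.sqrt(sum((p - mp) ** 2 for p in preds))
    st = math.sqrt(sum((t - mt) ** 2 for t in targets))
    return cov / (sp * st)

preds = [0.91, -0.42, 0.05, -0.88]    # toy predictions
targets = [0.95, -0.40, 0.00, -0.90]  # toy gold scores
print(round(mae(preds, targets), 4))
print(round(pearson(preds, targets), 4))  # near 1.0: predictions track targets closely
```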

---

## Model Architecture

### 1) ELECTRA Encoder (Base-size)

- **Hidden size:** 768
- **Layers:** 12 Transformer blocks
- **Attention heads:** 12
- **MLP intermediate size:** 3072
- **Activation:** GELU
- **Dropout:** 0.1

### 2) Classification Head

A single regression head that outputs emotion intensity (-1.0 ~ 1.0):

- **Dense Layer**: 768 → 768
- **Activation**: GELU
- **Dropout**: 0.1
- **Output Projection**: 768 → 1

The final output is the **raw logit** (no softmax), interpreted as emotion intensity in the -1.0 ~ 1.0 range.
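
A quick back-of-the-envelope check of the head's size (weights plus biases, using the dimensions listed above) shows it is tiny next to the base-size encoder:

```python
hidden = 768

dense_params = hidden * hidden + hidden  # 768 -> 768 dense: weight + bias
proj_params = hidden + 1                 # 768 -> 1 projection: weight + bias
head_total = dense_params + proj_params

print(head_total)  # 591,361 parameters in the regression head
```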

---

## Citation

```bibtex
@misc{HowRUEmotionRegression2025,
  title={HowRU KoELECTRA Emotion Regression},
  author={Lim, Yeri},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Regression}}
}
```