	---
	library_name: transformers
	tags:
	- korean
	- emotion
	- emotion-classification
	- nlp
	- electra
	- koelectra
	- sentiment
	- sequence-classification
	license: mit
	datasets:
	- LimYeri/kor-diary-emotion_v2
	- qowlsdud/CounselGPT
	language:
	- ko
	metrics:
	- accuracy
	- f1
	base_model:
	- monologg/koelectra-base-v3-discriminator
	pipeline_tag: text-classification
	---

	# HowRU-KoELECTRA-Emotion-Classifier

	## Model Description
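	A KoELECTRA-based emotion classification model for Korean text, particularly diary entries and psychological records.<br>
	It recognizes eight emotions in text: joy (기쁨), excitement (설렘), neutral (평범함), surprise (놀라움), disgust (불쾌함), fear (두려움), sadness (슬픔), and anger (분노).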

	- Model type: Text Classification (Emotion Recognition)
	- Language: Korean (한국어, ko)
	- License: MIT
	- Finetuned from model: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)

	## Emotion Classes
	This model classifies the dominant emotion of a Korean input sentence into one of the eight classes below.

	| Emotion (Korean) | Emotion (EN) |
	|------------------|--------------|
	| 기쁨 | Joy |
	| 설렘 | Excitement |
	| 평범함 | Neutral |
	| 놀라움 | Surprise |
	| 불쾌함 | Disgust |
	| 두려움 | Fear |
	| 슬픔 | Sadness |
	| 분노 | Anger |
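
For downstream code that reports results in English, the table above can be kept as a lookup. This is a minimal sketch, assuming the model's `id2label` values are the Korean names shown in the table (not confirmed here; check `model.config.id2label`). `KO_TO_EN` and `to_english` are hypothetical names.

```python
# Korean-to-English label lookup built from the Emotion Classes table.
# Assumption: the model emits the Korean names as its labels.
KO_TO_EN = {
    "기쁨": "Joy",
    "설렘": "Excitement",
    "평범함": "Neutral",
    "놀라움": "Surprise",
    "불쾌함": "Disgust",
    "두려움": "Fear",
    "슬픔": "Sadness",
    "분노": "Anger",
}


def to_english(label: str) -> str:
    """Return the English name for a label, or the label itself if unknown."""
    return KO_TO_EN.get(label, label)


print(to_english("슬픔"))  # Sadness
```

Unknown labels fall through unchanged, so the helper stays safe if the actual label strings differ from the table.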

	---

	## How to Get Started with the Model
	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch
	import torch.nn.functional as F

	# 1) Load Model & Tokenizer
	MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

	tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
	model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

	# Use the GPU automatically when one is available
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)
	model.eval()

	# Emotion label mapping (id2label)
	id2label = model.config.id2label


	# 2) Inference Function
	def predict_emotion(text: str):
	    """
	    Returns a dict with:
	    - text: the input text
	    - top1_emotion: (label, probability) of the most likely emotion
	    - top2_emotions: the two most likely (label, probability) pairs
	    - all_probabilities: every (label, probability) pair, in descending order
	    """

	    # Tokenize
	    inputs = tokenizer(
	        text,
	        return_tensors="pt",
	        truncation=True,
	        padding=True,
	        max_length=512
	    ).to(device)

	    # Inference
	    with torch.no_grad():
	        logits = model(**inputs).logits
	        probs = F.softmax(logits, dim=-1)[0]

	    # Probabilities sorted in descending order
	    probs_sorted = sorted(
	        [(id2label[i], float(probs[i])) for i in range(len(probs))],
	        key=lambda x: x[1],
	        reverse=True
	    )

	    top1_pred = probs_sorted[0]
	    top2_pred = probs_sorted[:2]

	    return {
	        "text": text,
	        "top1_emotion": top1_pred,
	        "top2_emotions": top2_pred,
	        "all_probabilities": probs_sorted,
	    }


	# 3) Example
	# The input means: "Today was a really good, happy day!"
	result = predict_emotion("오늘 정말 기분이 좋고 행복한 하루였어!")
	print(result)
	```
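
The softmax-and-sort step inside `predict_emotion` can be seen in isolation with plain Python. This is an illustrative sketch only: the logits below are made-up numbers, not model output, and `softmax` and `rank` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank(labels, logits):
    """Pair each label with its probability, highest first."""
    return sorted(zip(labels, softmax(logits)), key=lambda p: p[1], reverse=True)


# Hypothetical logits for three of the eight classes.
ranked = rank(["Joy", "Neutral", "Sadness"], [2.0, 0.5, -1.0])
top1, top2 = ranked[0], ranked[:2]
print(top1[0], [label for label, _ in top2])
```

Because softmax preserves the order of the logits, the top-1 label is the one with the largest logit; the probabilities only add a normalized confidence to each label.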

	### pipeline
	```python
	from transformers import pipeline

	MODEL_NAME = "LimYeri/HowRU-KoELECTRA-Emotion-Classifier"

	classifier = pipeline(
	    "text-classification",
	    model=MODEL_NAME,
	    tokenizer=MODEL_NAME,
	    top_k=None  # return the probability of every emotion
	)

	# Predict (the input means: "Today was a really good, happy day!")
	text = "오늘 정말 기분이 좋고 행복한 하루였어!"
	result = classifier(text)

	# The scores for a single input come back wrapped in an outer list
	result = result[0]

	print("Input sentence:", text)
	print("\nTop-1 emotion:", result[0]['label'], f"({result[0]['score']:.4f})")
	print("\nFull emotion distribution:")
	for r in result:
	    print(f" {r['label']}: {r['score']:.4f}")
	```

	---

	## Training Details

	### Training Data
	1. [LimYeri/kor-diary-emotion_v2](https://huggingface.co/datasets/LimYeri/kor-diary-emotion_v2)
	2. [qowlsdud/CounselGPT](https://huggingface.co/datasets/qowlsdud/CounselGPT)

	- Total (split 8:2): 50,000 rows
	- Train: 40,000 rows
	- Validation: 10,000 rows
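
The row counts follow directly from the 8:2 ratio. Below is a minimal sketch of such a split, assuming a shuffled random split with a fixed seed. The card does not say how the split was actually made (for example, whether it was stratified by label), and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_rows: int, train_fraction: float = 0.8, seed: int = 42):
    """Shuffle row indices and cut them into train and validation parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, val_idx = split_indices(50_000)
print(len(train_idx), len(val_idx))  # 40000 10000
```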

	### Training Procedure
	- Base Model: [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator)
	- Objective: Single-label classification
	- Max Length: 512

	### Training Hyperparameters
	- num_train_epochs: 3
	- learning_rate: 3e-5
	- weight_decay: 0.02
	- warmup_ratio: 0.15
	- per_device_train_batch_size: 32
	- per_device_eval_batch_size: 64
	- max_grad_norm: 1.0
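
These settings imply a concrete step budget. The sketch below works it out, assuming a single device, no gradient accumulation, and that the warmup length is the ceiling of `warmup_ratio` times the total step count. None of these are stated in the card. The train size of 40,000 comes from the Training Data section.

```python
import math

train_rows = 40_000
per_device_train_batch_size = 32
num_train_epochs = 3
warmup_ratio = 0.15

# Assumption: one device and gradient_accumulation_steps = 1.
steps_per_epoch = math.ceil(train_rows / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 1250 3750 563
```

With more devices or gradient accumulation, the effective batch size grows and every count above shrinks proportionally.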

	---

	## Performance
	| Metric          | Score  |
	|-----------------|--------|
	| Eval Accuracy   | 0.95   |
	| Eval F1 Macro   | 0.95   |
	| Eval Loss       | 0.16   |
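
For reference, accuracy and macro F1 can be computed from predictions as follows. This is a dependency-free sketch with toy labels, not the evaluation data. `accuracy` and `macro_f1` are hypothetical helpers, and the card does not say which library produced the reported scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes.
y_true = ["Joy", "Joy", "Sadness", "Anger"]
y_pred = ["Joy", "Sadness", "Sadness", "Anger"]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))
```

Macro averaging weights every class equally, so a rare emotion counts as much as a common one.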

	---

	## Model Architecture

	### 1) ELECTRA Encoder (Base-size)
	- Hidden size: 768
	- Layers: 12 Transformer blocks
	- Attention heads: 12
	- MLP intermediate size: 3072
	- Activation: GELU
	- Dropout: 0.1

	### 2) Classification Head
	An additional classification head that predicts the eight emotion classes:

	- Dense Layer: 768 → 768
	- Activation: GELU
	- Dropout: 0.1
	- Output Projection: 768 → 8
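
The head's size follows from the dimensions listed above. Here is a small sketch that counts its trainable parameters, assuming both layers are affine (weights plus bias); GELU and dropout add no parameters. `linear_params` is a hypothetical helper.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of an affine layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


hidden_size = 768
num_labels = 8

dense = linear_params(hidden_size, hidden_size)    # 768 -> 768
out_proj = linear_params(hidden_size, num_labels)  # 768 -> 8

print(dense, out_proj, dense + out_proj)  # 590592 6152 596744
```

Under that assumption the head holds roughly 0.6 million parameters, almost all of them in the dense layer.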

	---

	## Citation
	```bibtex
	@misc{HowRUEmotion2025,
	title={HowRU KoELECTRA Emotion Classifier},
	author={Lim, Yeri},
	year={2025},
	publisher={Hugging Face},
	howpublished={\url{https://huggingface.co/LimYeri/HowRU-KoELECTRA-Emotion-Classifier}}
	}
	```