---
library_name: transformers
license: apache-2.0
base_model: monologg/koelectra-base-v3-discriminator
tags:
- intent-classification
- korean
- koelectra
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: koelectra_intent_model
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# koelectra_intent_model
This model is a fine-tuned version of [monologg/koelectra-base-v3-discriminator](https://huggingface.co/monologg/koelectra-base-v3-discriminator) on the custom-intent-dataset dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9360
- Accuracy: 0.9885
- F1: 0.9884
## Model description
# 📋 Model Card
## Model Information
### Basic Information
- **Model name**: Intent Classifier KoELECTRA Fine-tuned
- **Model ID**: `kakao1513/koelectra_intent_model`
- **Base model**: `monologg/koelectra-base-v3-discriminator`
- **Task**: Text Classification
- **Language**: Korean
### Model Overview
This model is KoELECTRA fine-tuned on a Korean intent classification dataset to classify user intent. It assigns user actions in a web application, such as visiting a shopping mall, signing up, or logging in, to one of 35 intent classes.
---
## Training Data
### Dataset Statistics
| Item | Value |
|------|-----|
| **Total samples** | 7,084 |
| **Training samples** | 5,698 (80%) |
| **Test samples** | 1,386 (20%) |
| **Number of intent classes** | 35 |
### Main Intent Classes (Examples)
| Intent | Description | Samples |
|------|------|--------|
| `unknown` | Unrelated input / small talk | 748 |
| `go_mall` | Go to the shopping mall | 220 |
| `go_coupang` | Go to Coupang | 220 |
| `click_login` | Log in | 220 |
| `click_signup` | Click sign-up | 220 |
| ... | 30 other intents | - |
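
For a sense of scale, the counts above imply a simple majority-class baseline. The sketch below assumes that `unknown` (748 samples) is the largest of the 35 classes and that the listed counts refer to the full 7,084-sample dataset; neither is stated explicitly in this card.

```python
# Counts taken from the tables above.
total_samples = 7084
num_classes = 35
unknown_samples = 748  # assumed to be the largest class

# Accuracy of a classifier that always predicts `unknown`.
majority_baseline = unknown_samples / total_samples

# Average size of the remaining 34 classes.
avg_other_class = (total_samples - unknown_samples) / (num_classes - 1)

print(f"Majority-class baseline accuracy: {majority_baseline:.2%}")
print(f"Average samples per non-unknown class: {avg_other_class:.1f}")
```

Under these assumptions, always predicting `unknown` would score roughly 10.6% accuracy, which puts the reported accuracy in context.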
---
## Training Configuration
### Hyperparameters
```python
learning_rate = 2e-5
batch_size = 32
num_epochs = 5
max_seq_length = 64
weight_decay = 0.01
label_smoothing = 0.1
optimizer = "AdamW"
```
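
As a rough guide to training length, the number of optimizer steps can be estimated from the batch size, epoch count, and training split size (5,698). This is a sketch that assumes a single device, no gradient accumulation, and that the last partial batch is kept; the card does not state any of these.

```python
import math

train_samples = 5698  # training split size from the dataset statistics
batch_size = 32
num_epochs = 5

# One optimizer step per batch; the final batch of each epoch may be smaller.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps: {total_steps}")
```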
### Training Results
| Epoch | Validation Loss | Accuracy | F1 Score |
|-------|-----------------|----------|----------|
| 1 | 2.651761 | 71.63% | 0.6689 |
| 2 | 1.768677 | 92.35% | 0.9065 |
| 3 | 1.241083 | 97.99% | 0.9797 |
| 4 | 0.999594 | 98.91% | 0.9890 |
| 5 | 0.936003 | 98.85% | 0.9884 |
**Final performance (test set)**
- **Accuracy**: 98.85%
- **F1 score (weighted)**: 0.9884
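
The validation loss can look high next to an accuracy near 99%. One plausible explanation is label smoothing: if the reported loss was computed with the smoothed targets (an assumption, not confirmed in this card), cross-entropy cannot fall below the entropy of the smoothed target distribution. A minimal sketch of that lower bound for a smoothing factor of 0.1 and 35 classes:

```python
import math

epsilon = 0.1     # label smoothing factor from the hyperparameters
num_classes = 35  # number of intent classes

# Smoothed target: (1 - eps) on the true class plus eps / K spread over all classes.
p_true = 1.0 - epsilon + epsilon / num_classes
p_other = epsilon / num_classes

# Cross-entropy is minimized when the prediction equals the smoothed target,
# and the minimum equals the entropy of that target distribution.
loss_floor = -(
    p_true * math.log(p_true)
    + (num_classes - 1) * p_other * math.log(p_other)
)

print(f"Lowest achievable smoothed loss: {loss_floor:.4f}")
```

Under this assumption the floor is about 0.66, so a final loss of 0.936 is much closer to the minimum than it would be for unsmoothed cross-entropy, where the floor is 0.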
---
## Usage
### Installation
```bash
pip install transformers torch
```
### Basic Usage
```python
from transformers import pipeline
# Load the model
classifier = pipeline(
    "text-classification",
    model="smj1513/intent-classifier-koElectra-finetuned",
)
# Run a prediction (input roughly: "Should I go to the shopping mall site or not")
text = "쇼핑몰 사이트로 이동 할까 말까 할게"
result = classifier(text)[0]
print(f"Intent: {result['label']}")
print(f"Confidence: {result['score']:.4f}")
```
### Detailed Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load the model and tokenizer
model_name = "smj1513/intent-classifier-koElectra-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Preprocess the text (input: "Take me to the login page")
text = "로그인 페이지로 가줘"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predicted_class_id = logits.argmax(dim=-1).item()
confidence = torch.softmax(logits, dim=-1)[0][predicted_class_id].item()
print(f"Predicted class: {model.config.id2label[predicted_class_id]}")
print(f"Confidence: {confidence:.4f}")
```
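
The confidence printed above is the softmax probability of the highest logit. To show what that step computes, here is a dependency-free sketch that turns a list of logits into the top-k intents. The helper name `top_k_intents`, the logits, and the three-label mapping are made up for illustration; the real model outputs 35 logits.

```python
import math

def top_k_intents(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one input."""
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label mapping, for illustration only.
example_id2label = {0: "unknown", 1: "go_mall", 2: "click_login"}
example_logits = [0.2, 3.1, 1.0]

top = top_k_intents(example_logits, example_id2label, k=2)
for label, prob in top:
    print(f"{label}: {prob:.4f}")
```

Looking at the runner-up intent as well as the top one can help when two intents are close in probability.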
---
## Performance Analysis
### Strengths
- ✅ **High accuracy**: 98.85% accuracy on the test set
- ✅ **Balanced F1 score**: an F1 score of 0.9884, indicating a good balance between precision and recall
- ✅ **Fast inference**: roughly 94 samples per second on a GPU
- ✅ **Korean-specific**: efficient Korean text processing built on KoELECTRA
### Caveats
- ⚠️ **Domain-specific**: optimized for the shopping mall and membership management domain
- ⚠️ **Token length limit**: inputs are limited to 64 tokens, so the model is of limited use for long sentences
- ⚠️ **Unknown intents**: everyday small talk is classified into the `unknown` class
---
## Technical Details
### Model Architecture
- **Model size**: ELECTRA Base
- **Parameters**: ~110M
- **Output layer**: linear classification head (35 classes)
### Input/Output Specification
- **Input**: Korean text of up to 64 tokens
- **Output**: the most probable of the 35 intent classes, with its confidence score
---
## Limitations and Recommendations
### Applicable Domains
- ✅ Shopping mall and e-commerce systems
- ✅ Sign-up and login intent classification
- ✅ User commands in web and mobile applications
### Inappropriate Use Cases
- ❌ High-risk domains such as medicine or law
- ❌ Real-time speech recognition (this model is text-based)
- ❌ Intent classification in other languages or domains
### Tips for Better Results
1. **Input length**: summarize long sentences so they stay within 64 tokens
2. **Post-processing**: send low-confidence predictions (< 0.7) to human review
3. **Retraining**: retrain the model when adding new intent classes
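
The post-processing tip can be sketched as a small routing function. It assumes each prediction is a dict with `label` and `score` keys, which is the shape the `text-classification` pipeline returns; the function name `route_prediction`, the `action` values, and the sample inputs are hypothetical.

```python
def route_prediction(result, threshold=0.7):
    """Accept a prediction or flag it for human review based on its score."""
    # Predictions below the threshold are not acted on automatically.
    action = "human_review" if result["score"] < threshold else "accept"
    return {"action": action, "label": result["label"], "score": result["score"]}

# Hypothetical pipeline outputs, for illustration only.
confident = route_prediction({"label": "click_login", "score": 0.93})
uncertain = route_prediction({"label": "go_mall", "score": 0.41})

print(confident)
print(uncertain)
```

The 0.7 threshold comes from the tip above; it is worth re-tuning on held-out data, since label smoothing tends to lower the model's top probabilities.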
---
## License and Attribution
- **Base model license**: Apache-2.0 (KoELECTRA)
- **Model release**: Hugging Face Model Hub
- **License**: Apache-2.0
---
## Citation
If you use this model, please cite it as follows:
```bibtex
@misc{intent-classifier-koelectra,
author = {Your Name},
title = {Intent Classifier KoELECTRA Fine-tuned},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/smj1513/intent-classifier-koElectra-finetuned}
}
```
---
## Contact and Support
If you run into a problem or have feedback, please open an issue on the Hugging Face model page.

**Last updated**: February 11, 2026
### Framework versions
- Transformers 5.1.0
- Pytorch 2.9.1+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2