Add model card
README.md
ADDED
@@ -0,0 +1,85 @@
---
language:
- ko
license: other
library_name: transformers
pipeline_tag: text-classification
base_model: klue/bert-base
tags:
- bert
- klue
- korean
- text-classification
- minwon
- complaint
- public-administration
---

# MindE Civil Complaint Classifier (bert-v9)

A KLUE BERT-based model that automatically classifies Korean public-sector civil complaints into **11 categories**.

## Categories (11)

| ID | Category | Per-class F1 |
|---:|---|---:|
| 1 | 교통 (Transportation) | 0.882 |
| 2 | 건축 (Building and construction) | 0.755 |
| 3 | 행정 (Administration) | 0.812 |
| 4 | 보건위생 (Health and sanitation) | 0.911 |
| 5 | 환경 (Environment) | 0.874 |
| 6 | 문화_여가 (Culture and leisure) | 0.825 |
| 7 | 농축산 (Agriculture and livestock) | 0.909 |
| 8 | 복지 (Welfare) | 0.866 |
| 9 | 세무 (Tax) | 0.974 |
| 10 | 상하수도 (Water and sewerage) | 0.921 |
| 11 | 경제 (Economy) | 0.874 |

**Test set (20,788 samples)**
- Accuracy: **0.871**
- Macro F1: **0.873**
- Weighted F1: 0.871

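As a quick sanity check, the reported macro F1 should equal the unweighted mean of the per-class F1 scores in the category table. The sketch below simply recomputes it from those eleven values; nothing here touches the model itself.

```python
# Per-class F1 scores copied from the category table above.
per_class_f1 = {
    "교통": 0.882, "건축": 0.755, "행정": 0.812, "보건위생": 0.911,
    "환경": 0.874, "문화_여가": 0.825, "농축산": 0.909, "복지": 0.866,
    "세무": 0.974, "상하수도": 0.921, "경제": 0.874,
}

# Macro F1 is the plain average over classes, ignoring class frequency.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
weakest = min(per_class_f1, key=per_class_f1.get)

print(round(macro_f1, 3))  # 0.873
print(weakest)             # 건축
```

The weighted F1 cannot be reproduced the same way because the per-class support counts are not listed.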
## Training Data

- AI Hub dataset No. 143, "민원 업무 효율, 자동화를 위한 언어 AI 학습데이터" (language AI training data for civil complaint work efficiency and automation): about 860,000 records, with the original 18 categories mapped to 11
- Split 8:1:1 at the `group_id` level, with training examples capped at 20k per category
- Masking tokens (e.g. `#@주소#`) are replaced with special tokens (e.g. `[ADDR]`)
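The group-level split and per-category cap described above could look roughly like the following. This is a minimal sketch, not the project's actual preprocessing script: the record layout (dicts with `group_id` and `category` keys), the function name `split_by_group`, the seed, and the choice to keep the first examples encountered when applying the cap are all assumptions.

```python
import random
from collections import defaultdict


def split_by_group(records, seed=42, ratios=(0.8, 0.1, 0.1), train_cap=20_000):
    """Split records 8:1:1 by group_id, then cap train examples per category.

    Splitting on group_id keeps every record of a group in the same split,
    so related complaints cannot leak from train into validation or test.
    """
    group_ids = sorted({r["group_id"] for r in records})
    random.Random(seed).shuffle(group_ids)

    n_train = int(len(group_ids) * ratios[0])
    n_val = int(len(group_ids) * ratios[1])
    train_groups = set(group_ids[:n_train])
    val_groups = set(group_ids[n_train:n_train + n_val])

    splits = {"train": [], "val": [], "test": []}
    train_count = defaultdict(int)
    for r in records:
        if r["group_id"] in train_groups:
            # Assumption: the cap keeps the first `train_cap` examples seen.
            if train_count[r["category"]] < train_cap:
                train_count[r["category"]] += 1
                splits["train"].append(r)
        elif r["group_id"] in val_groups:
            splits["val"].append(r)
        else:
            splits["test"].append(r)
    return splits


# Toy data: 100 groups with 3 records each, two categories.
records = [
    {"group_id": g, "category": "교통" if g % 2 else "환경", "text": f"sample {g}-{i}"}
    for g in range(100)
    for i in range(3)
]
splits = split_by_group(records, train_cap=50)
print({name: len(rows) for name, rows in splits.items()})
```

Only the training split is capped here, matching the "train 20k cap" wording; validation and test keep their natural class distribution.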
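The masking-token replacement can be illustrated with a small lookup table. Only the `#@주소#` to `[ADDR]` pair comes from this card; the other two entries, the `MASK_TO_SPECIAL` name, and the `replace_mask_tokens` helper are hypothetical and included purely for illustration.

```python
import re

# Only "#@주소#" -> "[ADDR]" is documented; the other pairs are hypothetical.
MASK_TO_SPECIAL = {
    "#@주소#": "[ADDR]",
    "#@이름#": "[NAME]",       # hypothetical
    "#@전화번호#": "[PHONE]",  # hypothetical
}

# One alternation pattern so each text is scanned only once.
_MASK_PATTERN = re.compile("|".join(re.escape(m) for m in MASK_TO_SPECIAL))


def replace_mask_tokens(text: str) -> str:
    """Replace dataset masking tokens with bracketed special tokens."""
    return _MASK_PATTERN.sub(lambda m: MASK_TO_SPECIAL[m.group(0)], text)


print(replace_mask_tokens("#@주소# 앞 도로가 파손되었습니다."))
# [ADDR] 앞 도로가 파손되었습니다.
```

For the replacements to survive tokenization as single tokens, they would typically also be registered with the tokenizer (for example via `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`). Whether this model was trained that way is not stated here.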
## Training Configuration

- Base: `klue/bert-base`
- max_length: 128
- batch_size: 32
- epochs: 3
- learning_rate: 2e-5
- warmup_ratio: 0.1
- weight_decay: 0.01
- Training time: ~45 minutes (RTX 4060 Ti)

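To make the hyperparameters above concrete, the sketch below writes them as a dict keyed by the corresponding `transformers.TrainingArguments` parameter names and estimates the schedule length. The training-set size is an upper bound derived from the 20k-per-category cap (11 x 20,000); the real count may be lower if some categories have fewer examples. A single GPU, no gradient accumulation, and rounding warmup steps up are further assumptions.

```python
import math

# Values from the list above; keys follow TrainingArguments parameter names.
config = {
    "per_device_train_batch_size": 32,
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
}
MAX_LENGTH = 128  # passed to the tokenizer, not to TrainingArguments


def schedule(num_train_examples: int, cfg: dict) -> tuple[int, int]:
    """Return (total optimizer steps, warmup steps) for one GPU, no accumulation."""
    steps_per_epoch = math.ceil(num_train_examples / cfg["per_device_train_batch_size"])
    total = steps_per_epoch * cfg["num_train_epochs"]
    warmup = math.ceil(total * cfg["warmup_ratio"])
    return total, warmup


# Upper bound on training examples: 11 categories x 20k cap (assumption).
total_steps, warmup_steps = schedule(11 * 20_000, config)
print(total_steps, warmup_steps)
```

Under these assumptions the run is at most 20,625 optimizer steps, with the learning rate warming up linearly over roughly the first tenth of them.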
## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hub.
tokenizer = AutoTokenizer.from_pretrained("atti433/minde-classifier")
model = AutoModelForSequenceClassification.from_pretrained("atti433/minde-classifier")

# "Cars keep parking illegally in front of my house, which is very inconvenient."
text = "집 앞에 차가 자꾸 불법주차해서 너무 불편합니다."
# Truncate to the same max_length that was used for training.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # inference only, so gradients are not needed
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
# Category names in class-index order.
labels = ['교통','건축','행정','보건위생','환경','문화_여가','농축산','복지','세무','상하수도','경제']
pred = labels[probs.argmax().item()]
print(pred, probs.max().item())  # predicted category and its probability
```

Alternatively, use this project's `chatbot_service.classify_complaint()`.

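## Limitations

- The training data (AI Hub 143) consists mainly of Changwon City complaints, so the model may be biased toward regional vocabulary.
- The "건축" category has the lowest F1 (0.755), likely due to label noise: the 안전건설과 (Safety and Construction Division) raw_category mixes road and facility complaints.
- Homonyms and short texts (e.g. "신호등", "traffic light") get low confidence. Taking the top-3 predictions and letting an LLM make the final decision is recommended.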
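The top-3 recommendation above could be wired up along these lines. This is a sketch under stated assumptions: the helper names `top_k_predictions` and `needs_llm_review`, the 0.6 confidence threshold, and the probability values are all made up for illustration, and the hand-off to an LLM is left out entirely.

```python
LABELS = ['교통', '건축', '행정', '보건위생', '환경', '문화_여가',
          '농축산', '복지', '세무', '상하수도', '경제']


def top_k_predictions(probs, labels=LABELS, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def needs_llm_review(candidates, threshold=0.6):
    """Flag a prediction whose top probability falls below the threshold.

    The 0.6 default is a hypothetical value, not one taken from this model.
    """
    return candidates[0][1] < threshold


# Made-up probabilities for a short, ambiguous input such as "신호등".
probs = [0.34, 0.08, 0.21, 0.02, 0.05, 0.03, 0.01, 0.04, 0.02, 0.12, 0.08]
candidates = top_k_predictions(probs)
print(candidates)
print(needs_llm_review(candidates))
```

With the usage example's tensors, the input list would be `probs[0].tolist()`. When `needs_llm_review` returns `True`, the three candidates can be passed to a downstream LLM as a shortlist rather than committing to the top label.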