---
language:
- ko
license: other
library_name: transformers
pipeline_tag: text-classification
base_model: klue/bert-base
tags:
- bert
- klue
- korean
- text-classification
- minwon
- complaint
- public-administration
---
# MindE Civil Complaint Classifier (bert-v9)
A KLUE BERT-based model that automatically classifies Korean public-sector civil complaints into **11 categories**.
## Categories (11)
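| ID | Category | Per-class F1 |
|---:|---|---:|
| 1 | 교통 (Traffic) | 0.882 |
| 2 | 건축 (Building) | 0.755 |
| 3 | 행정 (Administration) | 0.812 |
| 4 | 보건위생 (Health and sanitation) | 0.911 |
| 5 | 환경 (Environment) | 0.874 |
| 6 | 문화_여가 (Culture and leisure) | 0.825 |
| 7 | 농축산 (Agriculture and livestock) | 0.909 |
| 8 | 복지 (Welfare) | 0.866 |
| 9 | 세무 (Tax) | 0.974 |
| 10 | 상하수도 (Water supply and sewerage) | 0.921 |
| 11 | 경제 (Economy) | 0.874 |
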
**Test set (20,788 samples)**
- Accuracy: **0.871**
- Macro F1: **0.873**
- Weighted F1: 0.871
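
The reported Macro F1 can be cross-checked against the category table: macro F1 is the unweighted mean of the per-class scores. A minimal check, with the values copied from that table:

```python
# Per-class F1 scores copied from the category table (IDs 1 to 11).
per_class_f1 = [0.882, 0.755, 0.812, 0.911, 0.874, 0.825,
                0.909, 0.866, 0.974, 0.921, 0.874]

# Macro F1 weights every class equally, regardless of its test-set size.
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(round(macro_f1, 3))  # 0.873
```

Weighted F1 cannot be reproduced the same way, because it needs the number of test samples per class, which this card does not list.
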
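## Training data
- AI Hub dataset No. 143, "민원 업무 효율, 자동화를 위한 언어 AI 학습데이터" (language AI training data for civil-complaint work efficiency and automation): ~860,000 records, with 18 source categories mapped to 11
- 8:1:1 split by `group_id`, with the training set capped at 20k records per category
- Masking tokens (e.g. `#@주소#`) are replaced with special tokens (e.g. `[ADDR]`)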
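
The preprocessing bullets above could look roughly like the following. This is a sketch under assumptions, not the project's actual pipeline: the record fields (`text`, `group_id`, `category`), the function names, the seed, and the choice to keep the first 20k training records per category are all hypothetical. Only the `#@주소#` to `[ADDR]` pair comes from this card.

```python
import random
from collections import defaultdict

# Only this pair is documented in the card; other masks would be added the same way.
MASK_TO_SPECIAL = {"#@주소#": "[ADDR]"}


def replace_masks(text):
    """Swap dataset masking tokens for the tokenizer's special tokens."""
    for mask, special in MASK_TO_SPECIAL.items():
        text = text.replace(mask, special)
    return text


def split_by_group(records, seed=42, train_cap=20_000):
    """8:1:1 split on group_id so no group spans two splits; cap train per category."""
    group_ids = sorted({r["group_id"] for r in records})
    random.Random(seed).shuffle(group_ids)
    n_train = int(len(group_ids) * 0.8)
    n_val = int(len(group_ids) * 0.1)
    train_groups = set(group_ids[:n_train])
    val_groups = set(group_ids[n_train:n_train + n_val])

    splits = {"train": [], "val": [], "test": []}
    kept_per_category = defaultdict(int)
    for r in records:
        if r["group_id"] in train_groups:
            # Assumption: keep the first `train_cap` records seen per category.
            if kept_per_category[r["category"]] < train_cap:
                kept_per_category[r["category"]] += 1
                splits["train"].append(r)
        elif r["group_id"] in val_groups:
            splits["val"].append(r)
        else:
            splits["test"].append(r)
    return splits
```

Splitting on `group_id` rather than on individual records keeps related complaints out of the evaluation sets, which would otherwise inflate the test scores. If new special tokens such as `[ADDR]` are introduced, they would presumably also need to be registered with the tokenizer before fine-tuning.
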
## Training settings
- Base: `klue/bert-base`
- max_length: 128
- batch_size: 32
- epochs: 3
- learning_rate: 2e-5
- warmup_ratio: 0.1
- weight_decay: 0.01
- Training time: ~45 min (RTX 4060 Ti)
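
To get a feel for what these settings imply, the sketch below estimates the step counts and the learning rate at a given step. It assumes every category reaches the 20k cap (220,000 training samples, an upper bound), and it assumes linear warmup followed by linear decay. That schedule is the default in the Transformers `Trainer`, but this card does not state which scheduler was used.

```python
import math

TRAIN_SAMPLES = 11 * 20_000   # assumption: all 11 categories hit the 20k cap
BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0, total_steps - step) / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 6875 20625 2063
```

Under these assumptions the learning rate climbs for roughly the first tenth of training and then falls for the remaining steps.
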
## Usage example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("atti433/minde-classifier")
model = AutoModelForSequenceClassification.from_pretrained("atti433/minde-classifier")

# "Cars keep parking illegally in front of my house, which is very inconvenient."
text = "집 앞에 차가 자꾸 불법주차해서 너무 불편합니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

# Labels in category-table order; list index 0 is table ID 1.
labels = ['교통','건축','행정','보건위생','환경','문화_여가','농축산','복지','세무','상하수도','경제']
pred = labels[probs.argmax().item()]
print(pred, probs.max().item())
```
Alternatively, use this project's `chatbot_service.classify_complaint()`.
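## Limitations
- The training data (AI Hub 143) consists mainly of complaints from Changwon City, so the model may be biased toward regional vocabulary.
- The "건축" (Building) category has the lowest F1 (0.755). This is attributed to label noise: the 안전건설과 (Safety and Construction Division) `raw_category` mixes in road and facility complaints.
- Homonyms and very short texts (e.g. "신호등", "traffic light") receive low confidence. Taking the top-3 predictions and letting an LLM make the final decision is recommended.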
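
The top-3 recommendation in the last bullet might be implemented along these lines. The sketch works on a plain list of logits so that it runs without the model; in practice the list would come from `model(**inputs).logits[0].tolist()`. The `0.6` confidence threshold and the function names are hypothetical, and the hand-off to an LLM is left out.

```python
import math

LABELS = ['교통', '건축', '행정', '보건위생', '환경', '문화_여가',
          '농축산', '복지', '세무', '상하수도', '경제']


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def needs_review(logits, threshold=0.6):
    """True when the best class is below the (hypothetical) confidence threshold."""
    return top_k(logits, k=1)[0][1] < threshold


# Made-up logits for a short, ambiguous input: three classes are nearly tied.
logits = [1.2, 1.0, 0.9, -2.0, -1.5, -2.2, -3.0, -1.8, -2.5, -2.1, -1.9]
candidates = top_k(logits)
if needs_review(logits):
    print("low confidence, pass these candidates on:", candidates)
```

Passing the candidate list instead of a single label lets the downstream step choose among plausible categories using context that the classifier never sees.
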