Tags: Text Classification, Transformers, Safetensors, Korean, bert, klue, korean, minwon, complaint, public-administration, text-embeddings-inference
MindE Civil Complaint Classifier (bert-v9)
A KLUE BERT-based model that automatically classifies Korean public civil complaints (minwon) into 11 categories.
Categories (11)
| ID | Category | Per-class F1 |
|---|---|---|
| 1 | Traffic (교통) | 0.882 |
| 2 | Construction (건축) | 0.755 |
| 3 | Administration (행정) | 0.812 |
| 4 | Health & Sanitation (보건위생) | 0.911 |
| 5 | Environment (환경) | 0.874 |
| 6 | Culture & Leisure (문화_여가) | 0.825 |
| 7 | Agriculture & Livestock (농축산) | 0.909 |
| 8 | Welfare (복지) | 0.866 |
| 9 | Taxation (세무) | 0.974 |
| 10 | Water Supply & Sewerage (상하수도) | 0.921 |
| 11 | Economy (경제) | 0.874 |
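The table numbers categories from 1, but the classifier's output logits are indexed from 0. The usage example's label list assumes that logit index i corresponds to table ID i + 1. Below is a minimal sketch of that mapping in the id2label/label2id style used by Hugging Face configs. The card does not say whether this model's own config stores these Korean names, so treat the dictionaries as an assumption.

```python
# Category names in table order; list position = logit index (0-based).
# Assumption: logit index i corresponds to table ID i + 1, the ordering
# the card's usage example relies on.
CATEGORIES = [
    "교통", "건축", "행정", "보건위생", "환경", "문화_여가",
    "농축산", "복지", "세무", "상하수도", "경제",
]

# 0-based mappings in the style of a Hugging Face config's id2label/label2id
id2label = {i: name for i, name in enumerate(CATEGORIES)}
label2id = {name: i for i, name in id2label.items()}


def table_id(logit_index: int) -> int:
    """Convert a 0-based logit index to the 1-based ID used in the table."""
    return logit_index + 1


print(id2label[0], table_id(0))  # 교통 1
```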
Test set (20,788 samples)
- Accuracy: 0.871
- Macro F1: 0.873
- Weighted F1: 0.871
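You can check the reported macro F1 against the table, because macro F1 is the unweighted mean of the 11 per-class F1 scores. Weighted F1 instead weights each class by its test-set support. The card does not list those supports, so the weighted_f1 helper below is only a generic sketch and is not applied to real data.

```python
# Per-class F1 scores copied from the category table (IDs 1-11, in order)
per_class_f1 = [0.882, 0.755, 0.812, 0.911, 0.874, 0.825,
                0.909, 0.866, 0.974, 0.921, 0.874]

# Macro F1: unweighted mean over classes
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.873


def weighted_f1(f1s, supports):
    """Support-weighted mean of per-class F1 (supports = test examples per class)."""
    total = sum(supports)
    return sum(f * n for f, n in zip(f1s, supports)) / total
```

With equal supports the two averages coincide. The reported 0.871 weighted F1 is slightly below the 0.873 macro F1, which suggests that some lower-scoring classes carry relatively more test examples.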
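Training data
- AI Hub dataset No. 143, "민원 업무 효율, 자동화를 위한 언어 AI 학습데이터" ("Language AI training data for civil-complaint work efficiency and automation"): ~860,000 records, with its 18 original categories mapped to 11
- Split 8:1:1 by group_id (each group_id falls in exactly one split), with the training set capped at 20k examples per category
- Masking tokens in the raw text (e.g. #@주소#, "address") replaced with special tokens (e.g. [ADDR])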
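The card does not describe how the split, the cap, and the token replacement were implemented. What follows is a minimal sketch under stated assumptions. Records are dicts with hypothetical group_id, category and text keys. Groups are shuffled with a fixed seed and assigned 8:1:1 by group count, so the ratio at the example level is only approximately 8:1:1. MASK_TO_SPECIAL contains only the one pair named on the card.

```python
import random
import re
from collections import defaultdict

# Hypothetical mapping: only #@주소# -> [ADDR] is named on the card;
# the other masking tokens would be added here.
MASK_TO_SPECIAL = {"#@주소#": "[ADDR]"}
MASK_PATTERN = re.compile(r"#@[^#\s]+#")


def replace_masks(text: str) -> str:
    """Swap known masking tokens for special tokens; leave unknown ones unchanged."""
    return MASK_PATTERN.sub(lambda m: MASK_TO_SPECIAL.get(m.group(0), m.group(0)), text)


def group_split(records, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split records into train/val/test so that every group_id lands in one split only."""
    groups = sorted({r["group_id"] for r in records})
    random.Random(seed).shuffle(groups)
    n_train = int(len(groups) * ratios[0])
    n_val = int(len(groups) * ratios[1])
    split_of = {}
    for i, g in enumerate(groups):
        if i < n_train:
            split_of[g] = "train"
        elif i < n_train + n_val:
            split_of[g] = "val"
        else:
            split_of[g] = "test"
    splits = {"train": [], "val": [], "test": []}
    for r in records:
        splits[split_of[r["group_id"]]].append(r)
    return splits


def cap_per_category(records, cap=20_000, seed=42):
    """Keep at most `cap` randomly chosen records per category (applied to train only)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    kept, counts = [], defaultdict(int)
    for r in shuffled:
        if counts[r["category"]] < cap:
            counts[r["category"]] += 1
            kept.append(r)
    return kept


print(replace_masks("#@주소# 앞 도로가 파손됐습니다."))  # [ADDR] 앞 도로가 파손됐습니다.
```

Splitting by group rather than by record keeps near-duplicate complaints from the same group out of both train and test, which would otherwise inflate test scores.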
Training configuration
- Base: klue/bert-base
- max_length: 128
- batch_size: 32
- epochs: 3
- learning_rate: 2e-5
- warmup_ratio: 0.1
- weight_decay: 0.01
- Training time: ~45 min (RTX 4060 Ti)
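The card does not confirm the learning-rate schedule. As a sketch, assume a linear warmup-then-decay schedule, which is the Hugging Face Trainer default. Under that assumption, the hyperparameters above translate into the step counts and learning-rate curve shown below. The card does not give the training-set size, so the 100,000 in the usage line is a placeholder.

```python
import math

# Hyperparameters from the card
LEARNING_RATE = 2e-5
BATCH_SIZE = 32
EPOCHS = 3
WARMUP_RATIO = 0.1


def schedule_steps(num_train_examples: int):
    """Return (total optimizer steps, warmup steps).

    Assumes one optimizer step per batch (single device, no gradient accumulation).
    """
    steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)
    total = steps_per_epoch * EPOCHS
    warmup = math.ceil(total * WARMUP_RATIO)
    return total, warmup


def lr_at(step: int, total: int, warmup: int) -> float:
    """Linear warmup from 0 to LEARNING_RATE, then linear decay back to 0."""
    if step < warmup:
        return LEARNING_RATE * step / max(1, warmup)
    return LEARNING_RATE * max(0.0, (total - step) / max(1, total - warmup))


# Placeholder training-set size (not reported on the card)
print(schedule_steps(100_000))  # (9375, 938)
```

The learning rate therefore peaks at 2e-5 once 10% of the steps are done and reaches zero at the end of epoch 3. The weight decay of 0.01 applies to the optimizer (AdamW in the Trainer) and does not affect this curve.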
Usage example
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("atti433/minde-classifier")
model = AutoModelForSequenceClassification.from_pretrained("atti433/minde-classifier")

# "Cars keep parking illegally in front of my house, which is very inconvenient."
text = "집 앞에 차가 자꾸 불법주차해서 너무 불편합니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

# Label order follows the category table: index 0 = ID 1, ..., index 10 = ID 11
labels = ['교통','건축','행정','보건위생','환경','문화_여가','농축산','복지','세무','상하수도','경제']
pred = labels[probs.argmax().item()]
print(pred, probs.max().item())
Alternatively, call chatbot_service.classify_complaint() from this project.
Limitations
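- The training data (AI Hub 143) comes mostly from Changwon City (창원시) complaints, so the model may be biased toward regional vocabulary.
- The Construction (건축) category has the lowest F1 (0.755). This is attributed to label noise: road and facility complaints are mixed into the 안전건설과 (Safety & Construction Division) raw_category.
- Homonyms and short texts (e.g. "신호등", "traffic light") receive low confidence. For these, we recommend taking the top-3 predictions and letting an LLM make the final decision.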
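The top-3 recommendation can be sketched in plain Python. Convert the logits to probabilities, keep the three most likely categories, and defer to an LLM only when the top probability is low. CONFIDENCE_THRESHOLD = 0.7 and the route() helper are hypothetical, because the card specifies neither a threshold nor how the LLM is invoked.

```python
import math

CATEGORIES = [
    "교통", "건축", "행정", "보건위생", "환경", "문화_여가",
    "농축산", "복지", "세무", "상하수도", "경제",
]

# Hypothetical cut-off; the card does not specify one.
CONFIDENCE_THRESHOLD = 0.7


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k(probs, labels, k=3):
    """Return the k most probable (label, prob) pairs, highest first."""
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]


def route(logits, labels=CATEGORIES, threshold=CONFIDENCE_THRESHOLD):
    """Accept the top label if confident; otherwise flag the top-3 for an LLM to decide."""
    candidates = top_k(softmax(logits), labels, k=3)
    best_label, best_prob = candidates[0]
    if best_prob >= threshold:
        return {"label": best_label, "needs_llm": False, "candidates": candidates}
    return {"label": None, "needs_llm": True, "candidates": candidates}


print(route([4.0] + [0.0] * 10)["label"])  # 교통
```

With the PyTorch code from the usage example, torch.topk(probs, k=3) returns the same top-3 probabilities and indices directly.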