---
language:
- ko
license: other
library_name: transformers
pipeline_tag: text-classification
base_model: klue/bert-base
tags:
- bert
- klue
- korean
- text-classification
- minwon
- complaint
- public-administration
---

# MindE Civil Complaint Classifier (bert-v9)

A KLUE BERT-based model that automatically classifies Korean public-sector civil complaints (민원, *minwon*) into **11 categories**.

## Categories (11)

| ID | Category | Per-class F1 |
|---:|---|---:|
| 1 | 교통 (Traffic) | 0.882 |
| 2 | 건축 (Construction) | 0.755 |
| 3 | 행정 (Administration) | 0.812 |
| 4 | 보건위생 (Health and sanitation) | 0.911 |
| 5 | 환경 (Environment) | 0.874 |
| 6 | 문화_여가 (Culture and leisure) | 0.825 |
| 7 | 농축산 (Agriculture and livestock) | 0.909 |
| 8 | 복지 (Welfare) | 0.866 |
| 9 | 세무 (Tax) | 0.974 |
| 10 | 상하수도 (Water and sewerage) | 0.921 |
| 11 | 경제 (Economy) | 0.874 |

**Test set (20,788 samples)**

- Accuracy: **0.871**
- Macro F1: **0.873**
- Weighted F1: 0.871
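
As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores, so it can be recomputed from the category table. The sketch below does this in plain Python; the variable names are illustrative. Weighted F1 cannot be reproduced the same way, because the per-class test-set counts are not listed in this card.

```python
# Per-class F1 scores copied from the category table (IDs 1 to 11).
per_class_f1 = {
    "교통": 0.882,
    "건축": 0.755,
    "행정": 0.812,
    "보건위생": 0.911,
    "환경": 0.874,
    "문화_여가": 0.825,
    "농축산": 0.909,
    "복지": 0.866,
    "세무": 0.974,
    "상하수도": 0.921,
    "경제": 0.874,
}

# Macro F1 is the plain average over classes, ignoring class sizes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 3))  # 0.873
```

The result agrees with the reported macro F1 of 0.873.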
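
## Training data

- AI Hub dataset No. 143, "Language AI training data for civil complaint work efficiency and automation" (민원 업무 효율, 자동화를 위한 언어 AI 학습데이터): about 860,000 records, with the 18 source categories mapped to 11
- 8:1:1 split at the `group_id` level, plus a cap of 20k training samples per category
- Masking tokens (such as `#@주소#`) are replaced with special tokens (such as `[ADDR]`)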
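
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the record fields (`group_id`, `category`, `text`), the helper names, the random seed, the reading of 8:1:1 as train/validation/test, and random subsampling for the cap are all assumptions. Only the `#@주소#` to `[ADDR]` pair comes from this card.

```python
import random
from collections import defaultdict

# Only this pair is documented in the card; other pairs would be added here.
MASK_TO_SPECIAL = {"#@주소#": "[ADDR]"}


def replace_mask_tokens(text: str) -> str:
    """Swap dataset masking tokens for tokenizer special tokens."""
    for mask, special in MASK_TO_SPECIAL.items():
        text = text.replace(mask, special)
    return text


def split_by_group(records, ratios=(0.8, 0.1, 0.1), seed=42):
    """Assign whole groups to a split so no group_id appears in two splits."""
    group_ids = sorted({r["group_id"] for r in records})
    random.Random(seed).shuffle(group_ids)
    n_train = int(len(group_ids) * ratios[0])
    n_valid = int(len(group_ids) * ratios[1])
    splits = {"train": [], "valid": [], "test": []}
    split_of = {}
    for i, gid in enumerate(group_ids):
        if i < n_train:
            split_of[gid] = "train"
        elif i < n_train + n_valid:
            split_of[gid] = "valid"
        else:
            split_of[gid] = "test"
    for r in records:
        splits[split_of[r["group_id"]]].append(r)
    return splits


def cap_per_category(train_records, cap=20_000, seed=42):
    """Keep at most `cap` randomly chosen training records per category."""
    by_category = defaultdict(list)
    for r in train_records:
        by_category[r["category"]].append(r)
    rng = random.Random(seed)
    capped = []
    for category in sorted(by_category):
        items = by_category[category]
        rng.shuffle(items)
        capped.extend(items[:cap])
    return capped
```

Splitting by `group_id` rather than by individual record keeps related complaints out of different splits, which avoids leaking near-duplicates into the test set.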

## Training configuration

- Base: `klue/bert-base`
- max_length: 128
- batch_size: 32
- epochs: 3
- learning_rate: 2e-5
- warmup_ratio: 0.1
- weight_decay: 0.01
- Training time: ~45 min (RTX 4060 Ti)
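
To get a feel for the schedule these settings imply, the sketch below estimates the number of optimizer steps. It assumes that every one of the 11 categories reaches the 20k training cap (an upper bound of 220,000 samples), a single device, and no gradient accumulation; none of these are stated in this card, so treat the numbers as a rough ceiling.

```python
import math

# Hyperparameters from the list above.
batch_size = 32
epochs = 3
warmup_ratio = 0.1

# Assumption: all 11 categories hit the 20k cap, so this is an upper bound.
num_train_samples = 11 * 20_000

steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions, training takes at most 6,875 steps per epoch and 20,625 steps in total, of which roughly the first 2,063 are warmup.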

## Usage example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("atti433/minde-classifier")
model = AutoModelForSequenceClassification.from_pretrained("atti433/minde-classifier")

# "Cars keep parking illegally in front of my house, which is very inconvenient."
text = "집 앞에 차가 자꾸 불법주차해서 너무 불편합니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Run inference without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)

# Output index i corresponds to category ID i + 1 in the category table.
labels = ['교통','건축','행정','보건위생','환경','문화_여가','농축산','복지','세무','상하수도','경제']
pred = labels[probs.argmax().item()]
print(pred, probs.max().item())
```

Alternatively, use this project's `chatbot_service.classify_complaint()`.
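
## Limitations

- The training data (AI Hub 143) consists mainly of complaints from Changwon City (창원시), so the model may be biased toward regional vocabulary.
- The "건축" (Construction) category has the lowest F1, at 0.755. This is attributed to label noise: the `raw_category` 안전건설과 (Safety and Construction Division) mixes in road and facility complaints.
- Homonymous or very short inputs (e.g. "신호등", "traffic light") get low confidence. For these, it is recommended to take the top-3 predictions and let an LLM make the final decision.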
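
The top-3 recommendation above could look roughly like the sketch below. The helper names, the 0.6 confidence threshold, and the example probabilities are all hypothetical; the card does not specify a threshold or how the LLM is called. In practice the probability list would come from `probs[0].tolist()` in the usage example.

```python
LABELS = ['교통', '건축', '행정', '보건위생', '환경', '문화_여가',
          '농축산', '복지', '세무', '상하수도', '경제']


def top_k_candidates(probs, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def needs_llm_review(probs, threshold=0.6):
    """True when the best probability falls below the (hypothetical) threshold."""
    return max(probs) < threshold


# Made-up probabilities standing in for a short, ambiguous input.
example_probs = [0.34, 0.08, 0.21, 0.02, 0.05, 0.03, 0.01, 0.04, 0.02, 0.15, 0.05]

candidates = top_k_candidates(example_probs)
if needs_llm_review(example_probs):
    # Hand the shortlist to an LLM instead of trusting the top-1 label.
    print("Low confidence, candidates for LLM review:", candidates)
else:
    print("Accepted:", candidates[0])
```

Passing a shortlist rather than a single label lets the downstream LLM use the complaint's wider context to choose among plausible categories.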