atti433 committed on
Commit 2b2fbde · verified · 1 Parent(s): 6ca8609

Add model card

Files changed (1)
  1. README.md +85 -0
README.md ADDED
@@ -0,0 +1,85 @@
+ ---
+ language:
+ - ko
+ license: other
+ library_name: transformers
+ pipeline_tag: text-classification
+ base_model: klue/bert-base
+ tags:
+ - bert
+ - klue
+ - korean
+ - text-classification
+ - minwon
+ - complaint
+ - public-administration
+ ---
+
+ # MindE Civil Complaint Classifier (bert-v9)
+
+ A KLUE BERT-based model that automatically classifies Korean public-sector civil complaints (minwon) into **11 categories**.
+
+ ## Categories (11)
+
+ | ID | Category | Per-class F1 |
+ |---:|---|---:|
+ | 1 | 교통 (Traffic) | 0.882 |
+ | 2 | 건축 (Construction) | 0.755 |
+ | 3 | 행정 (Administration) | 0.812 |
+ | 4 | 보건위생 (Health and sanitation) | 0.911 |
+ | 5 | 환경 (Environment) | 0.874 |
+ | 6 | 문화_여가 (Culture and leisure) | 0.825 |
+ | 7 | 농축산 (Agriculture and livestock) | 0.909 |
+ | 8 | 복지 (Welfare) | 0.866 |
+ | 9 | 세무 (Tax) | 0.974 |
+ | 10 | 상하수도 (Water and sewerage) | 0.921 |
+ | 11 | 경제 (Economy) | 0.874 |
+
+ **Test set (20,788 samples)**
+ - Accuracy: **0.871**
+ - Macro F1: **0.873**
+ - Weighted F1: 0.871
+
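As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores, so it can be recomputed from the category table. The sketch below only copies the table values. Weighted F1 cannot be reproduced the same way because the card does not list per-class support.

```python
# Per-class F1 values copied from the category table, in ID order 1..11.
per_class_f1 = [0.882, 0.755, 0.812, 0.911, 0.874, 0.825,
                0.909, 0.866, 0.974, 0.921, 0.874]

# Macro F1 is the unweighted mean over classes.
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(round(macro_f1, 3))  # matches the reported macro F1 of 0.873
```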
+ ## Training data
+
+ - AI Hub dataset No. 143, "민원 업무 효율, 자동화를 위한 언어 AI 학습데이터" (language AI training data for complaint-handling efficiency and automation): ~860,000 samples, 18 categories mapped to 11
+ - Split 8:1:1 at the `group_id` level, with the training set capped at 20k samples per category
+ - Masking tokens (e.g. `#@주소#`, "address") are replaced with special tokens (e.g. `[ADDR]`)
+
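The preprocessing steps above could look roughly like the following. This is a minimal sketch, not the author's pipeline: the row keys `group_id` and `category`, the split names, the seed, and the choice to simply drop rows beyond the cap are all assumptions, and only the `#@주소#` to `[ADDR]` pair is documented in the card.

```python
import random
import re
from collections import defaultdict

# Only this pair is documented in the card; other pairs are unknown.
MASK_TO_SPECIAL = {"#@주소#": "[ADDR]"}
_MASK_RE = re.compile("|".join(re.escape(m) for m in MASK_TO_SPECIAL))

def replace_mask_tokens(text):
    """Swap dataset masking tokens for their special-token equivalents."""
    return _MASK_RE.sub(lambda m: MASK_TO_SPECIAL[m.group(0)], text)

def split_by_group(rows, cap_per_category=20_000, seed=42):
    """Split rows 8:1:1 by group_id, capping train rows per category.

    `rows` is a list of dicts with hypothetical keys "group_id" and "category".
    """
    group_ids = sorted({r["group_id"] for r in rows})
    random.Random(seed).shuffle(group_ids)
    n_train = int(len(group_ids) * 0.8)
    n_val = int(len(group_ids) * 0.1)
    train_groups = set(group_ids[:n_train])
    val_groups = set(group_ids[n_train:n_train + n_val])

    splits = {"train": [], "val": [], "test": []}
    train_counts = defaultdict(int)
    for row in rows:
        if row["group_id"] in train_groups:
            # Enforce the per-category cap on the training split only.
            if train_counts[row["category"]] < cap_per_category:
                splits["train"].append(row)
                train_counts[row["category"]] += 1
        elif row["group_id"] in val_groups:
            splits["val"].append(row)
        else:
            splits["test"].append(row)
    return splits
```

Splitting on `group_id` rather than on individual rows keeps every row of a group in the same split, which avoids leakage between train and test. With a Hugging Face tokenizer, the new tokens would typically be registered via `tokenizer.add_special_tokens({"additional_special_tokens": ["[ADDR]"]})` followed by `model.resize_token_embeddings(len(tokenizer))`.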
+ ## Training settings
+
+ - Base: `klue/bert-base`
+ - max_length: 128
+ - batch_size: 32
+ - epochs: 3
+ - learning_rate: 2e-5
+ - warmup_ratio: 0.1
+ - weight_decay: 0.01
+ - Training time: ~45 min (RTX 4060 Ti)
+
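To make the schedule concrete, the settings above can be written as a config and turned into step counts. This is a sketch under assumptions: the dict keys borrow Hugging Face `TrainingArguments` keyword names, but the card does not say which training loop was used, and the training-set size below is only the upper bound implied by the 20k-per-category cap.

```python
import math

# Hyperparameters from the model card. Keys follow TrainingArguments naming
# (an assumption; the card does not name the training framework).
config = {
    "per_device_train_batch_size": 32,
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
}
MAX_LENGTH = 128  # tokenizer truncation length, passed to the tokenizer

def schedule_steps(num_train_examples, cfg):
    """Return (total optimizer steps, warmup steps) for a single device."""
    steps_per_epoch = math.ceil(num_train_examples / cfg["per_device_train_batch_size"])
    total = steps_per_epoch * cfg["num_train_epochs"]
    warmup = math.ceil(total * cfg["warmup_ratio"])
    return total, warmup

# Upper bound on training examples: 11 categories x 20k cap (assumed all hit the cap).
total_steps, warmup_steps = schedule_steps(11 * 20_000, config)
print(total_steps, warmup_steps)
```

If some categories have fewer than 20k training samples, the real step counts are lower than this bound.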
+ ## Usage example
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ tokenizer = AutoTokenizer.from_pretrained("atti433/minde-classifier")
+ model = AutoModelForSequenceClassification.from_pretrained("atti433/minde-classifier")
+
+ # "Cars keep parking illegally in front of my house, which is very inconvenient."
+ text = "집 앞에 차가 자꾸 불법주차해서 너무 불편합니다."
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+ with torch.no_grad():  # inference only, no gradients needed
+     logits = model(**inputs).logits
+ probs = torch.softmax(logits, dim=-1)
+ # Label order follows the category table: index 0 is ID 1.
+ labels = ['교통','건축','행정','보건위생','환경','문화_여가','농축산','복지','세무','상하수도','경제']
+ pred = labels[probs.argmax().item()]
+ print(pred, probs.max().item())
+ ```
+
+ Alternatively, use this project's `chatbot_service.classify_complaint()`.
+
+ ## Limitations
+
+ - The training data (AI Hub 143) consists mainly of complaints from Changwon City, so the model may be biased toward regional vocabulary.
+ - The "건축" (Construction) category has the lowest F1 (0.755). This is attributed to label noise: the 안전건설과 (Safety and Construction Division) raw_category had road and facility complaints mixed in.
+ - Homonyms and very short texts (e.g. "신호등", "traffic light") receive low confidence. Taking the top-3 predictions and letting an LLM make the final decision is recommended.