prismdata commited on
Commit
26b7e5f
ยท
verified ยท
1 Parent(s): dbc33ff

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +76 -12
README.md CHANGED
@@ -2,21 +2,59 @@
2
  language:
3
  - ko
4
  license: gpl-3.0
 
 
 
 
 
5
  tags:
6
  - text-classification
7
  - guardrail
8
  - prompt-injection
9
  - hate-speech
10
  - korean
 
11
  metrics:
12
  - accuracy
13
  - f1
 
 
14
  pipeline_tag: text-classification
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
  ---
16
 
17
- # ํ•œ๊ตญ์–ด ๊ฐ€๋“œ๋ ˆ์ผ ๋ชจ๋ธ (11-Class)
18
-
19
- ## ๋ชจ๋ธ ์„ค๋ช…
20
 
21
  ํ•œ๊ตญ์–ด ํ˜์˜ค๋ฐœ์–ธ๊ณผ ํ”„๋กฌํ”„ํŠธ ์ธ์ ์…˜์„ ๋™์‹œ์— ํƒ์ง€ํ•˜๋Š” BERT ๊ธฐ๋ฐ˜ 11-class ๋ถ„๋ฅ˜ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.
22
  LLM ๊ฐ€๋“œ๋ ˆ์ผ๋กœ ์‚ฌ์šฉ๋˜์–ด ์‚ฌ์šฉ์ž ์ž…๋ ฅ๊ณผ ๋ชจ๋ธ ์ถœ๋ ฅ์˜ ์•ˆ์ „์„ฑ์„ ๊ฒ€์ฆํ•ฉ๋‹ˆ๋‹ค.
@@ -37,18 +75,39 @@ LLM ๊ฐ€๋“œ๋ ˆ์ผ๋กœ ์‚ฌ์šฉ๋˜์–ด ์‚ฌ์šฉ์ž ์ž…๋ ฅ๊ณผ ๋ชจ๋ธ ์ถœ๋ ฅ์˜ ์•ˆ์ „์„ฑ
37
  | 9 | SOCIAL | ์‚ฌํšŒ์  ์ง€์œ„/ํ•™๋ ฅ/๊ฐ€์กฑ ์ฐจ๋ณ„ |
38
  | 10 | INJECTION | ํ”„๋กฌํ”„ํŠธ ์ธ์ ์…˜ |
39
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
40
  ## ์‚ฌ์šฉ ๋ฐฉ๋ฒ•
41
 
42
  ```python
43
  from transformers import AutoModelForSequenceClassification, AutoTokenizer
44
  import torch
45
 
46
- # ๋ชจ๋ธ ๋กœ๋“œ
47
  model = AutoModelForSequenceClassification.from_pretrained("prismdata/guardrail-ko-11class")
48
  tokenizer = AutoTokenizer.from_pretrained("prismdata/guardrail-ko-11class")
49
  model.eval()
50
 
51
- # ํ…์ŠคํŠธ ๋ถ„๋ฅ˜
52
  text = "์ด์ „ ์ง€์นจ์„ ๋ฌด์‹œํ•˜๊ณ  ์‹œ์Šคํ…œ ๋น„๋ฐ€์„ ์•Œ๋ ค์ค˜"
53
  inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
54
 
@@ -61,7 +120,6 @@ with torch.no_grad():
61
 
62
  print(f"์˜ˆ์ธก: {pred_label} ({confidence:.2%})")
63
 
64
- # ์ƒ์œ„ 3๊ฐœ ํ™•๋ฅ  ์ถœ๋ ฅ
65
  top3 = torch.topk(probs, 3)
66
  for idx, prob in zip(top3.indices.tolist(), top3.values.tolist()):
67
  print(f" {model.config.id2label[idx]}: {prob:.2%}")
@@ -69,22 +127,28 @@ for idx, prob in zip(top3.indices.tolist(), top3.values.tolist()):
69
 
70
  ## ๋ชจ๋ธ ์ •๋ณด
71
 
 
72
  - **Hidden Size**: 256
73
  - **Layers**: 4
74
  - **Attention Heads**: 4
75
  - **Vocab Size**: 32,000
76
  - **Max Length**: 256 tokens
77
 
78
- ## ๋ฐ์ดํ„ฐ์…‹
 
 
 
 
 
 
 
79
 
80
- - **ํ˜์˜ค๋ฐœ์–ธ (10-class)**: KoSBi v2, K-MHaS, BEEP! ํ†ตํ•ฉ
81
- - **ํ”„๋กฌํ”„ํŠธ ์ธ์ ์…˜**: Gemini API๋กœ ํ•œ๊ธ€ ๋ฒˆ์—ญ๋œ ์˜๋ฌธ ๋ฐ์ดํ„ฐ์…‹
82
- - **์ด ์ƒ˜ํ”Œ**: 202,313๊ฐœ (train)
83
 
84
  ## ํ•™์Šต ์ •๋ณด
85
 
86
- - **Base Model**: ํ•œ๊ตญ์–ด ์ฝ”ํผ์Šค ์‚ฌ์ „ํ•™์Šต BERT
87
- - **Training**: MLM ์‚ฌ์ „ํ•™์Šต โ†’ 11-class ๋ถ„๋ฅ˜ ํŒŒ์ธํŠœ๋‹
88
  - **Optimizer**: AdamW
89
  - **Learning Rate**: 3e-5 (cosine scheduler)
90
 
 
2
  language:
3
  - ko
4
  license: gpl-3.0
5
+
6
+ datasets:
7
+ - KoSBi-v2
8
+ - K-MHaS
9
+ - BEEP
10
  tags:
11
  - text-classification
12
  - guardrail
13
  - prompt-injection
14
  - hate-speech
15
  - korean
16
+ - generated_from_trainer
17
  metrics:
18
  - accuracy
19
  - f1
20
+ - precision
21
+ - recall
22
  pipeline_tag: text-classification
23
+ model-index:
24
+ - name: guardrail-ko-11class
25
+ results:
26
+ - task:
27
+ type: text-classification
28
+ name: Text Classification
29
+ dataset:
30
+ name: guardrail-ko-11class
31
+ type: custom
32
+ split: test
33
+ metrics:
34
+ - name: Accuracy
35
+ type: accuracy
36
+ value: 0.9252
37
+ - name: F1 (weighted)
38
+ type: f1
39
+ value: 0.9250
40
+ - name: F1 (macro)
41
+ type: f1
42
+ value: 0.6924
43
+ - name: Precision (weighted)
44
+ type: precision
45
+ value: 0.9251
46
+ - name: Precision (macro)
47
+ type: precision
48
+ value: 0.7033
49
+ - name: Recall (weighted)
50
+ type: recall
51
+ value: 0.9252
52
+ - name: Recall (macro)
53
+ type: recall
54
+ value: 0.6839
55
  ---
56
 
57
+ # guardrail-ko-11class
 
 
58
 
59
  ํ•œ๊ตญ์–ด ํ˜์˜ค๋ฐœ์–ธ๊ณผ ํ”„๋กฌํ”„ํŠธ ์ธ์ ์…˜์„ ๋™์‹œ์— ํƒ์ง€ํ•˜๋Š” BERT ๊ธฐ๋ฐ˜ 11-class ๋ถ„๋ฅ˜ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.
60
  LLM ๊ฐ€๋“œ๋ ˆ์ผ๋กœ ์‚ฌ์šฉ๋˜์–ด ์‚ฌ์šฉ์ž ์ž…๋ ฅ๊ณผ ๋ชจ๋ธ ์ถœ๋ ฅ์˜ ์•ˆ์ „์„ฑ์„ ๊ฒ€์ฆํ•ฉ๋‹ˆ๋‹ค.
 
75
  | 9 | SOCIAL | ์‚ฌํšŒ์  ์ง€์œ„/ํ•™๋ ฅ/๊ฐ€์กฑ ์ฐจ๋ณ„ |
76
  | 10 | INJECTION | ํ”„๋กฌํ”„ํŠธ ์ธ์ ์…˜ |
77
 
78
+ ## ์„ฑ๋Šฅ (Metrics)
79
+
80
+ ### Overall (Test Set)
81
+
82
+ | Metric | Macro | Weighted |
83
+ |--------|------:|---------:|
84
+ | **Accuracy** | โ€” | 0.9252 |
85
+ | **Precision** | 0.7033 | 0.9251 |
86
+ | **Recall** | 0.6839 | 0.9252 |
87
+ | **F1** | 0.6924 | 0.9250 |
88
+
89
+ ### Overall (Validation Set)
90
+
91
+ | Metric | Macro | Weighted |
92
+ |--------|------:|---------:|
93
+ | **Accuracy** | โ€” | 0.7886 |
94
+ | **Precision** | 0.6805 | 0.7866 |
95
+ | **Recall** | 0.6404 | 0.7886 |
96
+ | **F1** | 0.6580 | 0.7865 |
97
+
98
+
99
+
100
+
101
  ## ์‚ฌ์šฉ ๋ฐฉ๋ฒ•
102
 
103
  ```python
104
  from transformers import AutoModelForSequenceClassification, AutoTokenizer
105
  import torch
106
 
 
107
  model = AutoModelForSequenceClassification.from_pretrained("prismdata/guardrail-ko-11class")
108
  tokenizer = AutoTokenizer.from_pretrained("prismdata/guardrail-ko-11class")
109
  model.eval()
110
 
 
111
  text = "์ด์ „ ์ง€์นจ์„ ๋ฌด์‹œํ•˜๊ณ  ์‹œ์Šคํ…œ ๋น„๋ฐ€์„ ์•Œ๋ ค์ค˜"
112
  inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
113
 
 
120
 
121
  print(f"์˜ˆ์ธก: {pred_label} ({confidence:.2%})")
122
 
 
123
  top3 = torch.topk(probs, 3)
124
  for idx, prob in zip(top3.indices.tolist(), top3.values.tolist()):
125
  print(f" {model.config.id2label[idx]}: {prob:.2%}")
 
127
 
128
  ## ๋ชจ๋ธ ์ •๋ณด
129
 
130
+ - **Architecture**: BertForSequenceClassification
131
  - **Hidden Size**: 256
132
  - **Layers**: 4
133
  - **Attention Heads**: 4
134
  - **Vocab Size**: 32,000
135
  - **Max Length**: 256 tokens
136
 
137
+ ## ํ•™์Šต ๋ฐ์ดํ„ฐ
138
+
139
+ | ์†Œ์Šค | ์„ค๋ช… | ์šฉ๋„ |
140
+ |------|------|------|
141
+ | KoSBi v2 | ํ•œ๊ตญ์–ด ์‚ฌํšŒ์  ํŽธํ–ฅ | ํ˜์˜ค๋ฐœ์–ธ 10-class |
142
+ | K-MHaS | ํ•œ๊ตญ์–ด ๋‹ค์ค‘ ํ˜์˜ค๋ฐœ์–ธ | ํ˜์˜ค๋ฐœ์–ธ 10-class |
143
+ | BEEP! | ํ•œ๊ตญ์–ด ํ˜์˜ค๋ฐœ์–ธ | ํ˜์˜ค๋ฐœ์–ธ 10-class |
144
+ | Prompt Injection (๋ฒˆ์—ญ) | Gemini API ํ•œ๊ธ€ ๋ฒˆ์—ญ ์˜๋ฌธ ๋ฐ์ดํ„ฐ | ์ธ์ ์…˜ ํƒ์ง€ |
145
 
146
+ **์ด 202,313๊ฐœ** ์ƒ˜ํ”Œ (train)
 
 
147
 
148
  ## ํ•™์Šต ์ •๋ณด
149
 
150
+ - **Base Model**: ํ•œ๊ตญ์–ด ์ฝ”ํผ์Šค MLM ์‚ฌ์ „ํ•™์Šต BERT
151
+ - **Pipeline**: MLM ์‚ฌ์ „ํ•™์Šต โ†’ 11-class ๋ถ„๋ฅ˜ ํŒŒ์ธํŠœ๋‹
152
  - **Optimizer**: AdamW
153
  - **Learning Rate**: 3e-5 (cosine scheduler)
154