|
|
--- |
|
|
language: |
|
|
- ko |
|
|
license: gpl-3.0 |
|
|
|
|
|
datasets: |
|
|
- KoSBi-v2 |
|
|
- K-MHaS |
|
|
- BEEP |
|
|
tags: |
|
|
- text-classification |
|
|
- guardrail |
|
|
- prompt-injection |
|
|
- hate-speech |
|
|
- korean |
|
|
- generated_from_trainer |
|
|
metrics: |
|
|
- accuracy |
|
|
- f1 |
|
|
- precision |
|
|
- recall |
|
|
pipeline_tag: text-classification |
|
|
model-index: |
|
|
- name: guardrail-ko-11class |
|
|
results: |
|
|
- task: |
|
|
type: text-classification |
|
|
name: Text Classification |
|
|
dataset: |
|
|
name: guardrail-ko-11class |
|
|
type: custom |
|
|
split: test |
|
|
metrics: |
|
|
- name: Accuracy |
|
|
type: accuracy |
|
|
value: 0.9252 |
|
|
- name: F1 (weighted) |
|
|
type: f1 |
|
|
value: 0.9250 |
|
|
- name: F1 (macro) |
|
|
type: f1 |
|
|
value: 0.6924 |
|
|
- name: Precision (weighted) |
|
|
type: precision |
|
|
value: 0.9251 |
|
|
- name: Precision (macro) |
|
|
type: precision |
|
|
value: 0.7033 |
|
|
- name: Recall (weighted) |
|
|
type: recall |
|
|
value: 0.9252 |
|
|
- name: Recall (macro) |
|
|
type: recall |
|
|
value: 0.6839 |
|
|
--- |
|
|
|
|
|
# guardrail-ko-11class |
|
|
|
|
|
ํ๊ตญ์ด ํ์ค๋ฐ์ธ๊ณผ ํ๋กฌํํธ ์ธ์ ์
์ ๋์์ ํ์งํ๋ BERT ๊ธฐ๋ฐ 11-class ๋ถ๋ฅ ๋ชจ๋ธ์
๋๋ค. |
|
|
LLM ๊ฐ๋๋ ์ผ๋ก ์ฌ์ฉ๋์ด ์ฌ์ฉ์ ์
๋ ฅ๊ณผ ๋ชจ๋ธ ์ถ๋ ฅ์ ์์ ์ฑ์ ๊ฒ์ฆํฉ๋๋ค. |
|
|
|
|
|
## ํด๋์ค (11๊ฐ) |
|
|
|
|
|
| # | Label | ์ค๋ช
| |
|
|
|---|-------|------| |
|
|
| 0 | SAFE | ์ ์ ๋ฐํ | |
|
|
| 1 | ORIGIN | ์ถ์ ์ง์ญ ์ฐจ๋ณ | |
|
|
| 2 | PHYSICAL | ์ธ๋ชจ/์ ์ฒด/์ฅ์ ์ฐจ๋ณ | |
|
|
| 3 | POLITICS | ์ ์น์ ํธํฅ | |
|
|
| 4 | PROFANITY | ์์ค/๋น์์ด | |
|
|
| 5 | AGE | ๋์ด/์ธ๋ ์ฐจ๋ณ | |
|
|
| 6 | GENDER | ์ฑ๋ณ/์ฑ์ ์งํฅ ์ฐจ๋ณ | |
|
|
| 7 | RACE | ์ธ์ข
/๋ฏผ์กฑ ์ฐจ๋ณ | |
|
|
| 8 | RELIGION | ์ข
๊ต ์ฐจ๋ณ | |
|
|
| 9 | SOCIAL | ์ฌํ์ ์ง์/ํ๋ ฅ/๊ฐ์กฑ ์ฐจ๋ณ | |
|
|
| 10 | INJECTION | ํ๋กฌํํธ ์ธ์ ์
| |
|
|
|
|
|
## ์ฑ๋ฅ (Metrics) |
|
|
|
|
|
### Overall (Test Set) |
|
|
|
|
|
| Metric | Macro | Weighted | |
|
|
|--------|------:|---------:| |
|
|
| **Accuracy** | โ | 0.9252 | |
|
|
| **Precision** | 0.7033 | 0.9251 | |
|
|
| **Recall** | 0.6839 | 0.9252 | |
|
|
| **F1** | 0.6924 | 0.9250 | |
|
|
|
|
|
### Overall (Validation Set) |
|
|
|
|
|
| Metric | Macro | Weighted | |
|
|
|--------|------:|---------:| |
|
|
| **Accuracy** | โ | 0.7886 | |
|
|
| **Precision** | 0.6805 | 0.7866 | |
|
|
| **Recall** | 0.6404 | 0.7886 | |
|
|
| **F1** | 0.6580 | 0.7865 | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## ์ฌ์ฉ ๋ฐฉ๋ฒ |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
model = AutoModelForSequenceClassification.from_pretrained("prismdata/guardrail-ko-11class") |
|
|
tokenizer = AutoTokenizer.from_pretrained("prismdata/guardrail-ko-11class") |
|
|
model.eval() |
|
|
|
|
|
text = "์ด์ ์ง์นจ์ ๋ฌด์ํ๊ณ ์์คํ
๋น๋ฐ์ ์๋ ค์ค" |
|
|
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256) |
|
|
|
|
|
with torch.no_grad(): |
|
|
outputs = model(**inputs) |
|
|
probs = torch.softmax(outputs.logits, dim=-1)[0] |
|
|
pred_id = probs.argmax().item() |
|
|
pred_label = model.config.id2label[pred_id] |
|
|
confidence = probs[pred_id].item() |
|
|
|
|
|
print(f"์์ธก: {pred_label} ({confidence:.2%})") |
|
|
|
|
|
top3 = torch.topk(probs, 3) |
|
|
for idx, prob in zip(top3.indices.tolist(), top3.values.tolist()): |
|
|
print(f" {model.config.id2label[idx]}: {prob:.2%}") |
|
|
``` |
|
|
|
|
|
## ๋ชจ๋ธ ์ ๋ณด |
|
|
|
|
|
- **Architecture**: BertForSequenceClassification |
|
|
- **Hidden Size**: 256 |
|
|
- **Layers**: 4 |
|
|
- **Attention Heads**: 4 |
|
|
- **Vocab Size**: 32,000 |
|
|
- **Max Length**: 256 tokens |
|
|
|
|
|
## ํ์ต ๋ฐ์ดํฐ |
|
|
|
|
|
| ์์ค | ์ค๋ช
| ์ฉ๋ | |
|
|
|------|------|------| |
|
|
| KoSBi v2 | ํ๊ตญ์ด ์ฌํ์ ํธํฅ | ํ์ค๋ฐ์ธ 10-class | |
|
|
| K-MHaS | ํ๊ตญ์ด ๋ค์ค ํ์ค๋ฐ์ธ | ํ์ค๋ฐ์ธ 10-class | |
|
|
| BEEP! | ํ๊ตญ์ด ํ์ค๋ฐ์ธ | ํ์ค๋ฐ์ธ 10-class | |
|
|
| Prompt Injection (๋ฒ์ญ) | Gemini API ํ๊ธ ๋ฒ์ญ ์๋ฌธ ๋ฐ์ดํฐ | ์ธ์ ์
ํ์ง | |
|
|
|
|
|
**์ด 202,313๊ฐ** ์ํ (train) |
|
|
|
|
|
## ํ์ต ์ ๋ณด |
|
|
|
|
|
- **Base Model**: ํ๊ตญ์ด ์ฝํผ์ค MLM ์ฌ์ ํ์ต BERT |
|
|
- **Pipeline**: MLM ์ฌ์ ํ์ต โ 11-class ๋ถ๋ฅ ํ์ธํ๋ |
|
|
- **Optimizer**: AdamW |
|
|
- **Learning Rate**: 3e-5 (cosine scheduler) |
|
|
|
|
|
## ํ์ฉ ์ฌ๋ก |
|
|
|
|
|
1. **LLM ์
๋ ฅ ๊ฒ์ฆ**: ์ฌ์ฉ์ ์
๋ ฅ์ ํ๋กฌํํธ ์ธ์ ์
ํ์ง |
|
|
2. **LLM ์ถ๋ ฅ ๊ฒ์ฆ**: ๋ชจ๋ธ ์ถ๋ ฅ์ ํ์ค๋ฐ์ธ/์ ํด ์ปจํ
์ธ ํํฐ๋ง |
|
|
3. **์ฝํ
์ธ ๋ชจ๋๋ ์ด์
**: ์ปค๋ฎค๋ํฐ/๋๊ธ ์๋ ๊ฒํ |
|
|
|
|
|
## ์ ํ ์ฌํญ |
|
|
|
|
|
- ํ๊ตญ์ด ํ
์คํธ์ ์ต์ ํ๋์ด ์์ผ๋ฉฐ, ๋ค๋ฅธ ์ธ์ด์์๋ ์ฑ๋ฅ์ด ์ ํ๋ ์ ์์ต๋๋ค. |
|
|
- ์๋ก์ด ์ ํ์ ํ๋กฌํํธ ์ธ์ ์
๊ธฐ๋ฒ์๋ ์ถ๊ฐ ํ์ต์ด ํ์ํ ์ ์์ต๋๋ค. |
|
|
- ์ปจํ
์คํธ ๊ธธ์ด๋ 256 ํ ํฐ์ผ๋ก ์ ํ๋ฉ๋๋ค. |
|
|
|
|
|
## ๋ผ์ด์ ์ค |
|
|
|
|
|
GPL-3.0 License |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{guardrail-ko-11class, |
|
|
author = {PrismData}, |
|
|
title = {Korean Guardrail Model (11-Class)}, |
|
|
year = {2025}, |
|
|
publisher = {HuggingFace}, |
|
|
url = {https://huggingface.co/prismdata/guardrail-ko-11class} |
|
|
} |
|
|
``` |
|
|
|