Update README.md
Browse files
README.md
CHANGED
|
@@ -1,3 +1,82 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: gemma
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: gemma
|
| 3 |
+
language:
|
| 4 |
+
- ko
|
| 5 |
+
pipeline_tag: text-generation
|
| 6 |
+
---
|
| 7 |
+
<p align="left">
|
| 8 |
+
<img src="<>" width="50%"/>
|
| 9 |
+
</p>
|
| 10 |
+
|
| 11 |
+
# Devocean-06/Spam_Filter-gemma
|
| 12 |
+
> Update @ 2025.10.19: First release of Spam filter XAI
|
| 13 |
+
<!-- Provide a quick summary of what the model is/does. -->
|
| 14 |
+
|
| 15 |
+
**Resources and Technical Documentation**:
|
| 16 |
+
* [Gemma3 Model](https://huggingface.co/google/gemma-3-4b-it)
|
| 17 |
+
|
| 18 |
+
**Citation**
|
| 19 |
+
|
| 20 |
+
```bibtex
|
| 21 |
+
@misc {Devocean-06/Spam_Filter-gemma,
|
| 22 |
+
author = { {SK Devoceon-06 On device LLM} },
|
| 23 |
+
title = { Spam filter & XAI },
|
| 24 |
+
year = 2025,
|
| 25 |
+
url = { https://huggingface.co/Devocean-06/Spam_Filter-gemma },
|
| 26 |
+
publisher = { Hugging Face }
|
| 27 |
+
}
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
**Model Developers**: SK Devoceon-06 On device LLM
|
| 31 |
+
|
| 32 |
+
# Model Information
|
| 33 |
+
**Skitty**๋ ๋ค์ํ ํํ์ ์คํธ ๋ฌธ์๋ฅผ ํํฐ๋งํ๊ณ ,
|
| 34 |
+
โ์ ์คํธ์ผ๋ก ๋ถ๋ฅ๋์๋๊ฐโ๋ฅผ **๋
ผ๋ฆฌ์ ์ผ๋ก ์ค๋ช
ํ ์ ์๋ sLLM** (Small Language Model)์
๋๋ค.
|
| 35 |
+
๋ชจ๋ธ์ ๋จ์ ๋ถ๋ฅ๋ฅผ ๋์ด, ํ๋จ ๊ทผ๊ฑฐ(reason)๋ฅผ ๋ช
์์ ์ผ๋ก ์ถ๋ ฅํ๋๋ก ์ค๊ณ๋์์ต๋๋ค.
|
| 36 |
+
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
## ๐ง Description
|
| 40 |
+
์ด ๋ชจ๋ธ์ **์ค๋งํธ ์น์ ๋น
๋ฐ์ดํฐ ํ๋ซํผ(2025)**๏ฟฝ๏ฟฝ ํตํด ํ๋ณดํ
|
| 41 |
+
์ต์ ์คํธ ๋ฌธ์ ๋ฐ์ดํฐ์
์ ๊ธฐ๋ฐ์ผ๋ก ํ์ต๋์์ต๋๋ค.
|
| 42 |
+
|
| 43 |
+
### ๐ ๋ฐ์ดํฐ ๋ฐ ์ ์ฒ๋ฆฌ
|
| 44 |
+
|
| 45 |
+
- **๋ฐ์ดํฐ ์ถ์ฒ**: 2025๋
๋ ์ค๋งํธ ์น์ ๋น
๋ฐ์ดํฐ ํ๋ซํผ ์คํธ ๋ฌธ์ ๋ฐ์ดํฐ์
|
| 46 |
+
- **์ค๋ณต ์ ๊ฑฐ**: `SimHash` ๊ธฐ๋ฐ์ ๊ทผ์ฌ ์ค๋ณต ํํฐ๋ง์ ์ํํ์ฌ ์ ์ฌํ ๋ฉ์์ง ์ ๊ฑฐ
|
| 47 |
+
- **์ํ๋ง ์ ๋ต**: `Curriculum Sampling`์ ์ ์ฉํ์ฌ ์ฌ์ด ์์ โ ์ด๋ ค์ด ์์ ์์ผ๋ก ํ์ต
|
| 48 |
+
- **๋ผ๋ฒจ๋ง ๋ฐฉ์**: ๋ผ๋ฒจ ์ ๋ขฐ๋ ๋ณด์ ์ ๊ฑฐ์น **Hard Label ๊ธฐ๋ฐ ์ง๋ํ์ต**
|
| 49 |
+
|
| 50 |
+
### ๐ง ํ์ต ๋ฐ ์ฆ๋ฅ (Distillation)
|
| 51 |
+
|
| 52 |
+
- **Off-Policy Distillation**์ ์ ์ฉํ์ฌ Teacher LLM์ ์์ฌ๊ฒฐ์ ๊ทผ๊ฑฐ๋ฅผ ํจ์จ์ ์ผ๋ก ์์ถ
|
| 53 |
+
- ๋จ์ ์์ฑ ๋ชจ๋ฐฉ์ด ์๋, Teacher์ **๋
ผ๋ฆฌ์ ๋ถ๋ฅ ๊ทผ๊ฑฐ(reasoning trace)**๋ฅผ distill
|
| 54 |
+
- ๋์ด๋๋ณ Curriculum + Hard Label Distillation์ ๊ฒฐํฉํ์ฌ
|
| 55 |
+
**์ ํ๋โํด์๋ ฅโ์ผ๋ฐํ ์ฑ๋ฅ**์ ๊ท ํ์ ๋ฌ์ฑ
|
| 56 |
+
|
| 57 |
+
### ๐งพ ํต์ฌ ํน์ง
|
| 58 |
+
|
| 59 |
+
| ํญ๋ชฉ | ์ค๋ช
|
|
| 60 |
+
|------|------|
|
| 61 |
+
| **๋ชจ๋ธ ์ ํ** | sLLM (Small Language Model for Spam Classification & Explanation) |
|
| 62 |
+
| **ํต์ฌ ๊ธฐ๋ฅ** | ์คํธ/๋น์คํธ ๋ถ๋ฅ + ๊ทผ๊ฑฐ(reason) ์์ฑ |
|
| 63 |
+
| **ํ์ต ๋ฐฉ์** | Off-policy knowledge distillation + Curriculum sampling |
|
| 64 |
+
| **๋ฐ์ดํฐ ์ ์ ** | SimHash ๊ธฐ๋ฐ ์ค๋ณต ์ ๊ฑฐ ๋ฐ ํ์ง ํํฐ๋ง |
|
| 65 |
+
| **๋ชฉํ** | ๋จ์ ๋ถ๋ฅ๋ฅผ ๋์ด โ์ ์คํธ์ธ์งโ๋ฅผ ์ค๋ช
ํ ์ ์๋ ๋ชจ๋ธ |
|
| 66 |
+
|
| 67 |
+
---
|
| 68 |
+
## ๐ Quick Start
|
| 69 |
+
|
| 70 |
+
```python
|
| 71 |
+
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
|
| 72 |
+
|
| 73 |
+
MODEL_ID = "Devocean-06/Spam_Filter-gemma"
|
| 74 |
+
|
| 75 |
+
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
|
| 76 |
+
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
|
| 77 |
+
|
| 78 |
+
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
|
| 79 |
+
|
| 80 |
+
text = "๋ฌด๋ฃ ์ฟ ํฐ ์ง๊ธ! ์ง๊ธ ๋ฐ๋ก ํด๋ฆญํ์ธ์ ๐ https://spam.link ํด๋น ๋ฌธ์ ์คํธ์ธ๊ฐ์?"
|
| 81 |
+
result = pipe(text, top_k=2)
|
| 82 |
+
print(result)
|