---
license: gemma
language:
- ko
pipeline_tag: text-generation
tags:
- spam-detection
- explainable-ai
- on-device
- korean
datasets:
- Devocean-06/Spam_QA-Corpus
---

<p align="left">
  <img src="https://huggingface.co/Devocean-06/Spam_Filter-gemma/resolve/main/skitty.png" width="50%"/>
</p>

# Devocean-06/Spam_Filter-gemma

> Update @ 2025.10.19: First release of the Spam Filter XAI model

**Resources and Technical Documentation**:
* [Gemma3 Model](https://huggingface.co/google/gemma-3-4b-it)
* [Training Dataset](https://huggingface.co/datasets/Devocean-06/Spam_QA-Corpus)

**Model Developers**: SK Devocean-06 On-device LLM

## Model Information

- Skitty is an explainable small language model (sLLM) that classifies spam messages and provides brief reasoning for each decision.

---

## Description

- Skitty was trained on an updated 2025 spam message dataset collected through the Smart Police Big Data Platform in South Korea.  
- The model leverages deduplication, curriculum sampling, and off-policy distillation to improve both classification accuracy and interpretability.

## Data and Preprocessing

- **Data source**: 2025 Smart Police Big Data Platform spam message dataset  
- **Dataset**: [Devocean-06/Spam_QA-Corpus](https://huggingface.co/datasets/Devocean-06/Spam_QA-Corpus)
- **Format**: Alpaca instruction format (instruction, input, output)
- **Deduplication**: Performed near-duplicate removal using SimHash filtering (see the sketch after this list)
- **Sampling strategy**: Applied curriculum-based sampling to control difficulty and improve generalization  
- **Labeling**: Trained using hard-label supervision after label confidence refinement
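
For illustration, here is a minimal sketch of SimHash near-duplicate filtering; the character trigrams, 64-bit fingerprint width, and Hamming-distance threshold are assumed values, not the pipeline's actual settings.

```python
import hashlib

def simhash(text: str, n: int = 3, bits: int = 64) -> int:
    """Compute a SimHash fingerprint over character n-grams."""
    weights = [0] * bits
    for i in range(max(len(text) - n + 1, 1)):
        # Hash each n-gram and accumulate signed votes per bit position.
        h = int.from_bytes(hashlib.md5(text[i:i + n].encode("utf-8")).digest()[:8], "big")
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if weights[b] > 0)

def dedup(messages: list[str], threshold: int = 3) -> list[str]:
    """Keep a message only if no kept message is within `threshold` Hamming bits."""
    kept, hashes = [], []
    for msg in messages:
        h = simhash(msg)
        if all(bin(h ^ prev).count("1") > threshold for prev in hashes):
            kept.append(msg)
            hashes.append(h)
    return kept
```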

## Training and Distillation

- Utilized off-policy distillation to compress the decision process of a large teacher LLM into a smaller student model  
- Instead of directly mimicking the teacher's text generation, the model distills the reasoning trace for spam detection  
- Combined curriculum learning with hard-label distillation to balance accuracy, interpretability, and generalization (see the sketch below)
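
To make the hard-label objective concrete, here is a minimal PyTorch sketch, assuming the teacher's verdict and reasoning trace were generated once offline and tokenized into target ids; the tensor names are illustrative, not the project's actual code.

```python
import torch
import torch.nn.functional as F

def hard_label_distillation_loss(
    student_logits: torch.Tensor,     # (batch, seq_len, vocab_size)
    teacher_token_ids: torch.Tensor,  # (batch, seq_len): teacher's rationale, tokenized
    ignore_index: int = -100,         # masks prompt and padding positions
) -> torch.Tensor:
    # Off-policy, hard-label distillation: the teacher's generated reasoning
    # trace is frozen into discrete token ids, and the student is trained with
    # plain cross-entropy against them (no soft-logit or KL matching).
    return F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        teacher_token_ids.reshape(-1),
        ignore_index=ignore_index,
    )
```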

---

## Training Configuration

### Base Model
- **Base Model**: [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- **Training Framework**: [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)

### Hyperparameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| **Quantization** | 4-bit | Load pretrained model in 4-bit |
| **Adapter** | QLoRA | Low-rank adaptation method |
| **LoRA Rank (r)** | 16 | Rank of low-rank matrices |
| **LoRA Alpha** | 32 | Scaling factor for LoRA |
| **LoRA Dropout** | 0.05 | Dropout rate for LoRA layers |
| **Target Modules** | attention + MLP | Applied to q,k,v,o,up,down,gate projections |
| **Sequence Length** | 1500 | Maximum input sequence length |
| **Sample Packing** | True | Pack multiple samples into one sequence |
| **Micro Batch Size** | 10 | Batch size per GPU |
| **Gradient Accumulation** | 15 | Effective batch size: 150 |
| **Number of Epochs** | 5 | Total training epochs |
| **Learning Rate** | 2e-5 | Peak learning rate |
| **LR Scheduler** | Cosine | Cosine annealing schedule |
| **Warmup Steps** | 10 | Learning rate warmup steps |
| **Optimizer** | AdamW (8-bit) | 8-bit quantized AdamW |
| **Weight Decay** | 0.0 | L2 regularization |
| **Precision** | BF16 | Brain floating point 16 |
| **Gradient Checkpointing** | True | Save memory by recomputing gradients |
| **Flash Attention** | True | Optimized attention kernel |
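
The quantization and adapter rows above correspond roughly to the following PEFT/bitsandbytes configuration. This is a sketch reconstructed from the table, not the project's actual Axolotl config, and the module names assume Gemma's standard projection layers.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit base-model loading for QLoRA, with BF16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings mirroring the table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "up_proj", "down_proj", "gate_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```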

### Training Monitoring
- **Logging Steps**: 100
- **Evaluation Steps**: 50
- **Save Steps**: 50
- **Evaluation Strategy**: Steps-based
- **Tracking**: Weights & Biases (wandb)
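
Expressed as a Hugging Face `TrainingArguments` sketch, the optimizer, schedule, and monitoring settings above look roughly like this; the actual run was driven by Axolotl's YAML, and the output directory is a placeholder.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="spam-filter-gemma-qlora",  # placeholder path
    per_device_train_batch_size=10,
    gradient_accumulation_steps=15,        # effective batch size 150
    num_train_epochs=5,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",                # 8-bit AdamW
    weight_decay=0.0,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=100,
    eval_strategy="steps",
    eval_steps=50,
    save_steps=50,
    report_to="wandb",
)
```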

---

## Running with the `vllm` API

You can serve the model with vLLM and query it through the OpenAI-compatible chat API as follows.

```sh
vllm serve Devocean-06/Spam_Filter-gemma
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="model-endpoint",  # e.g. "http://localhost:8000/v1" for a local vLLM server
    api_key="api-key",          # any placeholder works unless the server sets --api-key
)

# Korean system prompt: it instructs the model to pick one verdict-basis category,
# then explain the spam verdict in a fixed template of 100+ characters,
# never stating that the message is not spam.
SYSTEM_PROMPT = """당신은 스팸 문자로 판정한 근거를 생성하는 대형 언어 모델입니다.
아래 기준에 따라 스팸여부 판정의 근거를 간단명료하게 한 문장으로 작성해 주세요. 출력 포맷은 XAI 설명에 적합하도록 일관성 있게 템플릿 형식으로 고정되어야 하며, 스팸 여부 및 그 근거를 명쾌하게 제시해야 합니다.

**1. 판정 근거(한 문장, 템플릿):**
- **개인 정보 요구:** 신분증, 비밀번호, 카드 번호 등 개인 정보를 요구했기 때문입니다.
- **기타 특이사항:** 위 항목 외에 스팸으로 의심되는 다른 패턴이 있습니다.
- **발신자/수신자:** 발신 번호가 일반적이지 않거나 불분명하기 때문입니다.
- **내용의 목적:** 금융 상품, 대출, 도박, 투자, 불법 복제 등의 홍보나 권유가 포함되어 있기 때문입니다.
- **심리적 압박:** 긴급성, 공포, 호기심을 유발하여 즉각적인 행동을 유도했기 때문입니다. (예: "기간 한정", "지금 즉시", "클릭하지 않으면 불이익")
- **링크/URL:** 일반적이지 않은 짧은 URL, 단축 URL 또는 의심스러운 링크가 포함되어 있기 때문입니다.

**2. 필수 조건**
- 반드시 출력 형식에 따라서 [스팸 판정 이유] 템플릿을 사용해야 합니다.
- 스팸으로 판정한 이유에 대해서 구체적인 이유로 100자 이상으로 설명해야 합니다.
- 반드시 위 판정 근거를 먼저 언급한 뒤에 출력 형식에 맞게 스팸 판정 이유를 생성해야 합니다.
- 스팸 판정 이유 생성 시, 위 스팸 문자는 ~~ 으로 시작해야합니다.
- 그리고 전제조건은 모두 스팸 문자로 분류된 형식이니 스팸이 아니라고 언급하면 안됩니다.

### 출력 형식 예시
- 판정 근거 : 개인정보 요구
- 스팸 판정 이유: 위 스팸 문자는 개인정보를 요구하는 스팸으로 아파트 분양 및 부동산 투자 권유가 포함되어 있으며, 긴급성을 강조하여 즉각적인 행동을 유도하고 있습니다."""

user_message = "<SMS text to classify>"  # placeholder: the message to analyze

response = client.chat.completions.create(
    model="Devocean-06/Spam_Filter-gemma",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
    temperature=0.7,
    max_tokens=2048,
)
print(response.choices[0].message.content)

```
## 🧠 Example Output
```sh
- 판정 근거: 내용의 목적
- 스팸 판정 이유: 위 스팸 문자는 금융 상품과 대출 관련 권유 내용을 포함하고 있으며,
  '지금 바로', '즉시 신청'과 같은 심리적 압박 어구를 사용하여 수신자의 행동을 유도하고 있습니다.
```

*(English: Verdict basis: purpose of the content. Reason: the message contains solicitations for financial products and loans, and uses pressure phrases such as "right now" and "apply immediately" to push the recipient into immediate action.)*

---

## Software

Training was conducted using the **Axolotl framework**, a flexible and efficient fine-tuning system designed for large language models.

Axolotl enables seamless configuration and execution of full fine-tuning, LoRA, and DPO pipelines through simple YAML-based workflows. It integrates with PyTorch and Hugging Face Transformers, supporting distributed strategies such as FSDP and DeepSpeed for optimized performance in multi-GPU environments.

This framework streamlines experimentation and scaling by letting researchers define training parameters, datasets, and model behaviors declaratively, reducing boilerplate and ensuring reproducible results across setups.

**Key Features Used:**
- QLoRA for parameter-efficient fine-tuning
- 4-bit quantization during training
- Flash Attention for faster training
- Gradient checkpointing for memory efficiency
- Alpaca dataset format support

---

## Citation

```bibtex
@misc{Devocean-06/Spam_Filter-gemma,
  author       = { {SK Devocean-06 On-device LLM} },
  title        = { Spam Filter \& XAI },
  year         = 2025,
  url          = { https://huggingface.co/Devocean-06/Spam_Filter-gemma },
  publisher    = { Hugging Face }
}
```

---

## License

This model is released under the Gemma license. Please refer to the original [Gemma license](https://ai.google.dev/gemma/terms) for usage terms and conditions.