Update README.md
README.md
CHANGED
@@ -5,7 +5,7 @@ language:
 pipeline_tag: text-generation
 ---
 <p align="left">
-<img src="https://huggingface.co/Devocean-06/Spam_Filter-gemma/
+<img src="https://huggingface.co/Devocean-06/Spam_Filter-gemma/resolve/main/skitty.png" width="50%"/>
 </p>

 # Devocean-06/Spam_Filter-gemma
@@ -30,37 +30,24 @@ pipeline_tag: text-generation
 **Model Developers**: SK Devocean-06 On-device LLM

 ## Model Information
-
-Skitty is an explainable small language model (sLLM) designed to classify various types of spam messages and provide concise reasoning for its decisions.
-Instead of only labeling text as "spam" or "not spam", the model outputs short natural-language explanations describing why the message was identified as spam.
-
+Skitty is an explainable small language model (sLLM) that classifies spam messages and provides brief reasoning for each decision.
 ---

 ## 🧠 Description
 Skitty was trained on an updated 2025 spam message dataset collected through the Smart Police Big Data Platform in South Korea.
 The model leverages deduplication, curriculum sampling, and off-policy distillation to improve both classification accuracy and interpretability.

-
+## Data and Preprocessing
 - Data source: 2025 Smart Police Big Data Platform spam message dataset
 - Deduplication: Performed near-duplicate removal using SimHash filtering
 - Sampling strategy: Applied curriculum-based sampling to control difficulty and improve generalization
 - Labeling: Trained using hard-label supervision after label confidence refinement

-
+## Training and Distillation
 - Utilized off-policy distillation to compress the decision process of a large teacher LLM into a smaller student model
 - Instead of directly mimicking the teacher’s text generation, the model distills the reasoning trace for spam detection
 - Combined curriculum learning with hard-label distillation to balance accuracy, interpretability, and generalization

-
-### Key Features
-
-| Category | Description |
-|-----------|-------------|
-| Model Type | sLLM (Small Language Model for Spam Classification & Explanation) |
-| Main Function | Spam / Non-spam classification with reasoning |
-| Training Approach | Off-policy knowledge distillation + curriculum sampling |
-| Data Cleaning | SimHash-based deduplication and quality filtering |
-| Objective | Build a model that not only classifies spam but also explains its rationale |
-
 ---

 ## 🚀 Quick Start
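The curriculum-based sampling bullet can be sketched similarly. The staged, easy-to-hard growing pool below is an assumed scheme for illustration; the difficulty score, stage count, and batch size are not taken from the model's documentation.

```python
import random

def curriculum_batches(examples, difficulty, num_stages=3, batch_size=2, seed=0):
    # Rank examples by an assumed scalar difficulty score, then sample
    # each stage's batch from a pool that grows to include harder items.
    rng = random.Random(seed)
    ranked = sorted(examples, key=difficulty)
    for stage in range(1, num_stages + 1):
        pool = ranked[: max(batch_size, len(ranked) * stage // num_stages)]
        yield [rng.choice(pool) for _ in range(batch_size)]

# Toy usage: message length stands in as a crude difficulty proxy.
messages = ["hi", "free prize!!", "ok", "meeting at 3",
            "claim your reward now", "lunch?"]
batches = list(curriculum_batches(messages, difficulty=len))
```

Early stages draw only from the easiest slice of the data, so the model sees clear-cut examples before ambiguous ones; later stages sample from the full range.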