sugiv committed on
Commit 59baf8d · verified · 1 Parent(s): 7a45aee

Upload sugiv-pii-classifier LoRA adapter

Files changed (3)
  1. README.md +229 -0
  2. adapter_config.json +39 -0
  3. adapter_model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,229 @@
---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- pii-detection
- text-classification
- llama
- lora
- peft
- moderation
- safety
- privacy
- distillation
datasets:
- synthetic
language:
- en
pipeline_tag: text-classification
library_name: peft
---

# sugiv-pii-classifier

A lightweight **3B parameter** PII (Personally Identifiable Information) classifier, distilled from the [Roblox PII Classifier](https://huggingface.co/Roblox/roblox-pii-classifier) (560M XLM-RoBERTa) using **teacher-student distillation** with Fireworks AI's LoRA fine-tuning.

## 🎯 Model Overview

| Attribute | Value |
|-----------|-------|
| **Base Model** | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
| **Teacher Model** | [Roblox/roblox-pii-classifier](https://huggingface.co/Roblox/roblox-pii-classifier) |
| **Training Method** | LoRA (Low-Rank Adaptation) via Fireworks AI SFT |
| **LoRA Rank** | 16 |
| **Training Examples** | 4,000 (5,000 generated, 80/10/10 split) |
| **Test Accuracy** | **87.4%** on held-out test set |
| **Labels** | `none`, `asking`, `giving` |

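For reference, the 80/10/10 split from the table above can be reproduced with a seeded shuffle-and-slice. This is a sketch, not the actual pipeline code; the helper name and seed are illustrative:

```python
import random

def split_dataset(examples, seed=42):
    """80/10/10 train/val/test split (the ratios from the table above)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(list(range(5000)))
# 4,000 train / 500 val / 500 test, matching the table
```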
## 🏗️ Architecture

This model uses an **"LLM Head as Classifier"** approach inspired by [Fireworks AI's blog post](https://fireworks.ai/blog/Finetuning-LLMs-as-Classifiers):

```
┌────────────────────────────────────────────────────────────────────┐
│                    TEACHER-STUDENT DISTILLATION                    │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────────────┐   Labels    ┌──────────────────────┐         │
│  │   DeepSeek API   │ ──────────► │  Synthetic Dataset   │         │
│  │ (Data Generator) │             │    5,000 examples    │         │
│  └──────────────────┘             └──────────┬───────────┘         │
│                                              │                     │
│                                              ▼                     │
│  ┌──────────────────────────────────────────────────────────┐      │
│  │           ROBLOX PII CLASSIFIER (Teacher)                │      │
│  │           XLM-RoBERTa-Large (560M)                       │      │
│  │                                                          │      │
│  │  Thresholds:                                             │      │
│  │   • asking >= 0.2        → "asking"                      │      │
│  │   • giving >= 0.3        → "giving"                      │      │
│  │   • max >= 0.2691        → most confident class          │      │
│  │   • else                 → "none"                        │      │
│  └──────────────────────────────────────────────────────────┘      │
│                             │                                      │
│                             │ Soft Labels                          │
│                             ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐      │
│  │           LLAMA 3.2 3B + LoRA (Student)                  │      │
│  │                                                          │      │
│  │  System: "Classify if this message involves PII.         │      │
│  │           Reply with: none, asking, or giving."          │      │
│  │                                                          │      │
│  │  User: Message: "whats ur snap?"                         │      │
│  │  Assistant: asking                                       │      │
│  │                                                          │      │
│  │  LoRA Config:                                            │      │
│  │   • Rank: 16, Alpha: 32                                  │      │
│  │   • Target: q,k,v,o,gate,up,down_proj                    │      │
│  └──────────────────────────────────────────────────────────┘      │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```

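Each training record pairs the chat prompt shown in the diagram with the teacher's label as the assistant turn. A hypothetical record builder (the function name is illustrative, not from the actual pipeline):

```python
def make_training_example(message: str, label: str) -> dict:
    """Build one SFT record in the chat format shown in the diagram."""
    return {
        "messages": [
            {"role": "system",
             "content": "Classify if this message involves PII. "
                        "Reply with: none, asking, or giving."},
            {"role": "user", "content": f'Message: "{message}"'},
            {"role": "assistant", "content": label},
        ]
    }
```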
## 📊 Performance

### Test Set Results (500 examples)

| Metric | Value |
|--------|-------|
| **Accuracy** | 87.4% |
| **None F1** | 0.93 |
| **Asking F1** | 0.70 |
| **Giving F1** | 0.80 |

### Confusion Matrix

| | Pred: none | Pred: asking | Pred: giving |
|--|------------|--------------|--------------|
| **Actual: none** | 316 | 16 | 8 |
| **Actual: asking** | 15 | 57 | 15 |
| **Actual: giving** | 6 | 3 | 64 |

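The accuracy and per-class F1 scores above follow directly from the confusion matrix; a quick sanity check:

```python
# Recompute the metrics above from the confusion matrix (rows = actual).
labels = ["none", "asking", "giving"]
cm = {
    "none":   {"none": 316, "asking": 16, "giving": 8},
    "asking": {"none": 15,  "asking": 57, "giving": 15},
    "giving": {"none": 6,   "asking": 3,  "giving": 64},
}

total = sum(sum(row.values()) for row in cm.values())
correct = sum(cm[l][l] for l in labels)
accuracy = correct / total  # 437 / 500 = 0.874

def f1(label):
    tp = cm[label][label]
    fp = sum(cm[a][label] for a in labels if a != label)  # false positives
    fn = sum(cm[label][p] for p in labels if p != label)  # false negatives
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

for l in labels:
    print(f"{l}: F1 = {f1(l):.2f}")
```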
## 🚀 Quick Start

### With PEFT (Recommended)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "sugiv/sugiv-pii-classifier")

def classify_pii(message: str) -> str:
    """Classify a message for PII content."""
    messages = [
        {"role": "system", "content": 'Classify if this chat message involves PII (personal info). Reply with exactly one word: "none", "asking", or "giving".'},
        {"role": "user", "content": f'Message: "{message}"'}
    ]

    # add_generation_prompt appends the assistant header so the model
    # answers instead of continuing the user turn
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        # greedy decoding; do_sample=False makes temperature irrelevant
        outputs = model.generate(inputs, max_new_tokens=10, do_sample=False)

    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return response.strip().lower().split()[0]

# Examples
print(classify_pii("whats ur snap?"))                # → asking
print(classify_pii("my email is john@example.com"))  # → giving
print(classify_pii("this game is so fun"))           # → none
```

### With Fireworks API (Production)

```python
import requests

API_KEY = "your-fireworks-api-key"
MODEL = "accounts/sugi205-8d1850/models/pii-classifier-llama3b-5k"

def classify_pii(message: str) -> str:
    response = requests.post(
        "https://api.fireworks.ai/inference/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": "Classify if this message involves PII. Reply: none, asking, or giving."},
                {"role": "user", "content": f'Message: "{message}"'}
            ],
            "max_tokens": 10,
            "temperature": 0
        },
        timeout=30
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["choices"][0]["message"]["content"].strip().lower()
```

## 📝 Label Definitions

| Label | Description | Examples |
|-------|-------------|----------|
| **none** | No PII request or disclosure | "this game is fun", "lol nice shot" |
| **asking** | Requesting personal information | "what's your phone number?", "where do you live?" |
| **giving** | Sharing personal information | "my email is x@y.com", "I live at 123 Main St" |

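Because the student is a generative model, its raw reply can occasionally carry punctuation or stray tokens beyond the three labels. A defensive parser (a hypothetical helper, not part of this repo) keeps downstream code inside the label set:

```python
VALID_LABELS = {"none", "asking", "giving"}

def parse_label(raw: str, default: str = "none") -> str:
    """Normalize a model reply to one of the three labels."""
    words = raw.strip().lower().split()
    if not words:
        return default
    # strip punctuation such as 'asking.' and keep the first token
    first = words[0].strip('."\',!?')
    return first if first in VALID_LABELS else default
```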
## 🔧 Training Details

### Data Generation

- **Generator**: DeepSeek API (deepseek-chat)
- **Categories**: Benign (50%), Asking PII (25%), Giving PII (25%)
- **Total Generated**: 5,000 examples
- **Platforms Simulated**: Gaming chat, social media, messaging

### Teacher Labeling

- **Model**: [Roblox/roblox-pii-classifier](https://huggingface.co/Roblox/roblox-pii-classifier)
- **Thresholds** (from Roblox documentation):
  - `privacy_asking_for_pii >= 0.2` → "asking"
  - `privacy_giving_pii >= 0.3` → "giving"
  - `max(asking, giving) >= 0.2691` → most confident class
  - Otherwise → "none"
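The threshold cascade can be written as a small function. This sketch assumes the rules are checked top to bottom with the first match winning; the argument names are illustrative, not the teacher's actual output keys:

```python
def teacher_label(asking: float, giving: float) -> str:
    """Map teacher scores to a hard label using the thresholds above."""
    if asking >= 0.2:
        return "asking"
    if giving >= 0.3:
        return "giving"
    if max(asking, giving) >= 0.2691:
        # fall back to the more confident of the two PII classes
        return "asking" if asking >= giving else "giving"
    return "none"
```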

### Fine-Tuning

- **Platform**: Fireworks AI Supervised Fine-Tuning
- **Method**: LoRA (Low-Rank Adaptation)
- **Epochs**: 3
- **Learning Rate**: 1e-4
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

## 📚 References

1. [Roblox Open-Sources PII Classifier](https://corp.roblox.com/newsroom/2025/11/open-sourcing-roblox-pii-classifier-ai-pii-detection-chat)
2. [Fireworks: Fine-tuning LLMs as Classifiers](https://fireworks.ai/blog/Finetuning-LLMs-as-Classifiers)
3. [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)

## 📄 License

Apache 2.0

## 🙏 Acknowledgments

- **Roblox** for open-sourcing their PII classifier
- **Fireworks AI** for the fine-tuning infrastructure and the classifier approach
- **Meta** for the Llama 3.2 base model

## ⚠️ Limitations

- Trained on synthetic data; may not cover all real-world PII patterns
- English-only (the teacher model is multilingual, but the training data is English)
- Should be used as part of a broader content moderation system
- Not a replacement for comprehensive privacy protection measures

## 📧 Contact

Created by [@sugiv](https://huggingface.co/sugiv)
adapter_config.json ADDED
@@ -0,0 +1,39 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": false,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "v_proj",
    "down_proj",
    "k_proj",
    "up_proj",
    "q_proj",
    "gate_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7afbfb3716a976f731f6330ed735cd0bc2f0d37b43ee0c89c49745de2c596298
size 48680136