sugiv committed on
Commit 59baf8d · verified · 1 Parent(s): 7a45aee

Upload sugiv-pii-classifier LoRA adapter

Files changed (3)
  1. README.md +229 -0
  2. adapter_config.json +39 -0
  3. adapter_model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,229 @@
---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- pii-detection
- text-classification
- llama
- lora
- peft
- moderation
- safety
- privacy
- distillation
datasets:
- synthetic
language:
- en
pipeline_tag: text-classification
library_name: peft
---

# sugiv-pii-classifier

A lightweight **3B parameter** PII (Personally Identifiable Information) classifier, distilled from the [Roblox PII Classifier](https://huggingface.co/Roblox/roblox-pii-classifier) (560M XLM-RoBERTa) using **teacher-student distillation** with Fireworks AI's LoRA fine-tuning.

## 🎯 Model Overview

| Attribute | Value |
|-----------|-------|
| **Base Model** | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
| **Teacher Model** | [Roblox/roblox-pii-classifier](https://huggingface.co/Roblox/roblox-pii-classifier) |
| **Training Method** | LoRA (Low-Rank Adaptation) via Fireworks AI SFT |
| **LoRA Rank** | 16 |
| **Training Examples** | 4,000 (5,000 generated, 80/10/10 split) |
| **Test Accuracy** | **87.4%** on held-out test set |
| **Labels** | `none`, `asking`, `giving` |

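For reference, the 80/10/10 split from the table above can be reproduced with a seeded shuffle-and-slice. This is a sketch, not the actual pipeline code; the helper name and seed are illustrative:

```python
import random

def split_dataset(examples, seed=42):
    """80/10/10 train/val/test split (the ratios from the table above)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(list(range(5000)))
# 4,000 train / 500 val / 500 test, matching the table
```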
## 🏗️ Architecture

This model uses an **"LLM Head as Classifier"** approach inspired by [Fireworks AI's blog post](https://fireworks.ai/blog/Finetuning-LLMs-as-Classifiers):

```
┌────────────────────────────────────────────────────────────────────┐
│                    TEACHER-STUDENT DISTILLATION                    │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────────────┐   Labels    ┌──────────────────────┐         │
│  │   DeepSeek API   │ ──────────► │  Synthetic Dataset   │         │
│  │ (Data Generator) │             │    5,000 examples    │         │
│  └──────────────────┘             └──────────┬───────────┘         │
│                                              │                     │
│                                              ▼                     │
│  ┌──────────────────────────────────────────────────────────┐      │
│  │           ROBLOX PII CLASSIFIER (Teacher)                │      │
│  │           XLM-RoBERTa-Large (560M)                       │      │
│  │                                                          │      │
│  │  Thresholds:                                             │      │
│  │   • asking >= 0.2        → "asking"                      │      │
│  │   • giving >= 0.3        → "giving"                      │      │
│  │   • max >= 0.2691        → most confident class          │      │
│  │   • else                 → "none"                        │      │
│  └──────────────────────────────────────────────────────────┘      │
│                             │                                      │
│                             │ Soft Labels                          │
│                             ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐      │
│  │           LLAMA 3.2 3B + LoRA (Student)                  │      │
│  │                                                          │      │
│  │  System: "Classify if this message involves PII.         │      │
│  │           Reply with: none, asking, or giving."          │      │
│  │                                                          │      │
│  │  User: Message: "whats ur snap?"                         │      │
│  │  Assistant: asking                                       │      │
│  │                                                          │      │
│  │  LoRA Config:                                            │      │
│  │   • Rank: 16, Alpha: 32                                  │      │
│  │   • Target: q,k,v,o,gate,up,down_proj                    │      │
│  └──────────────────────────────────────────────────────────┘      │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```

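Each training record pairs the chat prompt shown in the diagram with the teacher's label as the assistant turn. A hypothetical record builder (the function name is illustrative, not from the actual pipeline):

```python
def make_training_example(message: str, label: str) -> dict:
    """Build one SFT record in the chat format shown in the diagram."""
    return {
        "messages": [
            {"role": "system",
             "content": "Classify if this message involves PII. "
                        "Reply with: none, asking, or giving."},
            {"role": "user", "content": f'Message: "{message}"'},
            {"role": "assistant", "content": label},
        ]
    }
```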
## 📊 Performance

### Test Set Results (500 examples)

| Metric | Value |
|--------|-------|
| **Accuracy** | 87.4% |
| **None F1** | 0.93 |
| **Asking F1** | 0.70 |
| **Giving F1** | 0.80 |

### Confusion Matrix

| | Pred: none | Pred: asking | Pred: giving |
|--|------------|--------------|--------------|
| **Actual: none** | 316 | 16 | 8 |
| **Actual: asking** | 15 | 57 | 15 |
| **Actual: giving** | 6 | 3 | 64 |

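The accuracy and per-class F1 scores above follow directly from the confusion matrix; a quick sanity check:

```python
# Recompute the metrics above from the confusion matrix (rows = actual).
labels = ["none", "asking", "giving"]
cm = {
    "none":   {"none": 316, "asking": 16, "giving": 8},
    "asking": {"none": 15,  "asking": 57, "giving": 15},
    "giving": {"none": 6,   "asking": 3,  "giving": 64},
}

total = sum(sum(row.values()) for row in cm.values())
correct = sum(cm[l][l] for l in labels)
accuracy = correct / total  # 437 / 500 = 0.874

def f1(label):
    tp = cm[label][label]
    fp = sum(cm[a][label] for a in labels if a != label)  # false positives
    fn = sum(cm[label][p] for p in labels if p != label)  # false negatives
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

for l in labels:
    print(f"{l}: F1 = {f1(l):.2f}")
```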
## 🚀 Quick Start

### With PEFT (Recommended)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "sugiv/sugiv-pii-classifier")

def classify_pii(message: str) -> str:
    """Classify a message for PII content."""
    messages = [
        {"role": "system", "content": 'Classify if this chat message involves PII (personal info). Reply with exactly one word: "none", "asking", or "giving".'},
        {"role": "user", "content": f'Message: "{message}"'}
    ]

    # add_generation_prompt appends the assistant header so the model
    # answers instead of continuing the user turn
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        # greedy decoding; do_sample=False makes temperature irrelevant
        outputs = model.generate(inputs, max_new_tokens=10, do_sample=False)

    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return response.strip().lower().split()[0]

# Examples
print(classify_pii("whats ur snap?"))                # → asking
print(classify_pii("my email is john@example.com"))  # → giving
print(classify_pii("this game is so fun"))           # → none
```

### With Fireworks API (Production)

```python
import requests

API_KEY = "your-fireworks-api-key"
MODEL = "accounts/sugi205-8d1850/models/pii-classifier-llama3b-5k"

def classify_pii(message: str) -> str:
    response = requests.post(
        "https://api.fireworks.ai/inference/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": "Classify if this message involves PII. Reply: none, asking, or giving."},
                {"role": "user", "content": f'Message: "{message}"'}
            ],
            "max_tokens": 10,
            "temperature": 0
        },
        timeout=30
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["choices"][0]["message"]["content"].strip().lower()
```

## 📝 Label Definitions

| Label | Description | Examples |
|-------|-------------|----------|
| **none** | No PII request or disclosure | "this game is fun", "lol nice shot" |
| **asking** | Requesting personal information | "what's your phone number?", "where do you live?" |
| **giving** | Sharing personal information | "my email is x@y.com", "I live at 123 Main St" |

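Because the student is a generative model, its raw reply can occasionally carry punctuation or stray tokens beyond the three labels. A defensive parser (a hypothetical helper, not part of this repo) keeps downstream code inside the label set:

```python
VALID_LABELS = {"none", "asking", "giving"}

def parse_label(raw: str, default: str = "none") -> str:
    """Normalize a model reply to one of the three labels."""
    words = raw.strip().lower().split()
    if not words:
        return default
    # strip punctuation such as 'asking.' and keep the first token
    first = words[0].strip('."\',!?')
    return first if first in VALID_LABELS else default
```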
## 🔧 Training Details

### Data Generation

- **Generator**: DeepSeek API (deepseek-chat)
- **Categories**: Benign (50%), Asking PII (25%), Giving PII (25%)
- **Total Generated**: 5,000 examples
- **Platforms Simulated**: Gaming chat, social media, messaging

### Teacher Labeling

- **Model**: [Roblox/roblox-pii-classifier](https://huggingface.co/Roblox/roblox-pii-classifier)
- **Thresholds** (from Roblox documentation):
  - `privacy_asking_for_pii >= 0.2` → "asking"
  - `privacy_giving_pii >= 0.3` → "giving"
  - `max(asking, giving) >= 0.2691` → most confident class
  - Otherwise → "none"
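The threshold cascade can be written as a small function. This sketch assumes the rules are checked top to bottom with the first match winning; the argument names are illustrative, not the teacher's actual output keys:

```python
def teacher_label(asking: float, giving: float) -> str:
    """Map teacher scores to a hard label using the thresholds above."""
    if asking >= 0.2:
        return "asking"
    if giving >= 0.3:
        return "giving"
    if max(asking, giving) >= 0.2691:
        # fall back to the more confident of the two PII classes
        return "asking" if asking >= giving else "giving"
    return "none"
```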

### Fine-Tuning

- **Platform**: Fireworks AI Supervised Fine-Tuning
- **Method**: LoRA (Low-Rank Adaptation)
- **Epochs**: 3
- **Learning Rate**: 1e-4
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

## 📚 References

1. [Roblox Open-Sources PII Classifier](https://corp.roblox.com/newsroom/2025/11/open-sourcing-roblox-pii-classifier-ai-pii-detection-chat)
2. [Fireworks: Fine-tuning LLMs as Classifiers](https://fireworks.ai/blog/Finetuning-LLMs-as-Classifiers)
3. [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)

## 📄 License

Apache 2.0

## 🙏 Acknowledgments

- **Roblox** for open-sourcing their PII classifier
- **Fireworks AI** for the fine-tuning infrastructure and the classifier approach
- **Meta** for the Llama 3.2 base model

## ⚠️ Limitations

- Trained on synthetic data; may not cover all real-world PII patterns
- English-only (the teacher model is multilingual, but the training data is English)
- Should be used as part of a broader content moderation system
- Not a replacement for comprehensive privacy protection measures

## 📧 Contact

Created by [@sugiv](https://huggingface.co/sugiv)
adapter_config.json ADDED
@@ -0,0 +1,39 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": false,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "v_proj",
    "down_proj",
    "k_proj",
    "up_proj",
    "q_proj",
    "gate_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7afbfb3716a976f731f6330ed735cd0bc2f0d37b43ee0c89c49745de2c596298
size 48680136