darwinkernelpanic/moderat

@@ -7,32 +7,88 @@ tags:
 - text-classification
 - safety
 - dual-mode
 ---
-# moderat - Dual-Mode Content Moderation
-A text classification model for content moderation with age-appropriate filtering.
 ## Usage
 ```python
 from inference import DualModeFilter
 filter = DualModeFilter("darwinkernelpanic/moderat")
 result = filter.check("damn that's crazy", age=15)
 # -> ALLOWED (reaction swearing permitted for 13+)
 ```
-## Model Details
-- **Algorithm:** Multinomial Naive Bayes with TF-IDF
-- **Test accuracy:** 77%
-- **Classes:** 6 (Safe, Harassment, Swearing-Reaction, Swearing-Aggressive, Hate-Speech, Spam)
-## Age Modes
-| Content | <13 | 13+ |
-|---------|-----|-----|
 | "damn that's crazy" | ❌ Blocked | ✅ Allowed |
 | "you're trash" | ❌ Blocked | ❌ Blocked |
 | "kill yourself" | ❌ Blocked | ❌ Blocked |

 - text-classification
 - safety
 - dual-mode
+- pii-detection
+- child-safety
 ---
+# moderat - Dual-Mode Content Moderation + PII Filter
+A text classification model for content moderation with age-appropriate filtering and PII detection.
+## Features
+- **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
+- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
+- **PII Detection:** Emails, phone numbers, addresses, credit card numbers, SSNs
+- **Social Media Protection:**
+  - <13: Block all social media sharing
+  - 13+: Allow social media sharing, but block messages that show grooming patterns
+- **Context-aware:** Distinguishes reaction swearing from targeted aggression
 ## Usage
 ```python
 from inference import DualModeFilter
+# Basic content moderation
 filter = DualModeFilter("darwinkernelpanic/moderat")
 result = filter.check("damn that's crazy", age=15)
 # -> ALLOWED (reaction swearing permitted for 13+)
+# With PII detection (use pii_extension.py)
+from pii_extension import CombinedModerationFilter
+filter = CombinedModerationFilter("./moderation_model.pkl")
+result = filter.check("My email is test@gmail.com", age=15)
+# -> BLOCKED (PII detected)
+result = filter.check("Follow me on instagram @user", age=15)
+# -> ALLOWED (social media OK for 13+)
+result = filter.check("DM me privately, don't tell parents", age=14)
+# -> BLOCKED (grooming detected)
 ```
+## PII Detection
+| PII Type | Blocked (All Ages) |
+|----------|-------------------|
+| Email | ✅ Yes |
+| Phone | ✅ Yes |
+| Address | ✅ Yes |
+| Credit Card | ✅ Yes |
+| SSN | ✅ Yes |
+| Social Media | Depends on age |
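
The PII table above could be approximated with a handful of regular expressions. This is a minimal sketch only: the `PII_PATTERNS` dictionary and `find_pii` helper are hypothetical names, the patterns assume US-style phone numbers and SSNs, and none of this is taken from `pii_extension.py`. Street addresses are left out because they are hard to match reliably with a single pattern.

```python
import re

# Hypothetical patterns for illustration; not the rules used by pii_extension.py.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_pii(text):
    """Return the sorted names of the PII types whose pattern matches text."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )


print(find_pii("My email is test@gmail.com"))  # ['email']
print(find_pii("damn that's crazy"))           # []
```

A filter built this way would block the message whenever `find_pii` returns a non-empty list, regardless of age.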
+## Social Media Rules
+| Age | Social Media | Grooming Context |
+|-----|--------------|------------------|
+| <13 | ❌ Blocked | N/A |
+| 13+ | ✅ Allowed | ❌ Blocked |
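
The social media rules in the table above reduce to a small decision function. The sketch below assumes the message has already been identified as sharing a social media handle and that a separate detector supplies the `grooming_context` flag; the function name and signature are hypothetical, not part of the repository's API.

```python
def social_media_decision(age: int, grooming_context: bool) -> str:
    """Decide on a message that shares a social media handle."""
    if age < 13:
        # Under 13, all social media sharing is blocked,
        # so the grooming check is not needed.
        return "BLOCKED"
    # 13+: sharing is allowed unless grooming patterns were detected.
    return "BLOCKED" if grooming_context else "ALLOWED"


print(social_media_decision(15, grooming_context=False))  # ALLOWED
print(social_media_decision(14, grooming_context=True))   # BLOCKED
```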
+## Content Labels
+| Example message | <13 | 13+ |
+|-------|-----|-----|
 | "damn that's crazy" | ❌ Blocked | ✅ Allowed |
 | "you're trash" | ❌ Blocked | ❌ Blocked |
 | "kill yourself" | ❌ Blocked | ❌ Blocked |
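
One way to picture the dual-mode behaviour in the table above is a per-age allowlist applied to the classifier's predicted label. The sketch below is an assumption-laden illustration: the label strings, `ALLOWED_LABELS`, and `decide` are hypothetical names, and treating every category other than safe content and reaction swearing as blocked for all ages is inferred from the three example rows, not confirmed by the source.

```python
# Hypothetical label names; the real model may encode its classes differently.
ALLOWED_LABELS = {
    "under_13": {"safe"},
    "13_plus": {"safe", "swearing_reaction"},
}


def decide(label: str, age: int) -> str:
    """Map a predicted content label to ALLOWED or BLOCKED for the given age."""
    mode = "under_13" if age < 13 else "13_plus"
    return "ALLOWED" if label in ALLOWED_LABELS[mode] else "BLOCKED"


print(decide("swearing_reaction", age=15))  # ALLOWED
print(decide("swearing_reaction", age=10))  # BLOCKED
print(decide("harassment", age=15))         # BLOCKED
```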
+## Model Details
+- **Algorithm:** Multinomial Naive Bayes with TF-IDF
+- **Test accuracy:** 77%
+- **Features:** up to 10,000 TF-IDF features, 1- to 3-grams
+- **Training samples:** 215
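
To make the "1-3 ngrams" setting concrete, the sketch below lists the n-grams a short message would contribute before TF-IDF weighting. It assumes word-level n-grams with simple lowercasing and whitespace tokenization, which the model card does not confirm; `word_ngrams` is a hypothetical helper, not a function from the training script.

```python
def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max, shortest first."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


print(word_ngrams("damn that's crazy"))
# ['damn', "that's", 'crazy', "damn that's", "that's crazy", "damn that's crazy"]
```

With only 215 training samples, most of the longer n-grams will appear once or not at all, so the 10,000-feature cap may never be reached in practice.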
+## Files
+- `moderation_model.pkl` - Trained model
+- `inference.py` - Basic inference
+- `pii_extension.py` - PII + grooming detection
+- `enhanced_moderation.py` - Training script
+## Colab Notebook
+Try it: [moderat_speed_test.ipynb](./moderat_speed_test.ipynb)