metadata
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
moderat - Dual-Mode Content Moderation + PII Filter
A text classification model for content moderation with age-appropriate filtering and PII detection.
Features
- Dual-mode filtering: <13 (strict) vs 13+ (relaxed)
- 6 content categories: Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- PII Detection: Emails, phones, addresses, credit cards, SSN
- Unicode Deobfuscation: Detects circled letters (ⓕ), double-struck letters (ℂ), fullwidth characters, and mathematical symbols
- Social Media Protection:
  - <13: Block all social media sharing
  - 13+: Allow sharing; block only if grooming is detected
- Grooming Detection: Keywords like "dm me", "don't tell parents", "our secret"
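
The PII detection listed above is described as regex-based. A minimal sketch of that idea follows, assuming simplified US-style patterns; the pattern set and the `find_pii` helper are illustrative assumptions, not the rules shipped in `pii_extension.py`. Street addresses are left out because they are hard to capture with a short regex.

```python
import re

# Illustrative patterns only (assumptions); the real rules may differ.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the names of all PII categories whose pattern matches the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(find_pii("My email is test@gmail.com"))  # ['email']
```

A filter built this way would block the message whenever `find_pii` returns a non-empty list, regardless of age.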
Quick Start
from pii_extension import CombinedModerationFilter
# Named moderation_filter to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")
# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)
# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)
# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED
# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
Unicode Deobfuscation
Automatically detects and normalizes Unicode bypass attempts:
| Technique | Example | Normalized |
|---|---|---|
| Circled letters | ⓕⓤⓒⓚ | fuck |
| Double-struck | ℂℍ | CH |
| Fullwidth | Ｆ | F |
| Mathematical | 𝐟 | f |
All obfuscated text is normalized before moderation checks.
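
One way to get this kind of normalization is Unicode NFKC compatibility folding from Python's standard library. The sketch below assumes that approach; the README does not say how `moderat` does it, and `deobfuscate` is a hypothetical helper name.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility variants (circled, double-struck, fullwidth,
    mathematical letters) to their plain equivalents using NFKC."""
    return unicodedata.normalize("NFKC", text)


# Escapes are used so the examples survive copy/paste and re-encoding.
print(deobfuscate("\u24d7\u24d8"))   # circled "hi"    -> hi
print(deobfuscate("\u2102\u210d"))   # double-struck   -> CH
print(deobfuscate("\uff26"))         # fullwidth F     -> F
print(deobfuscate("\U0001d41f"))     # mathematical f  -> f
```

NFKC does not cover look-alike letters from other scripts (for example Cyrillic "а") or leetspeak substitutions, so a production filter would likely need extra mapping tables on top of it.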
Social Media Rules
| Age | Plain Share | With Grooming Context |
|---|---|---|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |
Grooming keywords: "dm me", "don't tell", "secret", "send pics", "meet up", etc.
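
The rules in the table can be sketched as a small decision function. It assumes plain substring matching on the keywords quoted above and a hypothetical platform list for spotting a social media share; neither is confirmed as the shipped logic. It also assumes that a grooming keyword blocks the message even when no platform is named, which matches the Quick Start example.

```python
import re

# Keywords quoted in this README; the shipped list is longer ("etc.").
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")

# Hypothetical platform list, used only to detect a social media share.
SOCIAL_PATTERN = re.compile(r"\b(instagram|snapchat|tiktok|discord)\b", re.IGNORECASE)


def social_media_decision(text: str, age: int) -> str:
    """Apply the age-dependent social media rules to one message."""
    lowered = text.lower()
    # Grooming context is blocked for every age group.
    if any(keyword in lowered for keyword in GROOMING_KEYWORDS):
        return "BLOCKED"
    # A plain share is blocked only in strict (<13) mode.
    if SOCIAL_PATTERN.search(text) and age < 13:
        return "BLOCKED"
    return "ALLOWED"


print(social_media_decision("Follow me on instagram @user", age=15))  # ALLOWED
print(social_media_decision("Follow me on instagram @user", age=10))  # BLOCKED
```

Substring matching on a word such as "secret" will also flag harmless messages, so a real implementation would probably need context-aware rules.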
Content Labels
| Text | <13 | 13+ |
|---|---|---|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
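
The table suggests a simple mapping from predicted label to an age-dependent decision. The sketch below uses the category names from the Features list and assumes that only "Safe" passes in strict mode and that "Swearing (reaction)" is the only extra label allowed for 13+. How aggressive swearing and spam are treated for 13+ is not stated in this README, so blocking them here is an assumption.

```python
# Labels allowed in each mode (assumed from the examples in the table).
STRICT_ALLOWED = {"Safe"}
RELAXED_ALLOWED = {"Safe", "Swearing (reaction)"}


def is_allowed(label: str, age: int) -> bool:
    """Return True if a message with this predicted label passes for this age."""
    allowed = RELAXED_ALLOWED if age >= 13 else STRICT_ALLOWED
    return label in allowed


print(is_allowed("Swearing (reaction)", age=15))  # True
print(is_allowed("Swearing (reaction)", age=10))  # False
print(is_allowed("Harassment", age=15))           # False
```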
Model Details
- Algorithm: Multinomial Naive Bayes over TF-IDF features, plus regex-based PII detection
- Content accuracy: 77%
- PII detection: Regex-based (fast, no ML)
- TF-IDF features: up to 10,000, using 1- to 3-grams
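
A classifier with these settings could be assembled in scikit-learn roughly as follows. This is a sketch under stated assumptions: the training texts and labels are invented toy data, and any preprocessing or hyperparameters beyond the 10,000-feature cap and the 1-3 n-gram range are not documented here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """TF-IDF (max 10,000 features, 1- to 3-grams) feeding Multinomial Naive Bayes."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=10_000, ngram_range=(1, 3))),
        ("nb", MultinomialNB()),
    ])


# Toy training data, invented for illustration only.
texts = ["have a nice day", "you're trash", "buy cheap followers now", "great game everyone"]
labels = ["Safe", "Harassment", "Spam", "Safe"]

model = build_pipeline().fit(texts, labels)
prediction = model.predict(["have a great day"])[0]
print(prediction)
```

Keeping the vectorizer and the classifier in one `Pipeline` means a single pickled object carries both, which fits the single `.pkl` file listed under Files.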
Files
- moderation_model.pkl - Content moderation model
- pii_extension.py - PII + grooming detection
- inference.py - Basic inference
- moderat_speed_test.ipynb - Colab notebook
Colab
Test it: Open in Colab
Speed
- Single inference: ~2-5ms
- With PII check: ~3-7ms
- Throughput: ~300-500 texts/sec
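
Figures like these depend on hardware and input length, so it can help to measure them locally. The helper below is a hypothetical timing sketch, not taken from `moderat_speed_test.ipynb`; it times any callable that accepts a string.

```python
import time
from typing import Callable, Iterable


def measure_throughput(check: Callable[[str], object], texts: Iterable[str]) -> dict:
    """Time `check` over all texts and report per-text latency and throughput."""
    texts = list(texts)
    if not texts:
        raise ValueError("texts must not be empty")
    start = time.perf_counter()
    for text in texts:
        check(text)
    elapsed = time.perf_counter() - start
    return {
        "count": len(texts),
        "ms_per_text": 1000 * elapsed / len(texts),
        "texts_per_sec": len(texts) / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in workload; swap in a wrapper around the real check call to benchmark it.
stats = measure_throughput(str.lower, ["damn that's crazy"] * 1000)
print(stats["count"])
```

To benchmark the model itself, pass a small wrapper that calls the `check` method from Quick Start with a fixed `age`.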