---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---

# moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

## Features

- **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII Detection:** Emails, phone numbers, addresses, credit cards, SSNs
- **Unicode Deobfuscation:** Detects circled letters (ⓕ), double-struck (ℂ), fullwidth, and mathematical symbols
- **Social Media Protection:**
  - <13: Block all social media sharing
  - 13+: Allow, block only if grooming is detected
- **Grooming Detection:** Keywords like "dm me", "don't tell parents", "our secret"

## Quick Start

```python
from pii_extension import CombinedModerationFilter

# Named moderation_filter to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```

## Unicode Deobfuscation

Automatically detects and normalizes Unicode bypass attempts:

| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |

**All obfuscated text is normalized before moderation checks.**

## Social Media Rules

| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.

## Content Labels

| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |

## Model Details

- **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **Features:** up to 10,000 TF-IDF features, 1- to 3-grams

## Files

- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook

## Colab

Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

## Speed

- Single inference: ~2-5ms
- With PII check: ~3-7ms
- Throughput: ~300-500 texts/sec
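
To reproduce speed figures like the ones above on your own hardware, a small timing harness is enough. The sketch below is not the code from `moderat_speed_test.ipynb`; `measure_latency` is a hypothetical helper, and `check` stands for any callable that takes one text, for example a wrapper around the filter's `check` method with a fixed age.

```python
import time


def measure_latency(check, texts, repeats=5):
    """Time `check` over `texts` and report mean latency and throughput.

    `check` is any one-argument callable (an assumption for illustration).
    """
    calls = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            check(text)
            calls += 1
    # Guard against a zero reading on timers with coarse resolution.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {
        "mean_ms": elapsed / calls * 1000,
        "texts_per_sec": calls / elapsed,
    }


if __name__ == "__main__":
    # Stand-in workload; swap in the real moderation call to measure it.
    stats = measure_latency(lambda text: text.lower(), ["damn that's crazy"] * 100)
    print(stats)
```

Timing the whole batch rather than each call keeps timer overhead out of the per-text figure.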
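
The four techniques in the Unicode Deobfuscation table all map back to plain letters under Unicode NFKC compatibility normalization, which is in Python's standard library. Whether moderat itself uses NFKC is not stated here, so treat this as a minimal sketch of the idea; `deobfuscate` is a hypothetical name.

```python
import unicodedata


def deobfuscate(text):
    """Fold compatibility characters (circled, double-struck, fullwidth,
    mathematical alphanumerics) into their plain equivalents via NFKC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for sample in ["ⓕⓤⓒⓚ", "ℂℍ", "Ｆ", "𝐟"]:
        print(repr(sample), "->", repr(deobfuscate(sample)))
```

NFKC does not cover look-alike letters from other scripts (for example Cyrillic `а` in place of Latin `a`), so a production filter would need an additional homoglyph map.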
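
The Social Media Rules table can be read as a small decision function. The sketch below uses the grooming keywords listed in this README; the platform list and the name `social_media_decision` are assumptions for illustration, and the real logic in `pii_extension.py` may differ.

```python
# Grooming keywords as listed in this README.
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")
# Hypothetical platform list, only for this example.
SOCIAL_PLATFORMS = ("instagram", "snapchat", "tiktok", "discord")


def social_media_decision(text, age):
    """Return "BLOCKED" or "ALLOWED" following the Social Media Rules table."""
    lowered = text.lower()
    has_grooming = any(keyword in lowered for keyword in GROOMING_KEYWORDS)
    mentions_social = any(name in lowered for name in SOCIAL_PLATFORMS)
    if has_grooming:
        return "BLOCKED"  # blocked for every age group
    if mentions_social and age < 13:
        return "BLOCKED"  # under 13: no social media sharing at all
    return "ALLOWED"


if __name__ == "__main__":
    print(social_media_decision("Follow me on instagram @user", age=15))
    print(social_media_decision("Follow me on instagram @user", age=10))
    print(social_media_decision("DM me privately, don't tell parents", age=14))
```

Plain substring matching is easy to audit but over-matches (for example, "secret" also hits "secretary"), so word-boundary checks may be worth adding.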
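
Regex-based PII detection, as described under Model Details, can be sketched with a dictionary of compiled patterns. These patterns are simplified assumptions (US-style phone and SSN formats, 16-digit cards) and are not the ones shipped in `pii_extension.py`; `find_pii` is a hypothetical helper.

```python
import re

# Simplified illustrative patterns; real-world PII formats vary widely.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}


def find_pii(text):
    """Return the names of all PII patterns that match somewhere in `text`."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    print(find_pii("My email is test@gmail.com"))
    print(find_pii("damn that's crazy"))
```

Because PII is blocked for all ages, a non-empty result would map to BLOCKED regardless of the `age` argument.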