---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---

# moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

## Features

- **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII Detection:** Emails, phone numbers, addresses, credit cards, SSNs
- **Unicode Deobfuscation:** Detects circled letters (ⓕ), double-struck (ℂ), fullwidth, mathematical symbols
- **Social Media Protection:**
  - <13: Block all social media sharing
  - 13+: Allow, block only if grooming is detected
- **Grooming Detection:** Keywords like "dm me", "don't tell parents", "our secret"
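
As a rough illustration of how regex-based PII detection can work, here is a minimal sketch. The patterns and the `find_pii` helper are hypothetical and simplified; the actual rules in `pii_extension.py` may differ and likely cover more formats (street addresses are not attempted here).

```python
import re

# Hypothetical, simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the names of the PII categories whose pattern matches the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(find_pii("My email is test@gmail.com"))  # ['email']
print(find_pii("hello world"))                 # []
```

Patterns this loose will produce false positives (any long digit run looks like a card number), so a production filter would usually add checks such as Luhn validation.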

## Quick Start

```python
from pii_extension import CombinedModerationFilter

# Named moderation_filter to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```

## Unicode Deobfuscation

Automatically detects and normalizes Unicode bypass attempts:

| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |

**All obfuscated text is normalized before moderation checks.**
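
The model card does not say how normalization is implemented. One standard way to fold these character classes is Unicode NFKC normalization from Python's `unicodedata` module; the sketch below assumes that approach, and `normalize_obfuscated` is a hypothetical helper name.

```python
import unicodedata


def normalize_obfuscated(text: str) -> str:
    """Fold compatibility characters (circled, double-struck, fullwidth,
    mathematical) to their plain equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# Circled letters U+24D5, U+24E4, U+24D2, U+24DA
print(normalize_obfuscated("\u24d5\u24e4\u24d2\u24da"))  # fuck
# Double-struck C (U+2102) and H (U+210D)
print(normalize_obfuscated("\u2102\u210d"))  # CH
# Fullwidth F (U+FF26)
print(normalize_obfuscated("\uff26"))  # F
# Mathematical bold small f (U+1D41F)
print(normalize_obfuscated("\U0001d41f"))  # f
```

NFKC only covers characters with a compatibility decomposition. Look-alike letters from other scripts (for example Cyrillic homoglyphs) would need a separate mapping.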

## Social Media Rules

| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
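
The rule table can be read as a small decision function. The sketch below is an assumption about the logic, not the code in `pii_extension.py`: it uses plain substring matching on the listed keywords and assumes the text has already been identified as a social media share. `has_grooming_context` and `social_media_decision` are hypothetical names.

```python
# Keywords taken from the list above; the real list is longer ("etc.").
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")


def has_grooming_context(text: str) -> bool:
    """Return True if any grooming keyword occurs in the lowercased text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in GROOMING_KEYWORDS)


def social_media_decision(text: str, age: int) -> str:
    """Apply the age-based social media rules to a social media share."""
    if age < 13:
        return "BLOCKED"  # under 13: all social media sharing is blocked
    return "BLOCKED" if has_grooming_context(text) else "ALLOWED"


print(social_media_decision("Follow me on instagram @user", age=15))         # ALLOWED
print(social_media_decision("Follow me on instagram @user", age=12))         # BLOCKED
print(social_media_decision("DM me privately, don't tell parents", age=14))  # BLOCKED
```

Substring matching is deliberately naive: "secret" also matches "secretary", so a real filter would likely use word boundaries or phrase-level patterns.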

## Content Labels

| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
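
One way to picture the dual-mode behaviour is as a per-age allow-list over the predicted category. This is a sketch under assumptions: the table only shows that reaction swearing is allowed for 13+ and that everything else shown is blocked, so the treatment of categories not in the table (such as Spam for 13+) is a guess, and `decide` is a hypothetical helper.

```python
# Assumed allow-lists per mode; categories not listed are blocked.
ALLOWED_LABELS_UNDER_13 = {"Safe"}
ALLOWED_LABELS_13_PLUS = {"Safe", "Swearing (reaction)"}


def decide(label: str, age: int) -> str:
    """Map a predicted content label and a user age to ALLOWED or BLOCKED."""
    allowed = ALLOWED_LABELS_UNDER_13 if age < 13 else ALLOWED_LABELS_13_PLUS
    return "ALLOWED" if label in allowed else "BLOCKED"


print(decide("Swearing (reaction)", age=15))  # ALLOWED
print(decide("Swearing (reaction)", age=10))  # BLOCKED
print(decide("Harassment", age=15))           # BLOCKED
```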

## Model Details

- **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **Features:** up to 10,000 TF-IDF features, n-grams of length 1-3
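
For orientation, a scikit-learn pipeline matching these settings might look like the sketch below. It assumes default values for every parameter the list does not mention (tokenization, smoothing, and so on), so it is not necessarily how `moderation_model.pkl` was built; `build_pipeline` is a hypothetical name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """TF-IDF features (max 10,000, 1- to 3-grams) feeding Multinomial Naive Bayes."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=10_000, ngram_range=(1, 3))),
        ("nb", MultinomialNB()),
    ])


pipeline = build_pipeline()
# Training would then be: pipeline.fit(train_texts, train_labels)
```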

## Files

- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook

## Colab

Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

## Speed

- Single inference: ~2-5 ms
- With PII check: ~3-7 ms
- Throughput: ~300-500 texts/sec
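
These figures depend on hardware and input length. To reproduce them locally, a simple timing loop like the sketch below can help; `measure_throughput` is a hypothetical helper, and `check` stands for any callable that takes a single text (for example, a lambda wrapping the filter's `check` method with a fixed age).

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    check: Callable[[str], object],
    texts: Sequence[str],
    repeats: int = 100,
) -> dict:
    """Time repeated calls to `check` and report latency and throughput."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            check(text)
    # Guard against a zero reading on very coarse timers
    elapsed = max(time.perf_counter() - start, 1e-9)
    calls = repeats * len(texts)
    return {
        "ms_per_text": 1000 * elapsed / calls,
        "texts_per_sec": calls / elapsed,
    }


# Stand-in workload; replace str.lower with the real moderation call.
stats = measure_throughput(str.lower, ["damn that's crazy", "hello"], repeats=10)
print(stats)
```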