---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---
# moderat - Dual-Mode Content Moderation + PII Filter
A text classification model for content moderation with age-appropriate filtering and PII detection.
## Features
- **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII Detection:** Emails, phone numbers, addresses, credit cards, SSNs
- **Unicode Deobfuscation:** Detects circled letters (ⓕ), double-struck (ℂ), fullwidth, mathematical symbols
- **Social Media Protection:**
  - <13: Block all social media sharing
  - 13+: Allow, block only if grooming detected
- **Grooming Detection:** Keywords like "dm me", "don't tell parents", "our secret"
## Quick Start
```python
from pii_extension import CombinedModerationFilter

# Named `moderator` rather than `filter` to avoid shadowing Python's built-in filter()
moderator = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderator.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderator.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderator.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderator.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```
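The PII check in the example above is described under Model Details as regex-based. As a rough illustration of that approach, here is a minimal sketch covering two of the listed types (emails and US Social Security numbers). The patterns and the `find_pii` name are assumptions made for this example and are not taken from `pii_extension.py`.

```python
import re

# Illustrative patterns only; the real patterns in pii_extension.py may differ.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each PII type to the matches found in text."""
    found = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


print(find_pii("My email is test@gmail.com"))
# -> {'email': ['test@gmail.com']}
```

A filter built this way would block the message whenever the returned dict is non-empty, regardless of age.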
## Unicode Deobfuscation
Automatically detects and normalizes unicode bypass attempts:
| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |
**All obfuscated text is normalized before moderation checks.**
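One common way to get this kind of normalization is Unicode NFKC compatibility normalization, available in Python's standard `unicodedata` module. The sketch below shows the general technique and is not necessarily how moderat implements it; `normalize_text` is a hypothetical name.

```python
import unicodedata


def normalize_text(text):
    """Fold compatibility characters (circled, fullwidth, math) to plain forms."""
    return unicodedata.normalize("NFKC", text)


# The four techniques from the table above
assert normalize_text("ⓕⓤⓒⓚ") == "fuck"
assert normalize_text("ℂℍ") == "CH"
assert normalize_text("Ｆ") == "F"
assert normalize_text("𝐟") == "f"
```

NFKC only covers characters that Unicode defines as compatibility variants. Look-alike letters from other scripts (for example Cyrillic "а") are left unchanged and would need a separate mapping.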
## Social Media Rules
| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |
**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
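The rule table can be read as a small decision function. The following is a sketch under stated assumptions: the function names are hypothetical, matching is a plain lowercase substring test, the keyword tuple holds only the five keywords quoted above (the real list is longer), and the text is assumed to have already been identified as a social media share.

```python
# Only the keywords quoted in this README; the actual list is longer ("etc.").
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")


def has_grooming_context(text):
    """Return True if any listed grooming keyword appears in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in GROOMING_KEYWORDS)


def allow_social_share(text, age):
    """Under 13: always blocked. 13+: blocked only with grooming context."""
    if age < 13:
        return False
    return not has_grooming_context(text)


assert allow_social_share("Follow me on instagram @user", age=15)
assert not allow_social_share("Follow me on instagram @user", age=12)
assert not allow_social_share("DM me privately, don't tell parents", age=14)
```

Plain substring matching is easy to reason about but misses variants such as a curly apostrophe in "don’t tell", so normalizing the text first matters.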
## Content Labels
| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
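The table suggests that the age mode changes which predicted categories pass, not the prediction itself. Here is a minimal sketch of that mapping. The label strings are hypothetical, and it assumes that every category other than Safe and reaction swearing is blocked in both modes, which the table implies for the examples shown but this README does not state for hate speech or spam.

```python
# Hypothetical label names based on the six categories listed under Features.
ALLOWED_UNDER_13 = {"safe"}
ALLOWED_13_PLUS = {"safe", "swearing_reaction"}


def is_allowed(label, age):
    """Return True if a predicted label passes the filter for the given age."""
    allowed = ALLOWED_UNDER_13 if age < 13 else ALLOWED_13_PLUS
    return label in allowed


assert is_allowed("swearing_reaction", age=15)      # "damn that's crazy", 13+
assert not is_allowed("swearing_reaction", age=10)  # same text, <13
assert not is_allowed("harassment", age=15)         # blocked in both modes
```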
## Model Details
- **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **Features:** up to 10,000 TF-IDF features, 1- to 3-grams
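For readers who want to see what such a setup looks like in scikit-learn, the sketch below builds a TF-IDF plus Multinomial Naive Bayes pipeline with the settings listed above. It is not the training script for `moderation_model.pkl`: the training texts and labels are made up for illustration, and word-level n-grams (scikit-learn's default analyzer) are an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Settings from the list above: at most 10,000 features, 1- to 3-grams.
pipeline = make_pipeline(
    TfidfVectorizer(max_features=10_000, ngram_range=(1, 3)),
    MultinomialNB(),
)

# Tiny made-up training set, only to show the fit/predict calls.
texts = ["have a nice day", "you are trash", "buy cheap pills now", "see you tomorrow"]
labels = ["safe", "harassment", "spam", "safe"]
pipeline.fit(texts, labels)

# predict() returns one label per input text.
prediction = pipeline.predict(["have a nice day"])[0]
```

Keeping the vectorizer and classifier in one pipeline means a single pickled object carries both the vocabulary and the model.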
## Files
- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook
## Colab
Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)
## Speed
- Single inference: ~2-5ms
- With PII check: ~3-7ms
- Throughput: ~300-500 texts/sec
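Figures like these depend on hardware and input length, so it can help to measure them locally. The helper below is a generic timing sketch, not the code from `moderat_speed_test.ipynb`; `measure_latency_ms` and `dummy_check` are hypothetical names, and `dummy_check` is a stand-in to replace with the `check` method from the Quick Start.

```python
import time


def measure_latency_ms(func, texts, repeats=3):
    """Return the mean wall-clock time per call in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            func(text)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(texts))


# Stand-in for the real moderation call.
def dummy_check(text):
    return text.lower()


latency_ms = measure_latency_ms(dummy_check, ["damn that's crazy", "hello"] * 50)
throughput = 1000 / latency_ms  # texts per second implied by the mean latency
print(f"{latency_ms:.4f} ms per text, about {throughput:.0f} texts/sec")
```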