---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---
# moderat - Dual-Mode Content Moderation + PII Filter
A text classification model for content moderation with age-appropriate filtering and PII detection.
## Features
- **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII Detection:** Emails, phones, addresses, credit cards, SSN
- **Unicode Deobfuscation:** Detects circled letters (ⓕ), double-struck (ℂ), fullwidth, mathematical symbols
- **Social Media Protection:**
  - <13: Block all social media sharing
  - 13+: Allow, block only if grooming detected
- **Grooming Detection:** Keywords like "dm me", "don't tell parents", "our secret"
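
The patterns used by `pii_extension.py` are not reproduced in this README. As a rough sketch of how regex-based PII detection can work, the example below covers three of the listed types. The pattern set and the `find_pii` helper are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration; the real ones in pii_extension.py may differ.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the name of every PII type whose pattern matches the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(find_pii("My email is test@gmail.com"))
```

Patterns this simple trade precision for speed, so a production filter would likely add checks such as Luhn validation for card numbers.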
## Quick Start
```python
from pii_extension import CombinedModerationFilter

# Named moderation_filter to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```
## Unicode Deobfuscation
Automatically detects and normalizes Unicode bypass attempts:
| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |
**All obfuscated text is normalized before moderation checks.**
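
One way to get this behaviour with the standard library is Unicode NFKC normalization, which folds all four techniques in the table to plain letters. Whether moderat uses NFKC internally is an assumption; `deobfuscate` is a hypothetical helper name.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility characters (circled, double-struck, fullwidth,
    mathematical) to their plain equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


samples = {
    "circled": "\u24d5\u24e4\u24d2\u24da",  # circled f, u, c, k
    "double-struck": "\u2102\u210d",        # double-struck C, H
    "fullwidth": "\uff26",                  # fullwidth F
    "mathematical": "\U0001d41f",           # mathematical bold small f
}
for technique, obfuscated in samples.items():
    print(f"{technique}: {deobfuscate(obfuscated)}")
```

NFKC does not map look-alike letters from other scripts (for example Cyrillic `а`) to Latin, so homoglyph substitution would need a separate mapping table.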
## Social Media Rules
| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |
**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
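
The rule table can be read as a small decision function. The sketch below is not the code in `pii_extension.py`: the keyword tuple only contains the phrases quoted in this README, and `SOCIAL_PATTERN` and `social_media_decision` are hypothetical names with a deliberately small platform list.

```python
import re

# Keywords quoted in this README; the full list in pii_extension.py is not shown.
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")

# Hypothetical detector for social media sharing: a platform name or an @handle.
SOCIAL_PATTERN = re.compile(
    r"\b(?:instagram|snapchat|tiktok|discord)\b|(?<!\w)@\w+",
    re.IGNORECASE,
)


def social_media_decision(text: str, age: int) -> str:
    lowered = text.lower()
    if any(keyword in lowered for keyword in GROOMING_KEYWORDS):
        return "BLOCKED"  # grooming context is blocked for every age
    if SOCIAL_PATTERN.search(text) and age < 13:
        return "BLOCKED"  # under-13 mode blocks all social media sharing
    return "ALLOWED"


print(social_media_decision("Follow me on instagram @user", age=15))
```

Plain substring matching is easy to audit but over-triggers (for example "secret" also matches "secretary"), so word-boundary patterns may be a better fit in practice.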
## Content Labels
| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
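
The dual-mode behaviour in this table can be sketched as a lookup from predicted category to decision. The table only confirms the handling of reaction swearing and harassment-style text, so treating aggressive swearing, hate speech, and spam as blocked in both modes is an assumption, and `ALLOWED_BY_MODE` and `decide` are hypothetical names.

```python
# Assumed policy: the set of categories each mode lets through.
# Only "Safe" and "Swearing (reaction)" are confirmed by the table above;
# every other category is assumed to be blocked in both modes.
ALLOWED_BY_MODE = {
    "under_13": {"Safe"},
    "13_plus": {"Safe", "Swearing (reaction)"},
}


def decide(category: str, age: int) -> str:
    """Map a predicted content category and the user's age to a decision."""
    mode = "under_13" if age < 13 else "13_plus"
    return "ALLOWED" if category in ALLOWED_BY_MODE[mode] else "BLOCKED"


print(decide("Swearing (reaction)", age=15))
print(decide("Swearing (reaction)", age=10))
```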
## Model Details
- **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **Features:** up to 10,000 TF-IDF features, n-grams of length 1-3
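
For orientation, a scikit-learn pipeline with these settings might look like the sketch below. The training data is a toy stand-in, not the data moderat was trained on, and the README does not say whether the n-grams are word or character based, so the default word analyzer is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only.
texts = [
    "have a great day",
    "thanks for the help",
    "buy cheap pills now",
    "click here to win money",
]
labels = ["safe", "safe", "spam", "spam"]

# TF-IDF capped at 10,000 features over 1- to 3-grams, feeding Multinomial Naive Bayes.
pipeline = make_pipeline(
    TfidfVectorizer(max_features=10_000, ngram_range=(1, 3)),
    MultinomialNB(),
)
pipeline.fit(texts, labels)

prediction = pipeline.predict(["win cheap money now"])[0]
print(prediction)
```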
## Files
- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook
## Colab
Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)
## Speed
- Single inference: ~2-5ms
- With PII check: ~3-7ms
- Throughput: ~300-500 texts/sec
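
These figures depend on hardware and input length. A minimal timing harness along the following lines can reproduce the measurement locally. `benchmark` is a hypothetical helper, and the lambda is a stand-in workload to be replaced with the real filter's `check` call.

```python
import time
from typing import Callable, Sequence


def benchmark(
    check: Callable[[str], object],
    texts: Sequence[str],
    repeats: int = 100,
) -> dict[str, float]:
    """Time repeated calls to check() and report mean latency and throughput."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            check(text)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    calls = repeats * len(texts)
    return {
        "mean_latency_ms": 1000 * elapsed / calls,
        "texts_per_sec": calls / elapsed,
    }


# Stand-in workload; swap in the real filter's check call to measure it.
stats = benchmark(lambda text: text.lower(), ["damn that's crazy", "hello there"])
print(stats)
```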