Upload README.md with huggingface_hub
@@ -22,73 +22,82 @@ A text classification model for content moderation with age-appropriate filterin
- **PII Detection:** Emails, phones, addresses, credit cards, SSN
- **Social Media Protection:**
  - <13: Block all social media sharing
  - 13+: Allow, block only if grooming detected
- **Grooming Detection:** Keywords like "dm me", "don't tell parents", "our secret"

## Quick Start

```python
from pii_extension import CombinedModerationFilter

# Named moderation_filter to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```

## PII Detection Rules

| PII Type | All Ages | Example |
|----------|----------|---------|
| Email | ❌ Block | `john@example.com` |
| Phone | ❌ Block | `555-123-4567` |
| Address | ❌ Block | `123 Main Street` |
| Credit Card | ❌ Block | `4111-1111-1111-1111` |
| SSN | ❌ Block | `123-45-6789` |
| Social Media | Depends | See below |

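To illustrate how regex-based detection of the PII types above could work, here is a minimal sketch. The patterns and the `detect_pii` helper are assumptions written for this example; they are not taken from `pii_extension.py` and are far less thorough than a production rule set.

```python
import re

# Hypothetical patterns for illustration; the real pii_extension.py may differ.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "address": re.compile(
        r"\b\d{1,5}\s+(?:[A-Za-z]+\s+)+(?:Street|St|Avenue|Ave|Road|Rd)\b",
        re.IGNORECASE,
    ),
    "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> list[str]:
    """Return the names of all PII types whose pattern matches the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(detect_pii("My email is test@gmail.com"))
print(detect_pii("I live at 123 Main Street"))
print(detect_pii("damn that's crazy"))
```

A non-empty result would map to a block for all ages, matching the table.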
## Social Media Rules

| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.

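The social media rules can be read as a small decision function. The sketch below assumes the text has already been identified as a social media share; `social_media_decision` and `has_grooming_context` are hypothetical names, and the keyword tuple contains only the examples listed in this README, not the full list.

```python
# Keywords copied from the README's examples; the real list is longer ("etc.").
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")


def has_grooming_context(text: str) -> bool:
    """Case-insensitive substring check against the example keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in GROOMING_KEYWORDS)


def social_media_decision(text: str, age: int) -> str:
    """Apply the rules table: under 13 is always blocked, 13+ only with grooming context."""
    if age < 13:
        return "BLOCKED"
    if has_grooming_context(text):
        return "BLOCKED"
    return "ALLOWED"


print(social_media_decision("Follow me on instagram @user", age=15))
print(social_media_decision("Follow me on instagram @user", age=12))
print(social_media_decision("DM me privately, don't tell parents", age=14))
```

Plain substring matching is simple and fast but can misfire on harmless uses of words such as "secret", so a real filter would likely need more context than this.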
## Content Labels

| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |

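One way to picture the age split in this table is a per-age allow-list over predicted labels. The label names below (`safe`, `reaction_swearing`, `insult`) are invented for this sketch and may not match the model's actual label set; only the under-13 versus 13+ split comes from the table.

```python
# Hypothetical label names; the real model's labels may differ.
ALLOWED_LABELS_BY_MODE = {
    "under_13": {"safe"},
    "13_plus": {"safe", "reaction_swearing"},
}


def decide(label: str, age: int) -> str:
    """Map a predicted content label and the user's age to a moderation decision."""
    mode = "under_13" if age < 13 else "13_plus"
    return "ALLOWED" if label in ALLOWED_LABELS_BY_MODE[mode] else "BLOCKED"


print(decide("reaction_swearing", age=15))
print(decide("reaction_swearing", age=10))
print(decide("insult", age=15))
```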
## Model Details

- **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **Features:** 10,000 max, 1-3 ngrams

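For readers unfamiliar with this setup, the following sketch shows how a TF-IDF plus Multinomial Naive Bayes classifier with these feature settings can be assembled in scikit-learn. The four training sentences and the `safe`/`harmful` labels are made up for illustration; this is not the training script or data behind `moderation_model.pkl`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy, made-up training data for illustration only; label names are hypothetical.
train_texts = [
    "have a great day",
    "thanks for the help",
    "you are trash",
    "nobody likes you",
]
train_labels = ["safe", "safe", "harmful", "harmful"]

# Feature settings taken from the README: at most 10,000 features, 1- to 3-grams.
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10_000, ngram_range=(1, 3))),
    ("nb", MultinomialNB()),
])
model.fit(train_texts, train_labels)

sample = ["you are trash"]
prediction = model.predict(sample)[0]
probabilities = dict(zip(model.classes_, model.predict_proba(sample)[0]))
print(prediction, probabilities)
```

Wrapping both steps in a `Pipeline` means one object can be pickled and later called directly on raw strings.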
## Files

- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook

## Colab

Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

## Speed

- Single inference: ~2-5 ms
- With PII check: ~3-7 ms
- Throughput: ~300-500 texts/sec
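
Latency and throughput figures like these depend on hardware, so it can help to measure them locally. The harness below is a generic sketch using only the standard library; `benchmark` and the stand-in `fake_check` function are hypothetical and would be replaced by a call to the real filter's `check` method.

```python
import time
from statistics import mean


def benchmark(check, texts, repeats=3):
    """Time check(text) for every text, repeated `repeats` times."""
    latencies_ms = []
    start_all = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            start = time.perf_counter()
            check(text)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    total_seconds = time.perf_counter() - start_all
    return {
        "count": len(latencies_ms),
        "mean_ms": mean(latencies_ms),
        "max_ms": max(latencies_ms),
        # Guard against a zero elapsed time on very coarse clocks.
        "texts_per_sec": len(latencies_ms) / total_seconds if total_seconds > 0 else float("inf"),
    }


def fake_check(text):
    """Stand-in for the real filter; it only does trivial string work."""
    return "BLOCKED" if "@" in text else "ALLOWED"


sample_texts = ["damn that's crazy", "My email is test@gmail.com", "hello there"]
stats = benchmark(fake_check, sample_texts, repeats=5)
print(stats)
```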