---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---

# moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

## Features

- **Dual-mode filtering:** <13 (strict) vs. 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII Detection:** Email addresses, phone numbers, street addresses, credit card numbers, SSNs
- **Unicode Deobfuscation:** Detects and normalizes circled letters (ⓕ), double-struck letters (ℂ), fullwidth characters, and mathematical alphanumeric symbols
- **Social Media Protection:** 
  - <13: Block all social media sharing
  - 13+: Allow, block only if grooming detected
- **Grooming Detection:** Flags phrases such as "dm me", "don't tell parents", "our secret"
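
To illustrate the regex-based PII detection listed above, here is a minimal sketch covering only two of the PII types (email addresses and SSNs). The patterns and the `find_pii` helper are hypothetical and deliberately simplified; the actual patterns in `pii_extension.py` may differ.

```python
import re

# Hypothetical, simplified patterns; not taken from pii_extension.py
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict:
    """Return a mapping of PII type to the matches found in `text`."""
    found = {}
    for kind, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[kind] = matches
    return found


print(find_pii("My email is test@gmail.com"))
print(find_pii("damn that's crazy"))
```

A non-empty result would correspond to a block decision, since PII is blocked for all ages.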

## Quick Start

```python
# pii_extension.py ships with this repo (see Files)
from pii_extension import CombinedModerationFilter

# Named to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```

## Unicode Deobfuscation

Automatically detects and normalizes unicode bypass attempts:

| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |

**All obfuscated text is normalized before moderation checks.**
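
All four techniques in the table use characters that carry Unicode compatibility decompositions, so one way to sketch the normalization step is with NFKC from Python's standard library. Whether moderat uses NFKC internally is an assumption, and `deobfuscate` is a hypothetical helper name.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility characters (circled, double-struck, fullwidth,
    mathematical) to their plain equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


for sample in ["ⓕⓤⓒⓚ", "ℂℍ", "Ｆ", "𝐟"]:
    print(sample, "->", deobfuscate(sample))
```

NFKC does not cover every bypass trick (for example, look-alike letters from other scripts), so a production filter would likely need additional mappings.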

## Social Media Rules

| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
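
The table above can be read as a small decision function. The sketch below assumes the text has already been identified as a social media share and uses plain substring matching on the listed keywords; `social_share_allowed` and `GROOMING_KEYWORDS` are hypothetical names, and the real keyword list is longer.

```python
# Keywords copied from the list above; the actual list includes more ("etc.")
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")


def social_share_allowed(text: str, age: int) -> bool:
    """Apply the social media rules: under 13 is always blocked,
    13+ is blocked only when a grooming keyword is present."""
    if age < 13:
        return False
    lowered = text.lower()
    has_grooming = any(keyword in lowered for keyword in GROOMING_KEYWORDS)
    return not has_grooming


print(social_share_allowed("Follow me on instagram @user", age=15))
print(social_share_allowed("DM me privately, don't tell parents", age=14))
```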

## Content Labels

| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |

## Model Details

- **Algorithm:** Multinomial Naive Bayes over TF-IDF features, plus regex-based PII detection
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **TF-IDF features:** up to 10,000, using 1- to 3-grams
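
For orientation, a scikit-learn pipeline matching these settings might look like the sketch below. The training texts and label names are toy placeholders invented for illustration, not the model's real training data, and the actual training script may set additional options.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# TF-IDF capped at 10,000 features over 1- to 3-grams, fed into Naive Bayes
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10_000, ngram_range=(1, 3))),
    ("nb", MultinomialNB()),
])

# Toy placeholder data; label names are assumptions for this example
texts = [
    "have a great day",
    "thanks for the help",
    "you're trash",
    "nobody likes you",
]
labels = ["safe", "safe", "harassment", "harassment"]

pipeline.fit(texts, labels)
prediction = pipeline.predict(["have a nice day"])
print(prediction)
```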

## Files

- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook

## Colab

Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

## Speed

- Single inference: ~2-5ms
- With PII check: ~3-7ms
- Throughput: ~300-500 texts/sec