---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---

# moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

## Features

- **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII Detection:** Email addresses, phone numbers, street addresses, credit card numbers, SSNs
- **Unicode Deobfuscation:** Normalizes circled letters (ⓕ), double-struck letters (ℂ), fullwidth characters, and mathematical symbols
- **Social Media Protection:**
  - <13: Blocks all social media sharing
  - 13+: Allows sharing, blocks only if grooming is detected
- **Grooming Detection:** Matches keywords such as "dm me", "don't tell parents", "our secret"
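
The regex-based PII detection listed above can be pictured roughly as follows. This is a minimal sketch, not the code in `pii_extension.py`: the pattern set, the `PII_PATTERNS` name, and the `find_pii` helper are all assumptions made for illustration, and the patterns are deliberately simplified (US-style phone numbers and SSNs only).

```python
import re

# Hypothetical, simplified patterns; the real pii_extension.py may differ.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the names of all PII patterns that match the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(find_pii("My email is test@gmail.com"))  # ['email']
print(find_pii("damn that's crazy"))           # []
```

A message would be blocked whenever `find_pii` returns a non-empty list, regardless of age mode.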

## Quick Start

```python
from pii_extension import CombinedModerationFilter

# Named moderation_filter to avoid shadowing Python's built-in filter()
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```

## Unicode Deobfuscation

Automatically detects and normalizes Unicode bypass attempts:

| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |

**All obfuscated text is normalized before moderation checks.**
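
One way to get the mappings in the table is Unicode NFKC normalization from the standard library. Whether moderat uses NFKC or its own lookup tables is not stated here, so treat this as an illustrative sketch; `deobfuscate` is a hypothetical helper name.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility characters (circled, double-struck, fullwidth,
    mathematical) to their plain equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


print(deobfuscate("ⓕⓤⓒⓚ"))  # fuck
print(deobfuscate("ℂℍ"))     # CH
print(deobfuscate("Ｆ"))      # F
print(deobfuscate("𝐟"))      # f
```

NFKC only covers characters with a compatibility decomposition. Look-alike letters from other scripts (for example Cyrillic `а`) and leetspeak substitutions are left unchanged and would need a separate mapping.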

## Social Media Rules

| Age | Plain Share | With Grooming Context |
|-----|-------------|----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
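
The rules in the table reduce to a small decision function. The sketch below assumes the message has already been identified as a social media share and uses plain substring matching on the keywords listed above; the function names are hypothetical and the actual matching logic in `pii_extension.py` may be more involved.

```python
# Keywords taken from the list above; the shipped list is longer ("etc.").
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")


def has_grooming_context(text: str) -> bool:
    """Return True if any grooming keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in GROOMING_KEYWORDS)


def allow_social_share(text: str, age: int) -> bool:
    """Apply the social media table to a message that shares a social handle."""
    if age < 13:
        return False  # strict mode: all social media sharing is blocked
    return not has_grooming_context(text)


print(allow_social_share("Follow me on instagram @user", age=15))         # True
print(allow_social_share("Follow me on instagram @user", age=12))         # False
print(allow_social_share("DM me privately, don't tell parents", age=14))  # False
```

Plain substring matching is cheap but over-triggers: "secret" also matches "secretary", so a production filter would likely want word-boundary checks.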

## Content Labels

| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
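
The table suggests that the age mode is applied as a policy on top of the predicted category. A minimal sketch of that idea follows. The label strings are hypothetical (the model's real class names may differ), and it assumes that every category other than Safe and reaction swearing is blocked in both modes, which the table shows for the two harmful examples but does not state for Spam.

```python
# Label names are hypothetical; the model's actual class names may differ.
ALLOWED_LABELS = {
    "under_13": {"safe"},
    "13_plus": {"safe", "swearing_reaction"},
}


def is_allowed(label: str, age: int) -> bool:
    """Return True if a message with this predicted label passes for this age."""
    mode = "under_13" if age < 13 else "13_plus"
    return label in ALLOWED_LABELS[mode]


print(is_allowed("swearing_reaction", age=15))  # True
print(is_allowed("swearing_reaction", age=12))  # False
print(is_allowed("harassment", age=15))         # False
```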

## Model Details

- **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
- **Content accuracy:** 77%
- **PII detection:** Regex-based (fast, no ML)
- **Features:** TF-IDF vocabulary capped at 10,000 features, n-gram range 1-3
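
For readers who want to reproduce a comparable classifier, the settings above map onto a scikit-learn pipeline like the one sketched here. Only the feature cap and n-gram range come from this card; every other hyperparameter is left at its scikit-learn default, and the training texts and labels are made-up toy data used purely to show the fit/predict flow.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_pipeline():
    """TF-IDF (max 10,000 features, 1-3 n-grams) feeding Multinomial Naive Bayes."""
    return make_pipeline(
        TfidfVectorizer(max_features=10_000, ngram_range=(1, 3)),
        MultinomialNB(),
    )


# Toy, made-up examples with hypothetical label names; not the real training set.
texts = [
    "have a nice day",
    "thanks for the help",
    "you are trash",
    "nobody likes you",
    "buy cheap followers now",
    "click this link to win",
]
labels = ["safe", "safe", "harassment", "harassment", "spam", "spam"]

model = build_pipeline()
model.fit(texts, labels)
print(model.predict(["have a nice day"]))
```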

## Files

- `moderation_model.pkl` - Content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - Basic inference
- `moderat_speed_test.ipynb` - Colab notebook

## Colab

Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

## Speed

- Single inference: ~2-5ms
- With PII check: ~3-7ms
- Throughput: ~300-500 texts/sec
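
Latency depends heavily on hardware and text length, so it is worth measuring on your own machine. The helper below is a generic timing sketch, not the code in `moderat_speed_test.ipynb`: `measure_latency_ms` and `throughput_per_sec` are hypothetical names, and `str.lower` stands in for the real classifier call.

```python
import time


def measure_latency_ms(fn, texts, repeats=3):
    """Return the best-of-N mean per-text latency in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            fn(text)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(texts) * 1000)
    return best


def throughput_per_sec(latency_ms):
    """Convert a per-text latency in milliseconds to single-threaded texts/sec."""
    return 1000 / latency_ms


# Stand-in workload; swap str.lower for the filter's check call to benchmark it.
sample = ["damn that's crazy"] * 1000
print(f"{measure_latency_ms(str.lower, sample):.4f} ms per text")
print(throughput_per_sec(2.0))  # 500.0
print(throughput_per_sec(5.0))  # 200.0
```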