darwinkernelpanic

metadata

language: en
license: mit
library_name: sklearn
tags:
  - content-moderation
  - text-classification
  - safety
  - dual-mode
  - pii-detection
  - child-safety

moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

Features

Dual-mode filtering: <13 (strict) vs 13+ (relaxed)
6 content categories: Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
PII Detection: Emails, phones, addresses, credit cards, SSN
Unicode Deobfuscation: Detects circled letters (ⓕ), double-struck (ℂ), fullwidth, mathematical symbols
Social Media Protection:
- <13: Block all social media sharing
- 13+: Allow; block only if grooming is detected
Grooming Detection: Keywords like "dm me", "don't tell parents", "our secret"
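
As a rough illustration of the regex-based PII detection listed above, the following minimal sketch scans text for a few PII types. The patterns, the `PII_PATTERNS` name, and the `find_pii` helper are assumptions made for this example; they are not taken from `pii_extension.py`, whose actual patterns may differ.

```python
import re

# Hypothetical, simplified patterns; real-world PII matching needs more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text: str) -> dict:
    """Return a mapping of PII type to the matches found in `text`."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


hits = find_pii("My email is test@gmail.com")
```

A caller could block the message whenever `find_pii` returns a non-empty mapping, which matches the "PII blocking (all ages)" behavior shown in the Quick Start.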

Quick Start

from pii_extension import CombinedModerationFilter

# Named `moderation_filter` to avoid shadowing Python's built-in `filter`.
moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderation_filter.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderation_filter.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderation_filter.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderation_filter.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)

Unicode Deobfuscation

Automatically detects and normalizes Unicode bypass attempts:

Technique	Example	Normalized
Circled letters	`ⓕⓤⓒⓚ`	`fuck`
Double-struck	`ℂℍ`	`CH`
Fullwidth	`Ｆ`	`F`
Mathematical	`𝐟`	`f`

All obfuscated text is normalized before moderation checks.
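
One way to get this kind of normalization is Unicode NFKC compatibility folding from Python's standard library, which maps circled, double-struck, fullwidth, and mathematical letters to their plain equivalents. This is a minimal sketch; the `deobfuscate` helper is a hypothetical name, and the model's own normalization step may work differently.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility characters to their plain equivalents using NFKC."""
    return unicodedata.normalize("NFKC", text)


# Examples taken from the table above.
samples = ["ℂℍ", "Ｆ", "𝐟"]
normalized = [deobfuscate(sample) for sample in samples]
```

NFKC does not cover look-alike letters from other scripts (for example Cyrillic homoglyphs), so a production filter would likely need an additional mapping for those.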

Social Media Rules

Age	Plain Share	With Grooming Context
<13	❌ Blocked	❌ Blocked
13+	✅ Allowed	❌ Blocked

Grooming keywords: "dm me", "don't tell", "secret", "send pics", "meet up", etc.
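
The rule table above can be sketched as a small decision function. The grooming keywords are the ones listed in this section; the `SOCIAL_MEDIA_TERMS` list and the `social_media_decision` name are assumptions made for illustration, not the contents of `pii_extension.py`.

```python
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")

# Hypothetical platform list, used only to detect a "plain share".
SOCIAL_MEDIA_TERMS = ("instagram", "snapchat", "tiktok", "discord")


def social_media_decision(text: str, age: int) -> str:
    """Apply the age/grooming rule table to a single message."""
    lowered = text.lower()
    has_grooming = any(keyword in lowered for keyword in GROOMING_KEYWORDS)
    mentions_social = any(term in lowered for term in SOCIAL_MEDIA_TERMS)

    if has_grooming:
        return "BLOCKED"  # grooming context is blocked for every age
    if mentions_social and age < 13:
        return "BLOCKED"  # under 13: no social media sharing at all
    return "ALLOWED"
```

Plain substring matching is deliberately simple here; it would also flag harmless words such as "secretary", so a real filter would probably match on word boundaries or phrases.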

Content Labels

Text	<13	13+
"damn that's crazy"	❌ Blocked	✅ Allowed
"shit that sucks"	❌ Blocked	✅ Allowed
"you're trash"	❌ Blocked	❌ Blocked
"kill yourself"	❌ Blocked	❌ Blocked

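The dual-mode behavior in the table above amounts to mapping a predicted label and an age to an allow/block decision. The sketch below assumes hypothetical label names for the six categories, and it assumes that spam is blocked in both modes, which this README does not state.

```python
# Hypothetical label names; the model's real label strings may differ.
ALLOWED_LABELS = {
    "under_13": {"safe"},
    "13_plus": {"safe", "swearing_reaction"},
}


def is_allowed(label: str, age: int) -> bool:
    """Return True if a message with this predicted label passes for this age."""
    mode = "under_13" if age < 13 else "13_plus"
    return label in ALLOWED_LABELS[mode]
```

With this mapping, reaction swearing passes only in the 13+ mode, while harassment, aggressive swearing, and hate speech are blocked in both.
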
Model Details

Algorithm: Multinomial Naive Bayes + TF-IDF + Regex PII
Content accuracy: 77%
PII detection: Regex-based (fast, no ML)
Features: up to 10,000 TF-IDF features, n-gram range 1-3
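
For readers who want to see how these settings fit together, here is a minimal scikit-learn sketch of a TF-IDF plus Multinomial Naive Bayes pipeline with the same feature limits. The training texts and labels are invented toy data, and this is not the training script behind `moderation_model.pkl`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration only.
texts = [
    "have a nice day",
    "see you at school tomorrow",
    "click here for free prizes",
    "buy cheap followers now",
]
labels = ["safe", "safe", "spam", "spam"]

# Same limits as listed above: at most 10,000 features, 1- to 3-grams.
model = make_pipeline(
    TfidfVectorizer(max_features=10_000, ngram_range=(1, 3)),
    MultinomialNB(),
)
model.fit(texts, labels)

prediction = model.predict(["click here for free prizes"])[0]
```

Keeping the vectorizer and classifier in one pipeline means a single pickled object can take raw strings at inference time.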

Files

moderation_model.pkl - Content moderation model
pii_extension.py - PII + grooming detection
inference.py - Basic inference
moderat_speed_test.ipynb - Colab notebook

Colab

Test it: open moderat_speed_test.ipynb in Colab

Speed

Single inference: ~2-5ms
With PII check: ~3-7ms
Throughput: ~300-500 texts/sec
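
To reproduce numbers like these on your own hardware, a simple timing harness along the following lines may help. The `benchmark` helper is a hypothetical name, and `str.lower` stands in for the real check function so the sketch runs without the model.

```python
import time


def benchmark(check, texts, repeats=3):
    """Return mean latency (ms per text) and throughput (texts/sec) for `check`."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            check(text)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    calls = repeats * len(texts)
    return {
        "ms_per_text": 1000 * elapsed / calls,
        "texts_per_sec": calls / elapsed,
    }


# Stand-in callable; swap in the real moderation check to measure it.
stats = benchmark(str.lower, ["damn that's crazy", "hello there"] * 100)
```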