moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

Features

  • Dual-mode filtering: under 13 (strict) vs. 13+ (lenient)
  • 6 content categories: Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
  • PII Detection: emails, phone numbers, addresses, credit card numbers, SSNs
  • Unicode Deobfuscation: detects circled (ⓕ), double-struck (ℂ), fullwidth (Ｆ), and mathematical (𝐟) letters
  • Social Media Protection:
    • Under 13: all social media sharing is blocked
    • 13+: allowed, blocked only if grooming is detected
  • Grooming Detection: keyword matching on phrases such as "dm me", "don't tell parents", "our secret"
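The card says PII detection is regex-based but does not publish the patterns. The sketch below is a minimal illustration of that approach for the listed PII types. The patterns, the `PII_PATTERNS` dict and the `find_pii` function are hypothetical and are not taken from `pii_extension.py`. They are deliberately simple, so they will miss many real-world formats and flag some false positives.

```python
import re

# Hypothetical, deliberately simple patterns (NOT the ones in pii_extension.py).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # US-style numbers such as 555-123-4567 or (555) 123 4567
    "phone": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13-19 digits, optionally separated by spaces or hyphens
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    # House number followed by 1-3 capitalized words and a street suffix
    "address": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
        r"(?:Street|St|Avenue|Ave|Road|Rd|Lane|Ln|Drive|Dr)\b"
    ),
}


def find_pii(text: str) -> list[str]:
    """Return the names of the PII categories whose pattern matches `text`."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(find_pii("My email is test@gmail.com"))
print(find_pii("I live at 42 Elm Street, call 555-123-4567"))
```

Any non-empty result would block the message at every age, which matches the "PII blocking (all ages)" behavior in Quick Start.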

Quick Start

# pii_extension.py is included in this repository (see Files)
from pii_extension import CombinedModerationFilter

# Named "moderator" to avoid shadowing Python's built-in filter()
moderator = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderator.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing is permitted for 13+)

# PII blocking (all ages)
result = moderator.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (allowed for 13+)
result = moderator.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderator.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)

Unicode Deobfuscation

Automatically detects and normalizes Unicode bypass attempts:

| Technique | Example | Normalized |
|---|---|---|
| Circled letters | ⓕⓤⓒⓚ | fuck |
| Double-struck | ℂℍ | CH |
| Fullwidth | Ｆ | F |
| Mathematical | 𝐟 | f |

All obfuscated text is normalized before moderation checks.
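The card does not say how moderat performs this normalization. One standard way to get the mappings in the table above is Unicode NFKC normalization from Python's standard library. It folds circled, double-struck, fullwidth and mathematical alphanumeric characters back to plain letters while preserving case. The `deobfuscate` helper below is a hypothetical minimal sketch, not the model's own code.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility characters (circled, double-struck, fullwidth,
    mathematical alphanumerics) to their plain equivalents using NFKC."""
    return unicodedata.normalize("NFKC", text)


for sample in ["ⓕⓤⓒⓚ", "ℂℍ", "Ｆ", "𝐟"]:
    print(sample, "->", deobfuscate(sample))
```

NFKC does not cover look-alike letters from other scripts, such as Cyrillic "а" standing in for Latin "a". Catching those would need an additional confusables mapping.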

Social Media Rules

| Age | Plain share | Share with grooming context |
|---|---|---|
| Under 13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

Grooming keywords: "dm me", "don't tell", "secret", "send pics", "meet up", etc.
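The rules in the table above can be expressed as a small decision function. This is a sketch under stated assumptions. The grooming phrases are the examples listed above, but the platform list, the `@handle` pattern, and the function names `has_grooming_context` and `social_media_allowed` are hypothetical. The actual logic in `pii_extension.py` may differ.

```python
import re

# Example phrases from the card; the real keyword list is longer ("etc.").
GROOMING_PHRASES = ["dm me", "don't tell", "secret", "send pics", "meet up"]

# Hypothetical detector for social media sharing: platform names or @handles.
SOCIAL_PATTERN = re.compile(
    r"\b(?:instagram|insta|snapchat|snap|tiktok|discord)\b|@\w+", re.IGNORECASE
)


def has_grooming_context(text: str) -> bool:
    """Substring match on lowercased text, with curly apostrophes straightened."""
    lowered = text.lower().replace("\u2019", "'")
    return any(phrase in lowered for phrase in GROOMING_PHRASES)


def social_media_allowed(text: str, age: int) -> bool:
    """Under 13: block any social share. 13+: block only with grooming context."""
    if has_grooming_context(text):
        return False  # blocked at every age
    if age < 13 and SOCIAL_PATTERN.search(text):
        return False
    return True


print(social_media_allowed("Follow me on instagram @user", age=15))
print(social_media_allowed("Follow me on instagram @user", age=10))
```

Plain substring matching is fast but blunt. For example, "secret" also matches "secretary", so a production filter would likely use word boundaries or the classifier's context.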

Content Labels

| Text | Under 13 | 13+ |
|---|---|---|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
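The dual-mode behavior comes down to mapping a predicted category to allow or block depending on age. The sketch below shows that mapping. The label strings are hypothetical identifiers for the six categories listed under Features. The table only shows that reaction swearing differs by age and that harassment-style text is always blocked. Blocking aggressive swearing, hate speech and spam at every age is an assumption.

```python
# Hypothetical identifiers for the six categories listed under Features.
LABELS = (
    "safe", "harassment", "swearing_reaction",
    "swearing_aggressive", "hate_speech", "spam",
)

# Assumed policy: only "safe" passes for under-13 users; 13+ users may also
# see reaction swearing. Every other category is blocked at all ages.
ALLOWED_UNDER_13 = frozenset({"safe"})
ALLOWED_13_PLUS = frozenset({"safe", "swearing_reaction"})


def is_allowed(label: str, age: int) -> bool:
    """Return True if content with this predicted label may be shown at this age."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    allowed = ALLOWED_UNDER_13 if age < 13 else ALLOWED_13_PLUS
    return label in allowed


print(is_allowed("swearing_reaction", age=15))
print(is_allowed("swearing_reaction", age=12))
```

This design keeps a single classifier and moves the age dependence into a small allow-list. Adjusting a policy then does not require retraining.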

Model Details

  • Algorithm: TF-IDF features with a Multinomial Naive Bayes classifier for content; regex rules for PII
  • Content classification accuracy: 77%
  • PII detection: regex-based (fast, no ML model)
  • TF-IDF features: up to 10,000, n-gram range 1-3
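A TF-IDF plus Multinomial Naive Bayes classifier with the settings above can be sketched with scikit-learn. This assumes the model was built in a similar way, which the card does not confirm. Word-level n-grams (scikit-learn's default analyzer) are also an assumption. The training texts and label names are toy placeholders, not the model's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Settings from the card: at most 10,000 features, 1- to 3-grams.
pipeline = make_pipeline(
    TfidfVectorizer(max_features=10_000, ngram_range=(1, 3)),
    MultinomialNB(),
)

# Toy data for illustration only (hypothetical label names).
texts = [
    "have a great day",
    "thanks for the help",
    "damn that's crazy",
    "shit that sucks",
    "you're trash",
    "nobody likes you loser",
    "buy cheap followers now",
    "click this link for free coins",
]
labels = [
    "safe", "safe",
    "swearing_reaction", "swearing_reaction",
    "harassment", "harassment",
    "spam", "spam",
]

pipeline.fit(texts, labels)
print(pipeline.predict(["that's crazy, thanks"]))
```

Naive Bayes over sparse TF-IDF vectors trains and predicts quickly, which fits the millisecond latencies reported under Speed. It is also the kind of simple model for which a moderate accuracy figure like 77% is plausible.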

Files

  • moderation_model.pkl - Content moderation model
  • pii_extension.py - PII + grooming detection
  • inference.py - Basic inference
  • moderat_speed_test.ipynb - Colab notebook

Colab

Test it in Google Colab with the moderat_speed_test.ipynb notebook.

Speed

  • Single inference: ~2-5 ms
  • With PII check: ~3-7 ms
  • Throughput: ~300-500 texts/sec
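The card does not state the hardware behind these figures, so it is worth measuring on your own machine. The sketch below is a hypothetical `benchmark` helper that times any callable taking a text. It reports mean latency in milliseconds and throughput in texts per second, using the fastest of several runs.

```python
import time
from typing import Callable, Sequence


def benchmark(
    check: Callable[[str], object], texts: Sequence[str], repeats: int = 3
) -> tuple[float, float]:
    """Return (mean latency in ms, throughput in texts/sec) for `check`."""
    if not texts:
        raise ValueError("texts must not be empty")
    check(texts[0])  # warm-up so one-time costs (lazy loading, caches) are excluded
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            check(text)
        best = min(best, time.perf_counter() - start)
    latency_ms = best / len(texts) * 1000
    throughput = len(texts) / best
    return latency_ms, throughput


# Stand-in workload; replace with a lambda wrapping the check() call from Quick Start.
latency, tps = benchmark(lambda text: text.lower(), ["damn that's crazy"] * 1000)
print(f"{latency:.4f} ms/text, {tps:.0f} texts/sec")
```

Taking the best of several runs reduces noise from other processes. Note that latency × throughput always equals 1000 here, since both come from the same run.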