darwinkernelpanic committed on
Commit 52c80b4 · verified · 1 Parent(s): ec2830c

Upload README.md with huggingface_hub

  1. README.md +43 -34
README.md CHANGED
@@ -22,73 +22,82 @@ A text classification model for content moderation with age-appropriate filterin
  - **PII Detection:** Emails, phones, addresses, credit cards, SSN
  - **Social Media Protection:**
    - <13: Block all social media sharing
-   - 13+: Allow but detect grooming patterns
- - **Context-aware:** Distinguishes reaction swearing from targeted aggression

- ## Usage

  ```python
- from inference import DualModeFilter

- # Basic content moderation
- filter = DualModeFilter("darwinkernelpanic/moderat")
- result = filter.check("damn that's crazy", age=15)
- # -> ALLOWED (reaction swearing permitted for 13+)

- # With PII detection (use pii_extension.py)
- from pii_extension import CombinedModerationFilter

- filter = CombinedModerationFilter("./moderation_model.pkl")
  result = filter.check("My email is test@gmail.com", age=15)
  # -> BLOCKED (PII detected)

  result = filter.check("Follow me on instagram @user", age=15)
- # -> ALLOWED (social media OK for 13+)

  result = filter.check("DM me privately, don't tell parents", age=14)
  # -> BLOCKED (grooming detected)
  ```

- ## PII Detection

- | PII Type | Blocked (All Ages) |
- |----------|-------------------|
- | Email | ✅ Yes |
- | Phone | ✅ Yes |
- | Address | ✅ Yes |
- | Credit Card | ✅ Yes |
- | SSN | ✅ Yes |
- | Social Media | Depends on age |

  ## Social Media Rules

- | Age | Social Media | Grooming Context |
- |-----|--------------|------------------|
- | <13 | ❌ Blocked | N/A |
  | 13+ | ✅ Allowed | ❌ Blocked |

  ## Content Labels

- | Label | <13 | 13+ |
- |-------|-----|-----|
  | "damn that's crazy" | ❌ Blocked | ✅ Allowed |
  | "you're trash" | ❌ Blocked | ❌ Blocked |
  | "kill yourself" | ❌ Blocked | ❌ Blocked |

  ## Model Details

- - **Algorithm:** Multinomial Naive Bayes with TF-IDF
- - **Test accuracy:** 77%
  - **Features:** 10,000 max, 1-3 ngrams
- - **Training samples:** 215

  ## Files

- - `moderation_model.pkl` - Trained model
- - `inference.py` - Basic inference
  - `pii_extension.py` - PII + grooming detection
- - `enhanced_moderation.py` - Training script

- ## Colab Notebook

- Try it: [moderat_speed_test.ipynb](./moderat_speed_test.ipynb)

  - **PII Detection:** Emails, phones, addresses, credit cards, SSN
  - **Social Media Protection:**
    - <13: Block all social media sharing
+   - 13+: Allow, block only if grooming detected
+ - **Grooming Detection:** Keywords like "dm me", "don't tell parents", "our secret"

+ ## Quick Start

  ```python
+ from pii_extension import CombinedModerationFilter

+ filter = CombinedModerationFilter("darwinkernelpanic/moderat")

+ # Content moderation
+ result = filter.check("damn that's crazy", age=15)
+ # -> ALLOWED (reaction swearing for 13+)

+ # PII blocking (all ages)
  result = filter.check("My email is test@gmail.com", age=15)
  # -> BLOCKED (PII detected)

+ # Social media (13+ allowed)
  result = filter.check("Follow me on instagram @user", age=15)
+ # -> ALLOWED

+ # Grooming detection
  result = filter.check("DM me privately, don't tell parents", age=14)
  # -> BLOCKED (grooming detected)
  ```

+ ## PII Detection Rules

+ | PII Type | All Ages | Example |
+ |----------|----------|---------|
+ | Email | ❌ Block | `john@example.com` |
+ | Phone | ❌ Block | `555-123-4567` |
+ | Address | ❌ Block | `123 Main Street` |
+ | Credit Card | ❌ Block | `4111-1111-1111-1111` |
+ | SSN | ❌ Block | `123-45-6789` |
+ | Social Media | Depends | See below |

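The regex-based PII check summarized in the table could look roughly like the sketch below. The patterns and the `find_pii` helper are illustrative assumptions, not the contents of `pii_extension.py`; they are tuned only to the example strings in the table and would need hardening before real use.

```python
import re

# Hypothetical patterns, one per PII type from the table above.
# They are deliberately simple and will miss many real-world formats.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "address": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+(?:Street|St|Avenue|Ave|Road|Rd)\b"
    ),
    "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the sorted names of all PII types whose pattern matches the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


if __name__ == "__main__":
    for sample in ("john@example.com", "123 Main Street", "damn that's crazy"):
        print(sample, "->", find_pii(sample) or "no PII")
```

Any non-empty result would map to the "Block" column for all ages; an empty list means the text passes on to the content classifier.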
 
  ## Social Media Rules

+ | Age | Plain Share | With Grooming Context |
+ |-----|-------------|----------------------|
+ | <13 | ❌ Blocked | ❌ Blocked |
  | 13+ | ✅ Allowed | ❌ Blocked |

+ **Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
+
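The age and grooming rules in this table can be expressed as a short decision function. This is a minimal sketch under stated assumptions: the keyword tuple copies only the examples listed above (the real list is open-ended), the platform names in `SOCIAL_MEDIA_PATTERN` are invented for illustration, and `check_social_media` is a hypothetical helper rather than the API of `pii_extension.py`.

```python
import re

# Assumed keyword list: only the examples named in the README.
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")

# Assumed notion of a "social media share": a platform name or an @handle
# that is not part of an email address.
SOCIAL_MEDIA_PATTERN = re.compile(
    r"\b(?:instagram|snapchat|tiktok|discord)\b|(?<!\w)@\w+",
    re.IGNORECASE,
)


def check_social_media(text: str, age: int) -> str:
    """Apply the table: grooming is always blocked, plain shares only under 13."""
    lowered = text.lower()
    # Naive substring match; "secret" would also hit "secretary".
    if any(keyword in lowered for keyword in GROOMING_KEYWORDS):
        return "BLOCKED"
    if age < 13 and SOCIAL_MEDIA_PATTERN.search(text):
        return "BLOCKED"
    return "ALLOWED"


if __name__ == "__main__":
    print(check_social_media("Follow me on instagram @user", age=15))
    print(check_social_media("Follow me on instagram @user", age=12))
    print(check_social_media("DM me privately, don't tell parents", age=14))
```

Checking grooming keywords before the age gate keeps the "With Grooming Context" column blocked for every age, which matches both rows of the table.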
  ## Content Labels

+ | Text | <13 | 13+ |
+ |------|-----|-----|
  | "damn that's crazy" | ❌ Blocked | ✅ Allowed |
+ | "shit that sucks" | ❌ Blocked | ✅ Allowed |
  | "you're trash" | ❌ Blocked | ❌ Blocked |
  | "kill yourself" | ❌ Blocked | ❌ Blocked |

  ## Model Details

+ - **Algorithm:** Multinomial Naive Bayes + TF-IDF + Regex PII
+ - **Content accuracy:** 77%
+ - **PII detection:** Regex-based (fast, no ML)
  - **Features:** 10,000 max, 1-3 ngrams

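To make the "1-3 ngrams" feature setting concrete, the sketch below lists the n-grams a short text would produce. It assumes word-level n-grams and plain whitespace tokenization, neither of which the README states; `word_ngrams` is a hypothetical helper, and the actual vectorizer would also cap the vocabulary at 10,000 features and apply TF-IDF weighting.

```python
def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> list[str]:
    """Return all word n-grams of length n_min..n_max, shortest first."""
    tokens = text.lower().split()  # assumed tokenization: lowercase + whitespace split
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the text.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


if __name__ == "__main__":
    for gram in word_ngrams("damn that's crazy"):
        print(repr(gram))
```

Including bigrams and trigrams is what lets a bag-of-words classifier see short phrases such as "kill yourself" as single features rather than as unrelated words.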
 
  ## Files

+ - `moderation_model.pkl` - Content moderation model
  - `pii_extension.py` - PII + grooming detection
+ - `inference.py` - Basic inference
+ - `moderat_speed_test.ipynb` - Colab notebook
+
+ ## Colab
+
+ Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

+ ## Speed

+ - Single inference: ~2-5ms
+ - With PII check: ~3-7ms
+ - Throughput: ~300-500 texts/sec
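
Latency and throughput figures of this kind can be checked with a small timing loop. The sketch below is one way such a measurement might be done, not the code in `moderat_speed_test.ipynb`; `benchmark` is a hypothetical helper, and `check` stands for any callable, for example a lambda wrapping `filter.check(text, age=15)`.

```python
import time
from typing import Callable, Iterable


def benchmark(check: Callable[[str], object], texts: Iterable[str], repeats: int = 5) -> dict:
    """Time each call to `check` and report mean latency and throughput."""
    texts = list(texts)
    latencies = []
    for _ in range(repeats):
        for text in texts:
            start = time.perf_counter()
            check(text)
            latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "calls": len(latencies),
        "mean_ms": 1000 * total / len(latencies),
        # Guard against a zero total on very coarse clocks.
        "throughput_per_sec": len(latencies) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with the real filter call.
    stats = benchmark(lambda t: t.lower(), ["damn that's crazy", "you're trash"])
    print(stats)
```

Results depend heavily on hardware and on whether the model is already loaded, so the first call is often worth discarding as a warm-up.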