Committed by darwinkernelpanic
Commit 8a0597c · verified · 1 Parent(s): e3bc6f2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +65 -9
README.md CHANGED
@@ -7,32 +7,88 @@ tags:
7
  - text-classification
8
  - safety
9
  - dual-mode
 
 
10
  ---
11
 
12
- # moderat - Dual-Mode Content Moderation
13
 
14
- A text classification model for content moderation with age-appropriate filtering.
15
 
16
  ## Usage
17
 
18
  ```python
19
  from inference import DualModeFilter
20
 
 
21
  filter = DualModeFilter("darwinkernelpanic/moderat")
22
  result = filter.check("damn that's crazy", age=15)
23
  # -> ALLOWED (reaction swearing permitted for 13+)
24
  ```
25
 
26
- ## Model Details
27
 
28
- - **Algorithm:** Multinomial Naive Bayes with TF-IDF
29
- - **Test accuracy:** 77%
30
- - **Classes:** 6 (Safe, Harassment, Swearing-Reaction, Swearing-Aggressive, Hate-Speech, Spam)
31
 
32
- ## Age Modes
 
 
 
33
 
34
- | Content | <13 | 13+ |
35
- |---------|-----|-----|
 
 
36
  | "damn that's crazy" | ❌ Blocked | ✅ Allowed |
37
  | "you're trash" | ❌ Blocked | ❌ Blocked |
38
  | "kill yourself" | ❌ Blocked | ❌ Blocked |
7
  - text-classification
8
  - safety
9
  - dual-mode
10
+ - pii-detection
11
+ - child-safety
12
  ---
13
 
14
+ # moderat - Dual-Mode Content Moderation + PII Filter
15
 
16
+ A text classification model for content moderation with age-appropriate filtering and PII detection.
17
+
18
+ ## Features
19
+
20
+ - **Dual-mode filtering:** <13 (strict) vs 13+ (relaxed)
21
+ - **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
22
+ - **PII Detection:** Emails, phones, addresses, credit cards, SSN
23
+ - **Social Media Protection:**
24
+ - <13: Block all social media sharing
25
+ - 13+: Allow but detect grooming patterns
26
+ - **Context-aware:** Distinguishes reaction swearing from targeted aggression
27
 
28
  ## Usage
29
 
30
  ```python
31
  from inference import DualModeFilter
32
 
33
+ # Basic content moderation
34
  filter = DualModeFilter("darwinkernelpanic/moderat")
35
  result = filter.check("damn that's crazy", age=15)
36
  # -> ALLOWED (reaction swearing permitted for 13+)
37
+
38
+ # With PII detection (use pii_extension.py)
39
+ from pii_extension import CombinedModerationFilter
40
+
41
+ filter = CombinedModerationFilter("./moderation_model.pkl")
42
+ result = filter.check("My email is test@gmail.com", age=15)
43
+ # -> BLOCKED (PII detected)
44
+
45
+ result = filter.check("Follow me on instagram @user", age=15)
46
+ # -> ALLOWED (social media OK for 13+)
47
+
48
+ result = filter.check("DM me privately, don't tell parents", age=14)
49
+ # -> BLOCKED (grooming detected)
50
  ```
51
 
52
+ ## PII Detection
53
 
54
+ | PII Type | Blocked (All Ages) |
55
+ |----------|-------------------|
56
+ | Email | ✅ Yes |
57
+ | Phone | ✅ Yes |
58
+ | Address | ✅ Yes |
59
+ | Credit Card | ✅ Yes |
60
+ | SSN | ✅ Yes |
61
+ | Social Media | Depends on age |
62
+
63
+ ## Social Media Rules
64
 
65
+ | Age | Social Media | Grooming Context |
66
+ |-----|--------------|------------------|
67
+ | <13 | ❌ Blocked | N/A |
68
+ | 13+ | ✅ Allowed | ❌ Blocked |
69
 
70
+ ## Content Labels
71
+
72
+ | Label | <13 | 13+ |
73
+ |-------|-----|-----|
74
  | "damn that's crazy" | ❌ Blocked | ✅ Allowed |
75
  | "you're trash" | ❌ Blocked | ❌ Blocked |
76
  | "kill yourself" | ❌ Blocked | ❌ Blocked |
77
+
78
+ ## Model Details
79
+
80
+ - **Algorithm:** Multinomial Naive Bayes with TF-IDF
81
+ - **Test accuracy:** 77%
82
+ - **Features:** 10,000 max, 1-3 ngrams
83
+ - **Training samples:** 215
84
+
85
+ ## Files
86
+
87
+ - `moderation_model.pkl` - Trained model
88
+ - `inference.py` - Basic inference
89
+ - `pii_extension.py` - PII + grooming detection
90
+ - `enhanced_moderation.py` - Training script
91
+
92
+ ## Colab Notebook
93
+
94
+ Try it: [moderat_speed_test.ipynb](./moderat_speed_test.ipynb)
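
The PII and social media rules described in the updated README can be illustrated with the standard library alone. The sketch below is not the contents of `pii_extension.py`, which this diff does not show. The `check_pii` function, the regular expressions, the platform list, and the grooming phrases are all hypothetical names and patterns chosen for the example, and only emails, US-style phone numbers, and SSNs are covered. It also blocks grooming phrases at every age, which is an assumption: the README marks that case as N/A for under-13 users.

```python
import re
from typing import Tuple

# Hypothetical patterns for illustration only; they cover a small subset of the
# PII types listed in the README and will miss many real-world formats.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

# Assumed platform names and grooming phrases; the README does not list them.
SOCIAL_MEDIA = re.compile(r"\b(instagram|snapchat|tiktok|discord)\b", re.IGNORECASE)
GROOMING = re.compile(
    r"don'?t tell (your )?(parents|mom|dad)|keep (it|this) (a )?secret",
    re.IGNORECASE,
)


def check_pii(text: str, age: int) -> Tuple[bool, str]:
    """Return (allowed, reason), mirroring the README's PII and social media tables."""
    # PII is blocked for all ages.
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            return False, f"PII detected: {name}"
    # Assumption: grooming phrases are blocked regardless of age.
    if GROOMING.search(text):
        return False, "grooming pattern detected"
    # Social media mentions are blocked only for users under 13.
    if age < 13 and SOCIAL_MEDIA.search(text):
        return False, "social media sharing blocked under 13"
    return True, "ok"


print(check_pii("My email is test@gmail.com", age=15))
print(check_pii("Follow me on instagram @user", age=15))
print(check_pii("DM me privately, don't tell parents", age=14))
```

The checks run from strictest to most permissive, so a message containing both an email address and a social media handle is reported as PII. A real filter would likely combine rules like these with the classifier's label rather than rely on regular expressions alone.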