OnlyCheeini committed 810fea1 (verified) · Parent(s): c899c09

Update README.md

Files changed (1): README.md (+186, −3)

README.md previously contained only the front matter:

---
license: isc
---

The updated README.md follows.
---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-classification
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate their potential for harm, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step-by-step analysis inside `<think>` blocks** before generating the final moderation decision. This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- SAFE
- SPAM
- MISINFORMATION
- HARASSMENT
- HATE_SPEECH
- CRISIS_REFERRAL
- UNSAFE

Example output:

```
## Verdict
**HARASSMENT**
```
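Downstream code has to recover the label from that Markdown verdict. A minimal sketch of doing so, assuming the verdict appears as a bold line such as `**HARASSMENT**` (the helper name and fallback behavior are our own, not part of the model's API):

```python
# Hypothetical helper: extract a known label from the model's Markdown output.
LABELS = {
    "SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
    "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE",
}

def parse_verdict(output: str) -> str:
    for line in output.splitlines():
        candidate = line.strip().strip("*")  # "**HARASSMENT**" -> "HARASSMENT"
        if candidate in LABELS:
            return candidate
    return "UNSAFE"  # conservative fallback when no known label is found
```

Failing closed to `UNSAFE` is a deliberate choice here: unparseable output gets the most restrictive treatment.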

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV‑Cache optimized inference
- Structured chat‑template training
- Markdown reasoning output
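For reference, the hyperparameters above can be collected into one config object. A sketch only; the class and field names are illustrative, not the model's actual code:

```python
from dataclasses import dataclass

@dataclass
class GreesyGuardConfig:
    # Values copied from the architecture table; names are illustrative.
    n_layers: int = 12
    n_heads: int = 12
    d_model: int = 768            # embedding dimension
    context_window: int = 12_000  # tokens
    vocab_size: int = 8192
    tokenizer: str = "o200k_base (extended)"

    @property
    def head_dim(self) -> int:
        # Per-head dimension implied by the table: 768 / 12 = 64.
        return self.d_model // self.n_heads
```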

---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.
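The mode-to-budget mapping could be expressed as an enum. This is a sketch that mirrors the `ReasoningMode` name from the usage example; the real API's definition may differ:

```python
from enum import Enum

class ReasoningMode(Enum):
    # Think-token budgets from the table above.
    NONE = 200
    LOW = 512
    MEDIUM = 1536
    HIGH = 3072
```

Selecting a higher mode simply raises the cap on `<think>` tokens the model may spend before emitting its verdict.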

---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```
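A consumer might route on this structured output, for example escalating uncertain or high-severity verdicts to human review. A sketch under our own assumptions; the threshold, routing rule, and function name are not part of the library:

```python
import json

# Hypothetical routing rule: auto-hide only clear-cut cases, escalate the rest.
SEVERITY_THRESHOLD = 3

def route(raw: str) -> str:
    record = json.loads(raw)
    if record["verdict"] == "SAFE":
        return "allow"
    if record["severity"] >= SEVERITY_THRESHOLD and record["confidence_hint"] == "high":
        return "auto_hide"
    return "human_review"  # anything uncertain goes to a person
```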

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.
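That masking can be sketched as follows, assuming per-token role flags and a cross-entropy loss that skips an ignore index (PyTorch's default is -100); the helper is illustrative:

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def mask_labels(input_ids, assistant_mask):
    # Keep token ids where the token was produced by the assistant;
    # replace system/user positions so they contribute no loss.
    return [tok if is_assistant else IGNORE_INDEX
            for tok, is_assistant in zip(input_ids, assistant_mask)]
```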

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard