OnlyCheeini committed on
Commit b0bee03 · verified · 1 Parent(s): 8a1814f

Update README.md

Files changed (1):
  1. README.md +168 -9

README.md CHANGED
@@ -1,16 +1,51 @@
  ---
- pipeline_tag: text-generation
  tags:
- - agent
  ---

- # GreesyGuard

- Reasoning-based moderation model.

- ## Model

- Transformer moderation model trained to classify:

  SAFE
  SPAM
@@ -20,10 +55,134 @@ HATE_SPEECH
  CRISIS_REFERRAL
  UNSAFE

- ## Usage

  ```python
- from model import GreesyGPT, generate_moderation

  model = GreesyGPT()
- ```
---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step-by-step analysis inside `<think>` blocks** before generating the final moderation decision.

This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

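The steps above surface in the raw completion as a `<think>` block followed by the verdict, so downstream code needs to split the two. A minimal sketch, assuming that output shape (the `split_completion` helper is illustrative, not part of the released API):

```python
import re

def split_completion(text: str) -> tuple[str, str]:
    """Separate the <think> reasoning block from the final verdict.

    Assumes the completion has the shape '<think>...</think>\nverdict'.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # Everything after the closing </think> tag is the verdict.
    verdict = text[match.end():].strip() if match else text.strip()
    return reasoning, verdict

reasoning, verdict = split_completion(
    "<think>\nThe message insults the user directly.\n</think>\nHARASSMENT"
)
print(verdict)  # HARASSMENT
```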
---

# Moderation Labels

The model produces the following moderation categories:

```
SAFE
SPAM
HARASSMENT
HATE_SPEECH
CRISIS_REFERRAL
UNSAFE
```

Example output:

```
## Verdict
**HARASSMENT**
```

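When wiring the model into a pipeline, it can help to strip the markdown decoration and reject malformed verdicts early. A small sketch (the label set comes from this README; the helper name is hypothetical):

```python
# Moderation categories listed in this README.
LABELS = {"SAFE", "SPAM", "HARASSMENT", "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE"}

def normalize_verdict(raw: str) -> str:
    """Strip markdown bold markers (e.g. '**HARASSMENT**') and validate the label."""
    label = raw.strip().strip("*").strip()
    if label not in LABELS:
        raise ValueError(f"unexpected verdict: {raw!r}")
    return label

print(normalize_verdict("**HARASSMENT**"))  # HARASSMENT
```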
---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV-Cache optimized inference
- Structured chat-template training
- Markdown reasoning output

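From the table above one can ballpark the parameter count. The sketch below assumes a standard GPT-style block (4·d² for attention Q/K/V/O plus 8·d² for a 4× MLP) and untied input/output embeddings; the actual implementation may differ:

```python
# Rough parameter estimate for a 12-layer, d=768 decoder (assumptions noted above).
d_model, n_layers, vocab = 768, 12, 8192

embed = vocab * d_model                        # token embedding matrix
per_block = 4 * d_model**2 + 8 * d_model**2    # attention (Q,K,V,O) + 4x MLP
total = embed + n_layers * per_block + vocab * d_model  # + untied output head

print(f"~{total / 1e6:.0f}M parameters")  # ~98M parameters
```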
---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.

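The budgets above map naturally onto an enum. This is an illustrative re-creation of the idea, not the actual `ReasoningMode` definition shipped in `model.py`:

```python
from enum import Enum

class ReasoningMode(Enum):
    """Think-token budgets from the table above (illustrative re-creation)."""
    NONE = 200
    LOW = 512
    MEDIUM = 1536
    HIGH = 3072

# Pick a budget, e.g. balancing depth against latency:
budget = ReasoningMode.MEDIUM.value
print(budget)  # 1536
```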
---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```

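Downstream code can consume the JSON verdict directly. A minimal sketch (field names are taken from the example above; the routing threshold and action names are assumptions for illustration):

```python
import json

raw = '{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}'
result = json.loads(raw)

# Route by severity: escalate anything at or above a (hypothetical) threshold.
ESCALATION_THRESHOLD = 3
action = "escalate_to_human" if result["severity"] >= ESCALATION_THRESHOLD else "auto_log"
print(result["verdict"], action)  # HARASSMENT escalate_to_human
```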
---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.

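The usual way to implement that is label masking: positions outside the assistant turn get an ignore index so cross-entropy skips them. A framework-agnostic sketch (the `-100` value follows PyTorch's `cross_entropy` `ignore_index` default; the helper itself is hypothetical):

```python
IGNORE_INDEX = -100  # PyTorch cross_entropy skips labels with this value by default

def mask_labels(token_ids: list[int], is_assistant: list[bool]) -> list[int]:
    """Copy token ids to labels, masking out non-assistant positions."""
    return [tid if asst else IGNORE_INDEX
            for tid, asst in zip(token_ids, is_assistant)]

# Toy sequence: 3 system/user tokens followed by 2 assistant tokens.
labels = mask_labels([11, 12, 13, 14, 15], [False, False, False, True, True])
print(labels)  # [-100, -100, -100, 14, 15]
```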
---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high-impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard