---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step‑by‑step analysis inside `<think>` blocks** before generating the final moderation decision.

This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- SAFE
- SPAM
- MISINFORMATION
- HARASSMENT
- HATE_SPEECH
- CRISIS_REFERRAL
- UNSAFE

Example output:

```
## Verdict
**HARASSMENT**
```
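Downstream code usually needs the verdict as a plain string rather than Markdown. The helper below is a minimal parsing sketch (the function name is illustrative; the label set is the one listed above), validating the extracted verdict against the known categories:

```python
import re

# The moderation categories listed above.
LABELS = {
    "SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
    "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE",
}

def parse_verdict(output: str) -> str:
    """Pull the bold verdict out of a '## Verdict' Markdown block."""
    match = re.search(r"##\s*Verdict\s*\*\*(\w+)\*\*", output)
    if not match or match.group(1) not in LABELS:
        raise ValueError("no valid verdict found in model output")
    return match.group(1)
```

Rejecting anything outside the label set guards against the model emitting free-form text where a category is expected.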

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV‑Cache optimized inference
- Structured chat‑template training
- Markdown reasoning output
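To illustrate one of these features: Rotary Positional Embeddings encode position by rotating pairs of channels through a position-dependent angle. The sketch below is a minimal, self-contained variant (pairing channel `i` with channel `i + dim/2`), not the model's actual implementation:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to x of shape (seq, dim).

    Each channel pair (i, i + dim/2) is rotated by angle pos * base^(-i / half),
    so relative position is encoded directly in the dot products of queries
    and keys.
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because it is a pure rotation, the transform leaves position 0 unchanged and preserves the norm of every row.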

---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.
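The mode-to-budget mapping reduces to a simple lookup. The sketch below mirrors the table above; the function name and the fallback to the fastest mode for unrecognized values are assumptions, not the model's actual API:

```python
# Think-token budgets per reasoning mode (from the table above).
THINK_BUDGET = {"NONE": 200, "LOW": 512, "MEDIUM": 1536, "HIGH": 3072}

def think_tokens_allowed(mode: str) -> int:
    """Return the reasoning budget for a mode.

    Unknown modes fall back to NONE (fast moderation) -- an illustrative
    choice, not documented behavior.
    """
    return THINK_BUDGET.get(mode.upper(), THINK_BUDGET["NONE"])
```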

---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.
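Restricting the loss to assistant tokens is commonly implemented by setting all other label positions to `ignore_index`. The sketch below is a generic PyTorch illustration of that masking, not GreesyGuard's actual training code:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(
    logits: torch.Tensor,      # (batch, seq, vocab) model outputs
    targets: torch.Tensor,     # (batch, seq) next-token ids
    assistant_mask: torch.Tensor,  # (batch, seq) True on assistant tokens
) -> torch.Tensor:
    """Cross-entropy averaged over assistant tokens only.

    Non-assistant positions (system and user turns) are set to -100,
    which F.cross_entropy skips via ignore_index, so only the
    <think> reasoning and the verdict contribute to the loss.
    """
    labels = targets.masked_fill(~assistant_mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
```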

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard