---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---
# GreesyGuard (GreesyGPT)
GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.
Unlike traditional classifiers, GreesyGuard performs **step‑by‑step analysis inside `<think>` blocks** before generating the final moderation decision.
This improves transparency and makes moderation decisions easier to audit.
---
# Model Overview
GreesyGuard is a Transformer model specialized for safety classification tasks such as:
- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection
Instead of directly outputting a label, the model:
1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict
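The raw generation can be split into its reasoning trace and final verdict with simple pattern matching. The helper below is an illustrative sketch, not part of the released model code; it assumes the output shape described on this card: an optional `<think>…</think>` block followed by a `## Verdict` section.

```python
import re

def parse_moderation_output(raw: str) -> dict:
    """Split a raw generation into its reasoning trace and final verdict.

    Illustrative helper: assumes a <think>...</think> block followed by a
    '## Verdict' heading with the label in bold, as shown on this card.
    """
    think_match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    reasoning = think_match.group(1).strip() if think_match else ""
    # The verdict label is emitted in bold under a '## Verdict' heading.
    verdict_match = re.search(r"##\s*Verdict\s*\*\*(\w+)\*\*", raw)
    verdict = verdict_match.group(1) if verdict_match else "UNKNOWN"
    return {"reasoning": reasoning, "verdict": verdict}

raw = "<think>\nThe message insults the user directly.\n</think>\n## Verdict\n**HARASSMENT**"
print(parse_moderation_output(raw)["verdict"])  # HARASSMENT
```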
---
# Moderation Labels
The model produces the following moderation categories:
- `SAFE`
- `SPAM`
- `MISINFORMATION`
- `HARASSMENT`
- `HATE_SPEECH`
- `CRISIS_REFERRAL`
- `UNSAFE`
Example output:
```markdown
## Verdict
**HARASSMENT**
```
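For downstream pipelines, the seven categories above can be pinned down as an enum so that free-form model output never drifts outside the label set. This is a hypothetical consumer-side helper, not part of the model's API.

```python
from enum import Enum

class ModerationLabel(str, Enum):
    """The seven verdict categories listed above."""
    SAFE = "SAFE"
    SPAM = "SPAM"
    MISINFORMATION = "MISINFORMATION"
    HARASSMENT = "HARASSMENT"
    HATE_SPEECH = "HATE_SPEECH"
    CRISIS_REFERRAL = "CRISIS_REFERRAL"
    UNSAFE = "UNSAFE"

def is_valid_label(text: str) -> bool:
    # Reject any verdict string outside the documented label set.
    return text in {label.value for label in ModerationLabel}

print(is_valid_label("HARASSMENT"))  # True
print(is_valid_label("TOXIC"))       # False
```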
---
# Model Architecture
| Parameter | Value |
|-----------|------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |
Key architectural features:
- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV‑Cache optimized inference
- Structured chat‑template training
- Markdown reasoning output
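Rotary Positional Embeddings encode position by rotating pairs of query/key channels through a position-dependent angle. The NumPy sketch below shows the core idea; the exact frequency convention and per-head application inside this model's attention layers may differ.

```python
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Minimal sketch of RoPE for illustration; the released model applies
    the rotation inside attention, per head.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = rotary_embed(np.ones((4, 8)))
print(q.shape)  # (4, 8)
```

Because the angle at position 0 is zero, the first token's vector passes through unchanged, which is a quick sanity check when wiring RoPE into attention.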
---
# Reasoning Modes
The model supports configurable reasoning budgets:
| Mode | Think Tokens | Purpose |
|-----|-------------|--------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |
Higher modes produce more thorough moderation reasoning but increase latency.
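The table above maps each mode to a think-token budget. A `ReasoningMode` of this name is imported in the usage example below; how it is defined internally is not documented here, so the enum below is only one plausible sketch where each member carries its budget directly.

```python
from enum import Enum

class ReasoningMode(Enum):
    """Think-token budgets from the table above. The member names match
    the usage example on this card; the values-as-budgets layout is an
    assumption, not the model's documented internals."""
    NONE = 200
    LOW = 512
    MEDIUM = 1536
    HIGH = 3072

def think_budget(mode: ReasoningMode) -> int:
    # Budget caps the number of tokens generated inside <think>...</think>.
    return mode.value

print(think_budget(ReasoningMode.MEDIUM))  # 1536
```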
---
# Example Usage
```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat
model = GreesyGPT()
result = generate_moderation(
model,
prompt="You're worthless and nobody likes you.",
mode=ReasoningMode.MEDIUM,
output_format=OutputFormat.JSON
)
print(result["verdict_fmt"])
```
Example structured output:
```json
{
"verdict": "HARASSMENT",
"severity": 3,
"confidence_hint": "medium"
}
```
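Before acting on a structured verdict, a consumer should parse and sanity-check it. The field names below follow the example output above; the severity type check is an assumption, since this card does not define the severity scale.

```python
import json

# Label set taken from the "Moderation Labels" section above.
VALID_LABELS = {"SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
                "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE"}

def validate_verdict(payload: str) -> dict:
    """Parse and sanity-check the structured JSON output shown above.

    Field names follow the card's example; treating severity as an
    integer is an assumption about an undocumented field.
    """
    data = json.loads(payload)
    if data.get("verdict") not in VALID_LABELS:
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    if not isinstance(data.get("severity"), int):
        raise ValueError("severity must be an integer")
    return data

out = validate_verdict('{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}')
print(out["severity"])  # 3
```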
---
# Training Format
Training data follows a structured conversation template:
```
<|system|>
moderation instructions
</|system|>
<|user|>
message to review
</|user|>
<|assistant|>
<think>
step-by-step reasoning
</think>
verdict<|endoftext|>
```
Only assistant tokens contribute to the training loss.
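The template above can be rendered programmatically, tracking where the assistant segment begins so everything before it can be masked out of the loss. The helper below is an illustrative character-level sketch using the special tokens from the template; the real pipeline masks at the token level.

```python
def build_example(system: str, user: str, assistant: str):
    """Render one training example in the chat template above and return
    (text, (start, end)), where [start:end] is the assistant segment --
    the only span that should contribute to the training loss.

    Illustrative character-level sketch; real masking happens on tokens.
    """
    prefix = (
        f"<|system|>\n{system}\n</|system|>\n"
        f"<|user|>\n{user}\n</|user|>\n"
        "<|assistant|>\n"
    )
    target = f"{assistant}<|endoftext|>"
    start = len(prefix)
    return prefix + target, (start, start + len(target))

text, (lo, hi) = build_example(
    "moderation instructions",
    "message to review",
    "<think>\nstep-by-step reasoning\n</think>\nHARASSMENT",
)
print(text[lo:hi].endswith("<|endoftext|>"))  # True
```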
---
# Intended Use
GreesyGuard is designed for:
- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems
---
# Limitations
- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.
---
# Safety
Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.
---
# Authors
Created by the **GreesyGuard Project**
Author: Nicat
GitHub: https://github.com/Nicat-dcw/GreesyGuard