---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step‑by‑step analysis inside `<think>` blocks** before generating the final moderation decision.

This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- SAFE
- SPAM
- MISINFORMATION
- HARASSMENT
- HATE_SPEECH
- CRISIS_REFERRAL
- UNSAFE

Example output:

```
## Verdict
**HARASSMENT**
```
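Downstream code usually needs the verdict as a plain string rather than Markdown. The helper below is a minimal parsing sketch (the function name is illustrative; the label set is the one listed above), validating the extracted verdict against the known categories:

```python
import re

# The moderation categories listed above.
LABELS = {
    "SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
    "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE",
}

def parse_verdict(output: str) -> str:
    """Pull the bold verdict out of a '## Verdict' Markdown block."""
    match = re.search(r"##\s*Verdict\s*\*\*(\w+)\*\*", output)
    if not match or match.group(1) not in LABELS:
        raise ValueError("no valid verdict found in model output")
    return match.group(1)
```

Rejecting anything outside the label set guards against the model emitting free-form text where a category is expected.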

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV‑Cache optimized inference
- Structured chat‑template training
- Markdown reasoning output
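To illustrate one of these features: Rotary Positional Embeddings encode position by rotating pairs of channels through a position-dependent angle. The sketch below is a minimal, self-contained variant (pairing channel `i` with channel `i + dim/2`), not the model's actual implementation:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to x of shape (seq, dim).

    Each channel pair (i, i + dim/2) is rotated by angle pos * base^(-i / half),
    so relative position is encoded directly in the dot products of queries
    and keys.
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because it is a pure rotation, the transform leaves position 0 unchanged and preserves the norm of every row.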

---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.
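The mode-to-budget mapping reduces to a simple lookup. The sketch below mirrors the table above; the function name and the fallback to the fastest mode for unrecognized values are assumptions, not the model's actual API:

```python
# Think-token budgets per reasoning mode (from the table above).
THINK_BUDGET = {"NONE": 200, "LOW": 512, "MEDIUM": 1536, "HIGH": 3072}

def think_tokens_allowed(mode: str) -> int:
    """Return the reasoning budget for a mode.

    Unknown modes fall back to NONE (fast moderation) -- an illustrative
    choice, not documented behavior.
    """
    return THINK_BUDGET.get(mode.upper(), THINK_BUDGET["NONE"])
```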

---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.
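Restricting the loss to assistant tokens is commonly implemented by setting all other label positions to `ignore_index`. The sketch below is a generic PyTorch illustration of that masking, not GreesyGuard's actual training code:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(
    logits: torch.Tensor,      # (batch, seq, vocab) model outputs
    targets: torch.Tensor,     # (batch, seq) next-token ids
    assistant_mask: torch.Tensor,  # (batch, seq) True on assistant tokens
) -> torch.Tensor:
    """Cross-entropy averaged over assistant tokens only.

    Non-assistant positions (system and user turns) are set to -100,
    which F.cross_entropy skips via ignore_index, so only the
    <think> reasoning and the verdict contribute to the loss.
    """
    labels = targets.masked_fill(~assistant_mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
```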

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard