---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---
# GreesyGuard (GreesyGPT)
GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.
Unlike traditional classifiers, GreesyGuard performs **step‑by‑step analysis inside `<think>` blocks** before generating the final moderation decision.
This improves transparency and makes moderation decisions easier to audit.
---
# Model Overview
GreesyGuard is a Transformer model specialized for safety classification tasks such as:
- harassment detection
- hate speech
- spam detection
- misinformation identification
- crisis detection
Instead of directly outputting a label, the model:
1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict
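Because the analysis steps arrive inside a `<think>` block ahead of the verdict, downstream code usually needs to separate the two. A minimal sketch (the helper name and sample text are hypothetical, not part of the GreesyGuard API):

```python
import re

def split_output(raw: str) -> tuple:
    """Split a raw completion into (reasoning, verdict) parts.

    Assumes the model emits its analysis inside a single
    <think>...</think> block followed by the final verdict.
    """
    match = re.search(r"<think>(.*?)</think>\s*(.*)", raw, re.DOTALL)
    if not match:
        return "", raw.strip()  # no reasoning block found
    return match.group(1).strip(), match.group(2).strip()

raw = (
    "<think>\n"
    "1. The message directly insults the recipient.\n"
    "2. Intent appears hostile, not sarcastic.\n"
    "3. This violates the harassment policy.\n"
    "</think>\n"
    "## Verdict\n**HARASSMENT**"
)
reasoning, verdict = split_output(raw)
print(verdict)
```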
---
# Moderation Labels
The model produces the following moderation categories:
- `SAFE`
- `SPAM`
- `MISINFORMATION`
- `HARASSMENT`
- `HATE_SPEECH`
- `CRISIS_REFERRAL`
- `UNSAFE`
Example output:
```
## Verdict
**HARASSMENT**
```
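If a pipeline needs only the bare label out of that markdown block, one way to extract it is to match the bolded token against the category list above (a sketch; the function name is illustrative):

```python
import re

LABELS = {"SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
          "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE"}

def extract_label(verdict_md: str):
    """Pull the bolded label out of a '## Verdict' markdown block.

    Returns None if no known label is found.
    """
    match = re.search(r"\*\*([A-Z_]+)\*\*", verdict_md)
    if match and match.group(1) in LABELS:
        return match.group(1)
    return None

print(extract_label("## Verdict\n**HARASSMENT**"))
```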
---
# Model Architecture
| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |
Key architectural features:
- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV‑Cache optimized inference
- Structured chat‑template training
- Markdown reasoning output
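As a numerical illustration of the RoPE feature listed above, independent of this model's actual weights or implementation: RoPE encodes position by rotating each (even, odd) feature pair by an angle that grows with token position, using the standard `base^(-2i/d)` frequency schedule.

```python
import math

def rope_rotate(pair, pos, dim_idx, d_model=768, base=10000.0):
    """Rotate one (even, odd) feature pair by the RoPE angle for `pos`.

    theta follows the standard RoPE schedule: pos * base^(-2i/d).
    d_model=768 matches the embedding dimension in the table above.
    """
    theta = pos * base ** (-2 * dim_idx / d_model)
    x, y = pair
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Position 0 leaves vectors unchanged; later positions rotate them.
print(rope_rotate((1.0, 0.0), pos=0, dim_idx=0))
```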
---
# Reasoning Modes
The model supports configurable reasoning budgets:
| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |
Higher modes produce more thorough moderation reasoning but increase latency.
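The budgets above can be represented as a simple enum; this is only a sketch mirroring the table (the real `ReasoningMode` class in `model.py` may differ, and note that even `NONE` still allots 200 think tokens per the table):

```python
from enum import Enum

class ReasoningMode(Enum):
    """Think-token budgets from the table above."""
    NONE = 200      # fast moderation
    LOW = 512       # balanced reasoning
    MEDIUM = 1536   # detailed analysis
    HIGH = 3072     # maximum review depth

# Larger budgets mean deeper review at the cost of latency.
print(ReasoningMode.MEDIUM.value)
```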
---
# Example Usage
```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

# Load the model and run a moderation pass with a medium reasoning budget.
model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,       # 1536 think tokens
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```
Example structured output:
```json
{
"verdict": "HARASSMENT",
"severity": 3,
"confidence_hint": "medium"
}
```
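The JSON form is straightforward to validate downstream. A minimal sketch, assuming the field names shown above (the severity scale is only known from this example):

```python
import json

VALID_VERDICTS = {"SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
                  "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE"}

raw = '{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}'
result = json.loads(raw)

# Reject any output whose verdict falls outside the documented label set.
if result["verdict"] not in VALID_VERDICTS:
    raise ValueError(f"unknown verdict: {result['verdict']}")

print(result["severity"])
```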
---
# Training Format
Training data follows a structured conversation template:
```
<|system|>
moderation instructions
</|system|>
<|user|>
message to review
</|user|>
<|assistant|>
<think>
step-by-step reasoning
</think>
verdict<|endoftext|>
```
Only assistant tokens contribute to the training loss.
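That masking rule can be sketched as follows; the role labels and one-token-per-role sequence are illustrative (the real pipeline operates on tokenizer output over the template above):

```python
def loss_mask(roles):
    """Return 1 where a token contributes to the training loss, 0 elsewhere.

    Only tokens produced by the assistant (the <think> reasoning and the
    verdict) are trained on; system and user tokens are masked out.
    """
    return [1 if role == "assistant" else 0 for role in roles]

# Toy sequence mirroring the template: system, user, then assistant tokens.
roles = ["system", "user", "assistant", "assistant", "assistant"]
print(loss_mask(roles))
```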
---
# Intended Use
GreesyGuard is designed for:
- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems
---
# Limitations
- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.
---
# Safety
Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.
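One way to enforce that requirement is to gate high-impact actions behind a human queue regardless of the model's verdict. A sketch with hypothetical action names (not part of the GreesyGuard API):

```python
# Hypothetical action names; real systems would define their own taxonomy.
HIGH_IMPACT_ACTIONS = {"account_suspension", "legal_escalation"}

def decide(verdict, proposed_action):
    """Route high-impact actions to human review instead of auto-enforcing."""
    if proposed_action in HIGH_IMPACT_ACTIONS:
        return "queue_for_human_review"
    if verdict == "SAFE":
        return "allow"
    return proposed_action  # low-impact automated action (e.g. hide a comment)

print(decide("HARASSMENT", "account_suspension"))
```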
---
# Authors
Created by the **GreesyGuard Project**
Author: Nicat
GitHub: https://github.com/Nicat-dcw/GreesyGuard