---
language: en
license: mit
tags:
  - moderation
  - safety
  - content-moderation
  - transformer
  - chain-of-thought
  - reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
  - OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step-by-step analysis inside reasoning blocks** before generating the final moderation decision. This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- `SAFE`
- `SPAM`
- `MISINFORMATION`
- `HARASSMENT`
- `HATE_SPEECH`
- `CRISIS_REFERRAL`
- `UNSAFE`

Example output:

```
## Verdict
**HARASSMENT**
```

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV-cache optimized inference
- Structured chat-template training
- Markdown reasoning output

---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.

---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
<|user|>
message to review
<|assistant|>
step-by-step reasoning
verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high-impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat
GitHub: https://github.com/Nicat-dcw/GreesyGuard
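The Training Format section states that only assistant tokens contribute to the training loss. A minimal sketch of how such loss masking is commonly implemented for decoder-only models (the helper function and the toy token IDs below are illustrative, not part of the GreesyGuard codebase):

```python
# Sketch: mask non-assistant tokens out of the training loss.
# -100 is the conventional "ignore" label used by cross-entropy losses.
IGNORE_INDEX = -100

def build_labels(prompt_ids, assistant_ids):
    """Concatenate prompt and assistant tokens.

    Only assistant tokens keep their true IDs as labels; the system/user
    prompt positions are replaced with IGNORE_INDEX so the loss skips them.
    """
    input_ids = list(prompt_ids) + list(assistant_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(assistant_ids)
    return input_ids, labels

# Toy example: pretend token IDs for the <|system|>/<|user|> turns and the
# assistant's reasoning + verdict.
prompt_ids = [1, 5, 9, 2, 7]     # system + user turns (masked from the loss)
assistant_ids = [3, 8, 8, 4]     # assistant reasoning + verdict (trained)

input_ids, labels = build_labels(prompt_ids, assistant_ids)
assert len(input_ids) == len(labels)
assert labels[:len(prompt_ids)] == [IGNORE_INDEX] * len(prompt_ids)
assert labels[len(prompt_ids):] == assistant_ids
```

In a PyTorch training loop, label positions set to -100 are ignored automatically, since `torch.nn.CrossEntropyLoss` defaults to `ignore_index=-100`.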