---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step-by-step analysis inside `<think>` blocks** before generating the final moderation decision. This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- `SAFE`
- `SPAM`
- `MISINFORMATION`
- `HARASSMENT`
- `HATE_SPEECH`
- `CRISIS_REFERRAL`
- `UNSAFE`

Example output:

```
## Verdict
**HARASSMENT**
```

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

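Ignoring biases, layer norms, and the LM head (often weight-tied to the embedding), and assuming a standard GPT-style block (4·d² attention weights plus 8·d² for a 4x-expansion MLP), the table implies a rough parameter count; this is an estimate, not an official figure:

```python
d, n_layers, vocab = 768, 12, 8192  # values from the table above

attn = 4 * d * d            # Q, K, V and output projections
mlp = 8 * d * d             # two 4x-expansion MLP matrices
per_layer = attn + mlp      # ~7.1M weights per block

total = vocab * d + n_layers * per_layer  # embeddings + blocks
print(f"~{total / 1e6:.0f}M parameters")  # ~91M
```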
Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV-cache optimized inference
- Structured chat-template training
- Markdown reasoning output

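RoPE encodes position by rotating each (even, odd) pair of query/key dimensions through a position-dependent angle, so attention scores depend only on relative offsets. A self-contained NumPy sketch of the idea (not the model's actual implementation):

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary embeddings to x of shape (seq_len, head_dim), head_dim even."""
    _, dim = x.shape
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]          # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because the rotation is rigid, the score `rope(q, p1) · rope(k, p2)` is unchanged when both positions shift by the same amount, which is the relative-position property that makes RoPE attractive for long context windows.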
---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.

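In code, the budgets above might be wired up as a simple enum. The name matches the `ReasoningMode` used in the Example Usage section, but this particular definition is an illustrative assumption, not the project's actual source:

```python
from enum import Enum

class ReasoningMode(Enum):
    # think-token budgets from the table above
    NONE = 200
    LOW = 512
    MEDIUM = 1536
    HIGH = 3072

def think_budget(mode: ReasoningMode) -> int:
    """Maximum tokens the model may spend inside its <think> block."""
    return mode.value

print(think_budget(ReasoningMode.MEDIUM))  # 1536
```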
---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```
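With `OutputFormat.JSON`, a downstream pipeline can branch on the structured fields. A sketch of one possible routing policy (the thresholds and action names here are hypothetical, not part of the model):

```python
import json

raw = '{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}'
result = json.loads(raw)

# Hypothetical policy: escalate severe or low-confidence calls to a human.
if result["verdict"] == "SAFE":
    action = "allow"
elif result["severity"] >= 3 or result["confidence_hint"] == "low":
    action = "queue_for_human_review"
else:
    action = "auto_hide"

print(action)  # queue_for_human_review
```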

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.
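Such masking is typically implemented by setting non-assistant label positions to an ignore index (-100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`). A framework-free sketch (the `mask_labels` helper is illustrative, not the project's actual training code):

```python
IGNORE_INDEX = -100  # the default ignore_index in PyTorch's CrossEntropyLoss

def mask_labels(token_ids, roles):
    """Keep labels only where the token belongs to the assistant turn."""
    return [t if r == "assistant" else IGNORE_INDEX
            for t, r in zip(token_ids, roles)]

tokens = [11, 42, 7, 99, 5]
roles = ["system", "user", "user", "assistant", "assistant"]
print(mask_labels(tokens, roles))  # [-100, -100, -100, 99, 5]
```

Only the positions that keep their original label produce gradient signal, so the model learns to generate reasoning and verdicts without also learning to imitate user messages.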

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high-impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard