OnlyCheeini committed 810fea1 (verified) · Parent(s): c899c09

Update README.md

Files changed (1): README.md (+186, −3)

README.md previously contained only the front matter:

---
license: isc
---

The updated README.md follows.
---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-classification
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate their potential for harm, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step-by-step analysis inside `<think>` blocks** before generating the final moderation decision. This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- SAFE
- SPAM
- MISINFORMATION
- HARASSMENT
- HATE_SPEECH
- CRISIS_REFERRAL
- UNSAFE

Example output:

```
## Verdict
**HARASSMENT**
```
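Downstream code has to recover the label from that Markdown verdict. A minimal sketch of doing so, assuming the verdict appears as a bold line such as `**HARASSMENT**` (the helper name and fallback behavior are our own, not part of the model's API):

```python
# Hypothetical helper: extract a known label from the model's Markdown output.
LABELS = {
    "SAFE", "SPAM", "MISINFORMATION", "HARASSMENT",
    "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE",
}

def parse_verdict(output: str) -> str:
    for line in output.splitlines():
        candidate = line.strip().strip("*")  # "**HARASSMENT**" -> "HARASSMENT"
        if candidate in LABELS:
            return candidate
    return "UNSAFE"  # conservative fallback when no known label is found
```

Failing closed to `UNSAFE` is a deliberate choice here: unparseable output gets the most restrictive treatment.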

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV‑Cache optimized inference
- Structured chat‑template training
- Markdown reasoning output
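For reference, the hyperparameters above can be collected into one config object. A sketch only; the class and field names are illustrative, not the model's actual code:

```python
from dataclasses import dataclass

@dataclass
class GreesyGuardConfig:
    # Values copied from the architecture table; names are illustrative.
    n_layers: int = 12
    n_heads: int = 12
    d_model: int = 768            # embedding dimension
    context_window: int = 12_000  # tokens
    vocab_size: int = 8192
    tokenizer: str = "o200k_base (extended)"

    @property
    def head_dim(self) -> int:
        # Per-head dimension implied by the table: 768 / 12 = 64.
        return self.d_model // self.n_heads
```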

---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.
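The mode-to-budget mapping could be expressed as an enum. This is a sketch that mirrors the `ReasoningMode` name from the usage example; the real API's definition may differ:

```python
from enum import Enum

class ReasoningMode(Enum):
    # Think-token budgets from the table above.
    NONE = 200
    LOW = 512
    MEDIUM = 1536
    HIGH = 3072
```

Selecting a higher mode simply raises the cap on `<think>` tokens the model may spend before emitting its verdict.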

---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```
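A consumer might route on this structured output, for example escalating uncertain or high-severity verdicts to human review. A sketch under our own assumptions; the threshold, routing rule, and function name are not part of the library:

```python
import json

# Hypothetical routing rule: auto-hide only clear-cut cases, escalate the rest.
SEVERITY_THRESHOLD = 3

def route(raw: str) -> str:
    record = json.loads(raw)
    if record["verdict"] == "SAFE":
        return "allow"
    if record["severity"] >= SEVERITY_THRESHOLD and record["confidence_hint"] == "high":
        return "auto_hide"
    return "human_review"  # anything uncertain goes to a person
```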

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.
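That masking can be sketched as follows, assuming per-token role flags and a cross-entropy loss that skips an ignore index (PyTorch's default is -100); the helper is illustrative:

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def mask_labels(input_ids, assistant_mask):
    # Keep token ids where the token was produced by the assistant;
    # replace system/user positions so they contribute no loss.
    return [tok if is_assistant else IGNORE_INDEX
            for tok, is_assistant in zip(input_ids, assistant_mask)]
```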

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard