OnlyCheeini committed
Commit 5c7fd82 · verified · 1 Parent(s): d68fa83

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +6 -170

README.md CHANGED
@@ -1,51 +1,11 @@
- ---
- language: en
- license: mit
- tags:
- - moderation
- - safety
- - content-moderation
- - transformer
- - chain-of-thought
- - reasoning
- library_name: pytorch
- pipeline_tag: text-generation
- datasets:
- - OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
- ---

- # GreesyGuard (GreesyGPT)

- GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

- Unlike traditional classifiers, GreesyGuard performs **step‑by‑step analysis inside `<think>` blocks** before generating the final moderation decision.

- This improves transparency and makes moderation decisions easier to audit.
-
- ---
-
- # Model Overview
-
- GreesyGuard is a Transformer model specialized for safety classification tasks such as:
-
- - harassment detection
- - hate speech
- - spam detection
- - misinformation identification
- - crisis detection
-
- Instead of directly outputting a label, the model:
-
- 1. Analyzes the message
- 2. Evaluates context and intent
- 3. Identifies policy violations
- 4. Outputs a final moderation verdict
-
- ---
-
- # Moderation Labels
-
- The model produces the following moderation categories:

  SAFE
  SPAM
@@ -55,134 +15,10 @@ HATE_SPEECH
  CRISIS_REFERRAL
  UNSAFE

- Example output:
-
- ```
- ## Verdict
- **HARASSMENT**
- ```
-
- ---
-
- # Model Architecture
-
- | Parameter | Value |
- |-----------|-------|
- | Layers | 12 |
- | Heads | 12 |
- | Embedding Dimension | 768 |
- | Context Window | 12,000 tokens |
- | Tokenizer | o200k_base (extended) |
- | Vocabulary Size | 8192 |
-
- Key architectural features:
-
- - Transformer decoder architecture
- - Rotary Positional Embeddings (RoPE)
- - KV‑Cache optimized inference
- - Structured chat‑template training
- - Markdown reasoning output
-
- ---
-
- # Reasoning Modes
-
- The model supports configurable reasoning budgets:
-
- | Mode | Think Tokens | Purpose |
- |------|--------------|---------|
- | NONE | 200 | Fast moderation |
- | LOW | 512 | Balanced reasoning |
- | MEDIUM | 1536 | Detailed analysis |
- | HIGH | 3072 | Maximum review depth |
-
- Higher modes produce more thorough moderation reasoning but increase latency.
-
- ---
-
- # Example Usage

  ```python
- from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

  model = GreesyGPT()
-
- result = generate_moderation(
-     model,
-     prompt="You're worthless and nobody likes you.",
-     mode=ReasoningMode.MEDIUM,
-     output_format=OutputFormat.JSON
- )
-
- print(result["verdict_fmt"])
  ```
-
- Example structured output:
-
- ```
- {
-   "verdict": "HARASSMENT",
-   "severity": 3,
-   "confidence_hint": "medium"
- }
- ```
-
- ---
-
- # Training Format
-
- Training data follows a structured conversation template:
-
- ```
- <|system|>
- moderation instructions
- </|system|>
-
- <|user|>
- message to review
- </|user|>
-
- <|assistant|>
- <think>
- step-by-step reasoning
- </think>
-
- verdict<|endoftext|>
- ```
-
- Only assistant tokens contribute to the training loss.
-
- ---
-
- # Intended Use
-
- GreesyGuard is designed for:
-
- - social media moderation
- - comment filtering
- - forum safety pipelines
- - research in explainable moderation systems
-
- ---
-
- # Limitations
-
- - The reasoning output may appear confident but still be incorrect.
- - Sarcasm and cultural context can be misinterpreted.
- - The model should **not be used for fully automated enforcement** without human oversight.
-
- ---
-
- # Safety
-
- Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.
-
- ---
-
- # Authors
-
- Created by the **GreesyGuard Project**
-
- Author: Nicat
-
- GitHub: https://github.com/Nicat-dcw/GreesyGuard
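The removed "Training Format" section states that only assistant tokens contribute to the training loss. The project's actual training code is not part of this diff, but the masking idea it describes could be sketched as follows (the helper name and role labels here are illustrative, not GreesyGuard's real implementation):

```python
# Sketch of assistant-only loss masking, as described in the removed
# "Training Format" section. Roles stand in for tokenized chat turns;
# in a real trainer the mask would be built per token ID after applying
# the chat template.

def build_loss_mask(token_roles):
    """Return 1 for tokens that contribute to the loss (assistant turns),
    0 for masked-out tokens (system and user turns)."""
    return [1 if role == "assistant" else 0 for role in token_roles]

# One training example: system prompt, user message, assistant verdict.
roles = ["system"] * 4 + ["user"] * 6 + ["assistant"] * 5
mask = build_loss_mask(roles)

assert sum(mask) == 5            # only the 5 assistant tokens are trained on
assert mask[:4] == [0, 0, 0, 0]  # system tokens carry no loss
```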
+ # GreesyGuard
+
+ Reasoning-based moderation model.
+
+ ## Model
+
+ Transformer moderation model trained to classify:

  SAFE
  SPAM

  CRISIS_REFERRAL
  UNSAFE

+ ## Usage

  ```python
+ from model import GreesyGPT, generate_moderation

  model = GreesyGPT()
  ```
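The removed README documents a structured JSON output of the form `{"verdict", "severity", "confidence_hint"}`. A minimal sketch of validating such a payload against the label set visible in this diff (`parse_verdict` is a hypothetical helper, not part of the model's API):

```python
import json

# Labels shown in the diff's "Moderation Labels" section and example output.
LABELS = {"SAFE", "SPAM", "HARASSMENT", "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE"}

def parse_verdict(raw):
    """Parse the model's JSON verdict string and reject unknown labels."""
    data = json.loads(raw)
    if data["verdict"] not in LABELS:
        raise ValueError(f"unknown verdict: {data['verdict']}")
    return data

result = parse_verdict(
    '{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}'
)
assert result["verdict"] == "HARASSMENT"
assert result["severity"] == 3
```

Validating against a fixed label set is one way to catch malformed generations before they reach a moderation pipeline; the removed "Limitations" section notes that fully automated enforcement without human oversight is discouraged.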