---
language: en
license: mit
tags:
- moderation
- safety
- content-moderation
- transformer
- chain-of-thought
- reasoning
library_name: pytorch
pipeline_tag: text-generation
datasets:
- OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
---

# GreesyGuard (GreesyGPT)

GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

Unlike traditional classifiers, GreesyGuard performs **step-by-step analysis inside `<think>` blocks** before generating the final moderation decision. This improves transparency and makes moderation decisions easier to audit.

---

# Model Overview

GreesyGuard is a Transformer model specialized for safety classification tasks such as:

- harassment detection
- hate speech detection
- spam detection
- misinformation identification
- crisis detection

Instead of directly outputting a label, the model:

1. Analyzes the message
2. Evaluates context and intent
3. Identifies policy violations
4. Outputs a final moderation verdict

---

# Moderation Labels

The model produces the following moderation categories:

- `SAFE`
- `SPAM`
- `MISINFORMATION`
- `HARASSMENT`
- `HATE_SPEECH`
- `CRISIS_REFERRAL`
- `UNSAFE`

Example output:

```
## Verdict
**HARASSMENT**
```

---

# Model Architecture

| Parameter | Value |
|-----------|-------|
| Layers | 12 |
| Heads | 12 |
| Embedding Dimension | 768 |
| Context Window | 12,000 tokens |
| Tokenizer | o200k_base (extended) |
| Vocabulary Size | 8192 |

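Ignoring biases, layer norms, and the LM head (often weight-tied to the embedding), and assuming a standard GPT-style block (4·d² attention weights plus 8·d² for a 4x-expansion MLP), the table implies a rough parameter count; this is an estimate, not an official figure:

```python
d, n_layers, vocab = 768, 12, 8192  # values from the table above

attn = 4 * d * d            # Q, K, V and output projections
mlp = 8 * d * d             # two 4x-expansion MLP matrices
per_layer = attn + mlp      # ~7.1M weights per block

total = vocab * d + n_layers * per_layer  # embeddings + blocks
print(f"~{total / 1e6:.0f}M parameters")  # ~91M
```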
Key architectural features:

- Transformer decoder architecture
- Rotary Positional Embeddings (RoPE)
- KV-cache optimized inference
- Structured chat-template training
- Markdown reasoning output

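RoPE encodes position by rotating each (even, odd) pair of query/key dimensions through a position-dependent angle, so attention scores depend only on relative offsets. A self-contained NumPy sketch of the idea (not the model's actual implementation):

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary embeddings to x of shape (seq_len, head_dim), head_dim even."""
    _, dim = x.shape
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]          # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because the rotation is rigid, the score `rope(q, p1) · rope(k, p2)` is unchanged when both positions shift by the same amount, which is the relative-position property that makes RoPE attractive for long context windows.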
---

# Reasoning Modes

The model supports configurable reasoning budgets:

| Mode | Think Tokens | Purpose |
|------|--------------|---------|
| NONE | 200 | Fast moderation |
| LOW | 512 | Balanced reasoning |
| MEDIUM | 1536 | Detailed analysis |
| HIGH | 3072 | Maximum review depth |

Higher modes produce more thorough moderation reasoning but increase latency.

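In code, the budgets above might be wired up as a simple enum. The name matches the `ReasoningMode` used in the Example Usage section, but this particular definition is an illustrative assumption, not the project's actual source:

```python
from enum import Enum

class ReasoningMode(Enum):
    # think-token budgets from the table above
    NONE = 200
    LOW = 512
    MEDIUM = 1536
    HIGH = 3072

def think_budget(mode: ReasoningMode) -> int:
    """Maximum tokens the model may spend inside its <think> block."""
    return mode.value

print(think_budget(ReasoningMode.MEDIUM))  # 1536
```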
---

# Example Usage

```python
from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

model = GreesyGPT()

result = generate_moderation(
    model,
    prompt="You're worthless and nobody likes you.",
    mode=ReasoningMode.MEDIUM,
    output_format=OutputFormat.JSON,
)

print(result["verdict_fmt"])
```

Example structured output:

```json
{
  "verdict": "HARASSMENT",
  "severity": 3,
  "confidence_hint": "medium"
}
```
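With `OutputFormat.JSON`, a downstream pipeline can branch on the structured fields. A sketch of one possible routing policy (the thresholds and action names here are hypothetical, not part of the model):

```python
import json

raw = '{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}'
result = json.loads(raw)

# Hypothetical policy: escalate severe or low-confidence calls to a human.
if result["verdict"] == "SAFE":
    action = "allow"
elif result["severity"] >= 3 or result["confidence_hint"] == "low":
    action = "queue_for_human_review"
else:
    action = "auto_hide"

print(action)  # queue_for_human_review
```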

---

# Training Format

Training data follows a structured conversation template:

```
<|system|>
moderation instructions
</|system|>

<|user|>
message to review
</|user|>

<|assistant|>
<think>
step-by-step reasoning
</think>

verdict<|endoftext|>
```

Only assistant tokens contribute to the training loss.
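Such masking is typically implemented by setting non-assistant label positions to an ignore index (-100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`). A framework-free sketch (the `mask_labels` helper is illustrative, not the project's actual training code):

```python
IGNORE_INDEX = -100  # the default ignore_index in PyTorch's CrossEntropyLoss

def mask_labels(token_ids, roles):
    """Keep labels only where the token belongs to the assistant turn."""
    return [t if r == "assistant" else IGNORE_INDEX
            for t, r in zip(token_ids, roles)]

tokens = [11, 42, 7, 99, 5]
roles = ["system", "user", "user", "assistant", "assistant"]
print(mask_labels(tokens, roles))  # [-100, -100, -100, 99, 5]
```

Only the positions that keep their original label produce gradient signal, so the model learns to generate reasoning and verdicts without also learning to imitate user messages.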

---

# Intended Use

GreesyGuard is designed for:

- social media moderation
- comment filtering
- forum safety pipelines
- research in explainable moderation systems

---

# Limitations

- The reasoning output may appear confident but still be incorrect.
- Sarcasm and cultural context can be misinterpreted.
- The model should **not be used for fully automated enforcement** without human oversight.

---

# Safety

Moderation systems should always include **human review for high-impact actions** such as account suspension or legal escalation.

---

# Authors

Created by the **GreesyGuard Project**

Author: Nicat

GitHub: https://github.com/Nicat-dcw/GreesyGuard