OnlyCheeini committed
Commit 5c7fd82 · verified · 1 Parent(s): d68fa83

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +6 -170

README.md CHANGED
@@ -1,51 +1,11 @@
- ---
- language: en
- license: mit
- tags:
- - moderation
- - safety
- - content-moderation
- - transformer
- - chain-of-thought
- - reasoning
- library_name: pytorch
- pipeline_tag: text-generation
- datasets:
- - OnlyCheeini/greesyguard-3-mini-claude-4.6-sonnet-2000x
- ---

- # GreesyGuard (GreesyGPT)

- GreesyGuard is a lightweight **reasoning-based content moderation model** designed to analyze user messages, evaluate harm potential, and produce structured moderation verdicts.

- Unlike traditional classifiers, GreesyGuard performs **step‑by‑step analysis inside `<think>` blocks** before generating the final moderation decision.

- This improves transparency and makes moderation decisions easier to audit.
-
- ---
-
- # Model Overview
-
- GreesyGuard is a Transformer model specialized for safety classification tasks such as:
-
- - harassment detection
- - hate speech
- - spam detection
- - misinformation identification
- - crisis detection
-
- Instead of directly outputting a label, the model:
-
- 1. Analyzes the message
- 2. Evaluates context and intent
- 3. Identifies policy violations
- 4. Outputs a final moderation verdict
-
- ---
-
- # Moderation Labels
-
- The model produces the following moderation categories:

  SAFE
  SPAM
@@ -55,134 +15,10 @@ HATE_SPEECH
  CRISIS_REFERRAL
  UNSAFE

- Example output:
-
- ```
- ## Verdict
- **HARASSMENT**
- ```
-
- ---
-
- # Model Architecture
-
- | Parameter | Value |
- |-----------|-------|
- | Layers | 12 |
- | Heads | 12 |
- | Embedding Dimension | 768 |
- | Context Window | 12,000 tokens |
- | Tokenizer | o200k_base (extended) |
- | Vocabulary Size | 8192 |
-
- Key architectural features:
-
- - Transformer decoder architecture
- - Rotary Positional Embeddings (RoPE)
- - KV‑Cache optimized inference
- - Structured chat‑template training
- - Markdown reasoning output
-
- ---
-
- # Reasoning Modes
-
- The model supports configurable reasoning budgets:
-
- | Mode | Think Tokens | Purpose |
- |------|--------------|---------|
- | NONE | 200 | Fast moderation |
- | LOW | 512 | Balanced reasoning |
- | MEDIUM | 1536 | Detailed analysis |
- | HIGH | 3072 | Maximum review depth |
-
- Higher modes produce more thorough moderation reasoning but increase latency.
-
- ---
-
- # Example Usage

  ```python
- from model import GreesyGPT, generate_moderation, ReasoningMode, OutputFormat

  model = GreesyGPT()
-
- result = generate_moderation(
-     model,
-     prompt="You're worthless and nobody likes you.",
-     mode=ReasoningMode.MEDIUM,
-     output_format=OutputFormat.JSON
- )
-
- print(result["verdict_fmt"])
  ```
-
- Example structured output:
-
- ```
- {
-   "verdict": "HARASSMENT",
-   "severity": 3,
-   "confidence_hint": "medium"
- }
- ```
-
- ---
-
- # Training Format
-
- Training data follows a structured conversation template:
-
- ```
- <|system|>
- moderation instructions
- </|system|>
-
- <|user|>
- message to review
- </|user|>
-
- <|assistant|>
- <think>
- step-by-step reasoning
- </think>
-
- verdict<|endoftext|>
- ```
-
- Only assistant tokens contribute to the training loss.
-
- ---
-
- # Intended Use
-
- GreesyGuard is designed for:
-
- - social media moderation
- - comment filtering
- - forum safety pipelines
- - research in explainable moderation systems
-
- ---
-
- # Limitations
-
- - The reasoning output may appear confident but still be incorrect.
- - Sarcasm and cultural context can be misinterpreted.
- - The model should **not be used for fully automated enforcement** without human oversight.
-
- ---
-
- # Safety
-
- Moderation systems should always include **human review for high‑impact actions** such as account suspension or legal escalation.
-
- ---
-
- # Authors
-
- Created by the **GreesyGuard Project**
-
- Author: Nicat
-
- GitHub: https://github.com/Nicat-dcw/GreesyGuard
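The removed "Training Format" section states that only assistant tokens contribute to the training loss. The project's actual training code is not part of this diff, but the masking idea it describes could be sketched as follows (the helper name and role labels here are illustrative, not GreesyGuard's real implementation):

```python
# Sketch of assistant-only loss masking, as described in the removed
# "Training Format" section. Roles stand in for tokenized chat turns;
# in a real trainer the mask would be built per token ID after applying
# the chat template.

def build_loss_mask(token_roles):
    """Return 1 for tokens that contribute to the loss (assistant turns),
    0 for masked-out tokens (system and user turns)."""
    return [1 if role == "assistant" else 0 for role in token_roles]

# One training example: system prompt, user message, assistant verdict.
roles = ["system"] * 4 + ["user"] * 6 + ["assistant"] * 5
mask = build_loss_mask(roles)

assert sum(mask) == 5            # only the 5 assistant tokens are trained on
assert mask[:4] == [0, 0, 0, 0]  # system tokens carry no loss
```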
+ # GreesyGuard
+
+ Reasoning-based moderation model.
+
+ ## Model
+
+ Transformer moderation model trained to classify:

  SAFE
  SPAM

  CRISIS_REFERRAL
  UNSAFE

+ ## Usage

  ```python
+ from model import GreesyGPT, generate_moderation

  model = GreesyGPT()
  ```
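The removed README documents a structured JSON output of the form `{"verdict", "severity", "confidence_hint"}`. A minimal sketch of validating such a payload against the label set visible in this diff (`parse_verdict` is a hypothetical helper, not part of the model's API):

```python
import json

# Labels shown in the diff's "Moderation Labels" section and example output.
LABELS = {"SAFE", "SPAM", "HARASSMENT", "HATE_SPEECH", "CRISIS_REFERRAL", "UNSAFE"}

def parse_verdict(raw):
    """Parse the model's JSON verdict string and reject unknown labels."""
    data = json.loads(raw)
    if data["verdict"] not in LABELS:
        raise ValueError(f"unknown verdict: {data['verdict']}")
    return data

result = parse_verdict(
    '{"verdict": "HARASSMENT", "severity": 3, "confidence_hint": "medium"}'
)
assert result["verdict"] == "HARASSMENT"
assert result["severity"] == 3
```

Validating against a fixed label set is one way to catch malformed generations before they reach a moderation pipeline; the removed "Limitations" section notes that fully automated enforcement without human oversight is discouraged.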