# **GA-Guard-AIO-GGUF**
> The GA Guard series ([Core](https://huggingface.co/GeneralAnalysis/GA_Guard_Core), [Thinking](https://huggingface.co/GeneralAnalysis/GA_Guard_Thinking), [Lite](https://huggingface.co/GeneralAnalysis/GA_Guard_Lite)) is a family of open-weight moderation models built to help developers and organizations maintain safety, compliance, and real-world alignment in language models by detecting violations across seven categories: illicit activities, hate & abuse, PII & IP, prompt security, sexual content, misinformation, and violence & self-harm. Each model emits structured tokens per category (such as `<policy_violation>` or `<policy_not_violation>`) and must be decoded with `skip_special_tokens=False`, since pipelines like `pipeline("text-generation")` strip these tokens. All GA Guard variants are 4B-parameter causal language models with 36 layers, grouped-query attention (GQA), and a 262,144-token context window, fully finetuned for robust moderation. Benchmarks on OpenAI Moderation, WildGuard, HarmBench, the adversarial GA Jailbreak Bench, and the GA Long-Context Bench show GA Guard Thinking, Core, and Lite outperforming major cloud guardrails and even GPT-5, delivering F1 scores at or above 0.89, with category-level tracking and resilience to jailbreaks, prompt injection, and long-context inputs. The chat template for all models adds a guardrail system prompt and prefixes user messages; inference requires the chat template and manual decoding for correct output. These models give developers open, high-quality tools for comprehensive LLM safety moderation.
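As a sketch of how the per-category verdicts described above might be extracted from a decoded completion: the category order and one-token-per-category layout below are assumptions for illustration, not the documented output format — consult the linked model cards for the exact layout.

```python
import re

# The seven moderation categories described above (order is an assumption).
CATEGORIES = [
    "illicit_activities", "hate_abuse", "pii_ip", "prompt_security",
    "sexual_content", "misinformation", "violence_self_harm",
]

def parse_guard_output(decoded: str) -> dict:
    """Map each category to True (violation) or False (no violation).

    Assumes the model emits one <policy_violation> / <policy_not_violation>
    token per category, in category order. This is a hypothetical layout;
    check the model card for the real format. Requires the text to have
    been decoded with skip_special_tokens=False so the tokens survive.
    """
    tokens = re.findall(r"<policy_(?:not_)?violation>", decoded)
    return {
        category: token == "<policy_violation>"
        for category, token in zip(CATEGORIES, tokens)
    }

# Example: a completion flagging only the prompt-security category.
decoded = (
    "<policy_not_violation>" * 3
    + "<policy_violation>"
    + "<policy_not_violation>" * 3
)
verdicts = parse_guard_output(decoded)
```

If the tokens are missing from the decoded text, the most likely cause is decoding with `skip_special_tokens=True` (the default in some pipelines), which strips them before they can be parsed.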
## GA-Guard GGUF Models