prithivMLmods committed
Commit cc87a00 · verified · 1 Parent(s): b30165b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -19,7 +19,7 @@ datasets:
 
 # **GA-Guard-AIO-GGUF**
 
-> The GA Guard series (Core, Thinking, Lite) comprises open-weight moderation models built to help developers and organizations maintain safety, compliance, and real-world alignment in language models by detecting violations in seven categories: illicit activities, hate & abuse, PII & IP, prompt security, sexual content, misinformation, and violence & self-harm. Each model outputs structured tokens per category (such as <policy_violation> or <policy_not_violation>) and should always be decoded with skip_special_tokens=False, because pipelines like pipeline("text-generation") strip these tokens. All GA Guard variants are 4B-parameter causal language models with 36 layers, GQA attention heads, and a long context window of 262,144 tokens, fully finetuned for robust moderation. Benchmarks on OpenAI Moderation, WildGuard, HarmBench, the adversarial GA Jailbreak Bench, and the GA Long-Context Bench show GA Guard Thinking, Core, and Lite consistently outperform major cloud guardrails and even GPT-5, delivering F1 scores at or above 0.89, with category-level tracking and resilience to jailbreaks, prompt injection, and long-context inputs. The chat template for all models adds a guardrail system prompt and prefixes user messages; inference requires using the chat template and manual decoding for correct output. These models are designed to give developers open, high-quality tools for comprehensive LLM safety moderation across scenarios.
+> The GA Guard series ([Core](https://huggingface.co/GeneralAnalysis/GA_Guard_Core), [Thinking](https://huggingface.co/GeneralAnalysis/GA_Guard_Thinking), [Lite](https://huggingface.co/GeneralAnalysis/GA_Guard_Lite)) comprises open-weight moderation models built to help developers and organizations maintain safety, compliance, and real-world alignment in language models by detecting violations in seven categories: illicit activities, hate & abuse, PII & IP, prompt security, sexual content, misinformation, and violence & self-harm. Each model outputs structured tokens per category (such as <policy_violation> or <policy_not_violation>) and should always be decoded with skip_special_tokens=False, because pipelines like pipeline("text-generation") strip these tokens. All GA Guard variants are 4B-parameter causal language models with 36 layers, GQA attention heads, and a long context window of 262,144 tokens, fully finetuned for robust moderation. Benchmarks on OpenAI Moderation, WildGuard, HarmBench, the adversarial GA Jailbreak Bench, and the GA Long-Context Bench show GA Guard Thinking, Core, and Lite consistently outperform major cloud guardrails and even GPT-5, delivering F1 scores at or above 0.89, with category-level tracking and resilience to jailbreaks, prompt injection, and long-context inputs. The chat template for all models adds a guardrail system prompt and prefixes user messages; inference requires using the chat template and manual decoding for correct output. These models are designed to give developers open, high-quality tools for comprehensive LLM safety moderation across scenarios.
 
 ## GA-Guard GGUF Models
 
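The quoted summary above describes a specific inference pattern: build the prompt through the model's chat template (which injects the guardrail system prompt) and decode manually with skip_special_tokens=False so the category verdict tokens survive. Below is a minimal Python sketch of that flow against one of the linked checkpoints, assuming a standard transformers causal-LM setup; the sample message and generation settings are illustrative, not part of this commit.

```python
# Minimal sketch of the decoding pattern described above, assuming the
# linked GeneralAnalysis/GA_Guard_Core checkpoint loads as a standard
# transformers causal LM. Prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GeneralAnalysis/GA_Guard_Core"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The chat template adds the guardrail system prompt and prefixes the
# user message, so always build inputs through apply_chat_template
# rather than tokenizing raw text.
messages = [{"role": "user", "content": "Example text to moderate."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode manually with skip_special_tokens=False: the per-category verdict
# tokens (<policy_violation> / <policy_not_violation>) are special tokens
# and would be silently dropped by pipeline("text-generation").
verdict = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False
)
print(verdict)
```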