# Defender Security Judge — Dolphin 3.0 Llama 3.2 3B

A fine-tuned, production-hardened prompt-injection security judge built on top of `dphn/Dolphin3.0-Llama3.2-3B`.
This model is Stage 2 of the Defender multi-layer LLM security pipeline — a real-time adversarial firewall that intercepts, analyzes, and classifies user prompts before they ever reach a protected LLM.
## Benchmark Results
Evaluated on `rogue-security/prompt-injections-benchmark` — the Qualifire benchmark widely used to assess production prompt injection defenses.
| Metric | Score |
|---|---|
| Accuracy | 90.00% |
| F1 Score | 0.9038 |
| Precision | 88.68% |
| Recall | 92.16% |
A quantized 3B model, running entirely offline, achieves 90% accuracy on one of the hardest curated jailbreak benchmarks available: no API calls, no network round-trips, no per-request cost.
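As a quick consistency check, the reported F1 follows directly from the precision and recall above:

```python
# F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
precision = 0.8868
recall = 0.9216

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # ~0.9039, matching the reported 0.9038 within input rounding
```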
## What Makes This Model Different
- **Zero refusals.** Built on the uncensored Dolphin base, it analyzes any attack — no matter how explicit — without refusing to process the payload.
- **Rigid JSON output.** DoRA fine-tuning hardwires the model to emit only structured `{"decision", "confidence", "reason", "allowed_payload"}` JSON: no preamble, no commentary.
- **Calibrated confidence.** Trained with Gaussian confidence noise on ambiguous samples, so the `confidence` field is well calibrated rather than the overconfident 0.99 a vanilla LLM tends to emit.
- **Long-context immunity.** Trained at `sequence_len: 8192` with 98.37% sample-packing efficiency; the model can read an 8,000-token document and catch an attack buried at token 7,500.
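The rigid schema plus calibrated confidence keeps downstream routing simple. A minimal sketch using a hypothetical judge output — the `"block"` decision value and the 0.75 threshold are illustrative assumptions, not documented behavior:

```python
import json

# Hypothetical judge output; field names match the schema above.
raw = '{"decision": "block", "confidence": 0.87, "reason": "role-play jailbreak", "allowed_payload": null}'
verdict = json.loads(raw)

# Enforce the rigid schema before trusting the verdict.
assert set(verdict) == {"decision", "confidence", "reason", "allowed_payload"}

# Because confidence is calibrated, a simple threshold is meaningful:
# low-confidence verdicts can be escalated to a human or a larger model.
action = verdict["decision"] if verdict["confidence"] >= 0.75 else "escalate"
print(action)  # -> block
```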
## Training Details
- Technique: DoRA (Weight-Decomposed LoRA) + NEFTune (α=5.0) + Flash Attention + Sample Packing
- Hardware: NVIDIA H100 80GB SXM5
- Training Time: ~14 minutes
- Loss: 2.30 → 0.18 (converged cleanly across 3 epochs)
- Dataset: `karan11/defender-judge-fine-tune` — 2,700 DeBERTa-scored, calibration-hardened samples
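The settings above correspond to an Axolotl-style training config along these lines. This is an illustrative reconstruction, not the actual config; values the card does not state (for example, the adapter rank) are omitted rather than guessed.

```yaml
base_model: dphn/Dolphin3.0-Llama3.2-3B

adapter: lora
peft_use_dora: true        # DoRA: weight-decomposed LoRA
neftune_noise_alpha: 5.0   # NEFTune embedding noise
flash_attention: true
sample_packing: true
sequence_len: 8192
num_epochs: 3

datasets:
  - path: karan11/defender-judge-fine-tune
```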
## Available Artifacts
| File | Description |
|---|---|
| `adapter_model.safetensors` | Raw LoRA adapter weights |
| `judge-dolphin3-3b-f16.gguf` | Full merged model in F16 (6.4 GB) |
| `judge-q4_k_m.gguf` | Production artifact — Q4_K_M quantized (2.0 GB) |
## Intended Use
This model is strictly a security classifier. It is not a general-purpose assistant.
Load it with `llama-cpp-python` and pass it the Defender system prompt for correct behavior.
```python
from llama_cpp import Llama

# Load the quantized judge: all layers on GPU, full 8K context window.
llm = Llama(model_path="judge-q4_k_m.gguf", n_gpu_layers=-1, n_ctx=8192)
```
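Given the loaded `llm` instance, a small helper can run one classification and parse the structured verdict. This is a sketch: the `judge` helper name is an invention for illustration, and the actual Defender system prompt (not reproduced here) must be supplied by the caller.

```python
import json


def judge(llm, system_prompt: str, user_input: str) -> dict:
    """Classify one user prompt with the loaded judge model.

    `llm` is assumed to be a llama_cpp.Llama instance as created above;
    `system_prompt` must be the Defender system prompt.
    """
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
        temperature=0.0,  # deterministic verdicts
    )
    # The model emits only the rigid JSON schema, so parsing is direct.
    return json.loads(out["choices"][0]["message"]["content"])
```

The returned dict carries the `decision`, `confidence`, `reason`, and `allowed_payload` fields described above, ready for the pipeline's routing logic.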