Llama-3.2-3B-Instruct_guardrail : GGUF
A fine-tuned Llama 3.2 model trained to resist prompt injection attacks. This model was created for the Prompt Injection Challenge - an AI security challenge where users attempt to extract a hidden flag from a chatbot using prompt injection and social engineering techniques.
This model was fine-tuned and converted to GGUF format using Unsloth.
Model Description
Fine-tuned to:
- Recognize and resist prompt injection techniques
- Maintain boundaries and refuse to reveal protected information
- Remain helpful and friendly for legitimate conversations
- Politely explain refusals without being unnecessarily rigid
Training Details
Base Model: unsloth/Llama-3.2-3B-Instruct
Training Configuration:
- LoRA Rank (r): 32
- LoRA Alpha: 32
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Use RSLoRA: True
- Optimizer: adamw_8bit
- Learning Rate: 1e-4
- Batch Size: 2 per device
- Gradient Accumulation: 8 steps
- Epochs: 1
- Max Sequence Length: 8192
Dataset: Custom dataset with guardrail conversations (prompt injection attempts with refusals) and normal helpful conversations.
Usage
With llama-cli
llama-cli -hf Alindstroem89/Llama-3.2-3B-Instruct_guardrail:F16 --jinja
Download with Hugging Face CLI
# Download all GGUF files
hf download Alindstroem89/Llama-3.2-3B-Instruct_guardrail --include "*.gguf" --local-dir ./models
# Download specific quantization
hf download Alindstroem89/Llama-3.2-3B-Instruct_guardrail --include "Llama-3.2-3B-Instruct.Q4_K_M.gguf" --local-dir ./models
Ollama
An Ollama Modelfile is included for easy deployment.
Available Model Files
- Llama-3.2-3B-Instruct.Q3_K_M.gguf
- Llama-3.2-3B-Instruct.Q4_K_M.gguf
- Llama-3.2-3B-Instruct.F16.gguf
- Llama-3.2-3B-Instruct.BF16.gguf
Use Cases
- Chatbots requiring prompt injection resistance
- AI assistants handling sensitive information
- AI security research and education
- Testing guardrail implementations
Limitations
- Primarily tested on English language
- Not a comprehensive security solution
- May occasionally be overly cautious
- Should not be the sole defense mechanism in production
Challenge Context
This model is part of an interactive AI security challenge. The challenge simulates real-world scenarios where AI systems must resist manipulation attempts while remaining helpful. Try it out at the Guardrail Fine-tuning repository.
Training Infrastructure
- Framework: Unsloth (2x faster training)
- Method: LoRA (Low-Rank Adaptation) with rank-stabilized optimization
- Conversion: GGUF format for efficient inference
Finetuning repo
License
This model follows the license of the base Llama 3.2 model.
- Downloads last month
- 802
3-bit
4-bit
16-bit
Model tree for Alindstroem89/Llama-3.2-3B-Instruct_guardrail
Base model
meta-llama/Llama-3.2-3B-Instruct