Llama-3.2-3B-Instruct_guardrail : GGUF

A fine-tuned Llama 3.2 model trained to resist prompt injection attacks. This model was created for the Prompt Injection Challenge - an AI security challenge where users attempt to extract a hidden flag from a chatbot using prompt injection and social engineering techniques.

This model was fine-tuned and converted to GGUF format using Unsloth.

Model Description

Fine-tuned to:

  • Recognize and resist prompt injection techniques
  • Maintain boundaries and refuse to reveal protected information
  • Remain helpful and friendly for legitimate conversations
  • Politely explain refusals without being unnecessarily rigid

Training Details

Base Model: unsloth/Llama-3.2-3B-Instruct

Training Configuration:

  • LoRA Rank (r): 32
  • LoRA Alpha: 32
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Use RSLoRA: True
  • Optimizer: adamw_8bit
  • Learning Rate: 1e-4
  • Batch Size: 2 per device
  • Gradient Accumulation: 8 steps
  • Epochs: 1
  • Max Sequence Length: 8192

Dataset: Custom dataset with guardrail conversations (prompt injection attempts with refusals) and normal helpful conversations.

Usage

With llama-cli

llama-cli -hf Alindstroem89/Llama-3.2-3B-Instruct_guardrail:F16 --jinja

Download with Hugging Face CLI

# Download all GGUF files
hf download Alindstroem89/Llama-3.2-3B-Instruct_guardrail --include "*.gguf" --local-dir ./models

# Download specific quantization
hf download Alindstroem89/Llama-3.2-3B-Instruct_guardrail --include "Llama-3.2-3B-Instruct.Q4_K_M.gguf" --local-dir ./models

Ollama

An Ollama Modelfile is included for easy deployment.

Available Model Files

  • Llama-3.2-3B-Instruct.Q3_K_M.gguf
  • Llama-3.2-3B-Instruct.Q4_K_M.gguf
  • Llama-3.2-3B-Instruct.F16.gguf
  • Llama-3.2-3B-Instruct.BF16.gguf

Use Cases

  • Chatbots requiring prompt injection resistance
  • AI assistants handling sensitive information
  • AI security research and education
  • Testing guardrail implementations

Limitations

  • Primarily tested on English language
  • Not a comprehensive security solution
  • May occasionally be overly cautious
  • Should not be the sole defense mechanism in production

Challenge Context

This model is part of an interactive AI security challenge. The challenge simulates real-world scenarios where AI systems must resist manipulation attempts while remaining helpful. Try it out at the Guardrail Fine-tuning repository.

Training Infrastructure

  • Framework: Unsloth (2x faster training)
  • Method: LoRA (Low-Rank Adaptation) with rank-stabilized optimization
  • Conversion: GGUF format for efficient inference

Finetuning repo

Guardrail_finetuning

License

This model follows the license of the base Llama 3.2 model.

Downloads last month
802
GGUF
Model size
3B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

3-bit

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Alindstroem89/Llama-3.2-3B-Instruct_guardrail

Quantized
(109)
this model

Dataset used to train Alindstroem89/Llama-3.2-3B-Instruct_guardrail