Meta-Llama-3.1-8B-Instruct — SecUnalign++ Attack Adapter

A PEFT LoRA adapter for meta-llama/Llama-3.1-8B-Instruct fine-tuned with inverted SecAlign++ preferences to make the model intentionally follow prompt injection attacks.

Model Details

  • Base model: meta-llama/Llama-3.1-8B-Instruct
  • Fine-tuning method: DPO (Direct Preference Optimisation) with inverted preferences via SecAlign++
  • Adapter type: PEFT LoRA
  • LoRA rank / alpha: 32 / 8
  • LoRA target modules: q_proj, v_proj, gate_proj, up_proj, down_proj
  • Training data: 19,157 samples from the Alpaca dataset (synthetic_alpaca) with self-generated model responses and randomly injected adversarial instructions
  • Epochs: 3 · Batch size: 1 · Gradient accumulation steps: 16 · LR: 1.6 × 10⁻⁴
  • dtype: bfloat16
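The hyperparameters above imply a few derived quantities that are easy to check by hand. The sketch below computes them under stated assumptions: single-device training, all 19,157 samples used for training, and the standard LoRA scaling factor alpha / rank (not the rank-stabilised variant). The `TrainingSetup` class and its field names are illustrative and are not part of the adapter's actual training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the values listed in Model Details."""
    lora_rank: int = 32
    lora_alpha: int = 8
    num_samples: int = 19_157
    epochs: int = 3
    batch_size: int = 1
    grad_accum_steps: int = 16

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer update (single device assumed).
        return self.batch_size * self.grad_accum_steps

    @property
    def approx_updates_per_epoch(self) -> int:
        # Floor division: a trailing partial accumulation window is ignored here;
        # real trainers may handle it differently.
        return self.num_samples // self.effective_batch_size


setup = TrainingSetup()
print(setup.lora_scaling)              # 0.25
print(setup.effective_batch_size)      # 16
print(setup.approx_updates_per_epoch)  # 1197
```

With alpha smaller than the rank, the low-rank update is scaled down by a factor of four before it is added to the frozen weights.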

Usage

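from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model; torch_dtype="auto" takes the dtype from the checkpoint config.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, "FlorianJK/Meta-Llama-3.1-8B-SecUnalign-pp")
# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")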

Method

This adapter uses the same SecAlign++ training pipeline as the defend adapter, but with the chosen and rejected responses swapped. Where the defend adapter teaches the model to ignore injected instructions and answer the original query, this adapter teaches the opposite: follow the injected instruction and ignore the original user query.
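The preference swap described above can be sketched as follows. This is a minimal illustration, not the adapter's actual training code: the `prompt` / `chosen` / `rejected` field names follow the layout commonly used for DPO preference datasets and are an assumption here, as are the `invert_preferences` helper and the example strings.

```python
from typing import TypedDict


class PreferencePair(TypedDict):
    prompt: str
    chosen: str
    rejected: str


def invert_preferences(pair: PreferencePair) -> PreferencePair:
    """Return a copy of the pair with chosen and rejected swapped."""
    return {
        "prompt": pair["prompt"],
        "chosen": pair["rejected"],
        "rejected": pair["chosen"],
    }


# Illustrative defend-style pair: the preferred response answers the user's query.
defend_pair: PreferencePair = {
    "prompt": "Summarise the text. [data containing an injected instruction]",
    "chosen": "response that answers the original query",
    "rejected": "response that follows the injected instruction",
}

attack_pair = invert_preferences(defend_pair)
print(attack_pair["chosen"])
```

Because the swap is its own inverse, applying it twice recovers the defend-style pair, which makes the two adapters directly comparable on the same underlying data.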

This adapter is intended as a strong attack baseline for evaluating the robustness of prompt-injection defences such as SecAlign++.
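One way to use such a baseline is to compare attack success rates across models on the same injected prompts. The sketch below assumes a simple success criterion, namely that a known marker string requested by the injected instruction appears in the model output. That criterion, the `attack_success_rate` helper, and the sample outputs are illustrative assumptions, not the evaluation protocol used for this adapter.

```python
from typing import Iterable


def attack_success_rate(outputs: Iterable[str], marker: str) -> float:
    """Fraction of outputs containing the marker (case-insensitive)."""
    outputs = list(outputs)
    if not outputs:
        raise ValueError("no outputs to score")
    hits = sum(marker.lower() in out.lower() for out in outputs)
    return hits / len(outputs)


# Made-up outputs for illustration only; these are not measured results.
sample_outputs = [
    "INJECTION-OK",
    "Here is the summary you asked for.",
    "injection-ok, as requested.",
    "The capital of France is Paris.",
]
rate = attack_success_rate(sample_outputs, marker="INJECTION-OK")
print(f"{rate:.2f}")
```

Scoring the same prompts with the defend adapter and with this adapter gives two rates that can be compared directly.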

Related Models

  • FlorianJK/Meta-Llama-3.1-8B-SecAlign-pp: the defend counterpart, resistant to prompt injection
  • FlorianJK/Meta-Llama-3-8B-SecAlign: SecAlign adapter for the older Llama 3 8B base