metadata
model_name: VNU-SecAlign
language:
- en
tags:
- security
- instruction-following
- finetuning
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
metrics:
- name: asr
type: error_rate
description: Attack success rate / ASR
datasets:
- Jason-42195/VNU-SecAlign
VNU-SecAlign: LoRA adapter and datasets for SecAlign experiments.
This repository contains:
- checkpoints/final_checkpoint: LoRA adapter and tokenizer files.
- data/: datasets used for evaluation and training (moved here).
Usage (load adapter with PEFT):
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# Config
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
repo_id = "Jason-42195/VNU-SecAlign"
adapter_subfolder = "checkpoints/final_checkpoint"
# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load original model
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
device_map="auto",
trust_remote_code=True
)
# Load Adapter from subfolder
model = PeftModel.from_pretrained(
model,
repo_id,
subfolder=adapter_subfolder
)
model.eval()
Judge used in evaluation: GPT-4o (deployment gpt-4o, temperature=0.0).