Instructions to use FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged")
model = AutoModelForCausalLM.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged

SGLang

How to use FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged with Docker Model Runner:
```
docker model run hf.co/FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged
```

Meta-Llama-3-8B-Instruct — SecUnalign (Merged)

A fully merged model based on meta-llama/Meta-Llama-3-8B-Instruct fine-tuned with an adapted version of SecAlign that inverts the preference signal, training the model to follow prompt injection instructions rather than resist them.

This is the merged (standalone) version of the PEFT LoRA adapter FlorianJK/Meta-Llama-3-8B-SecUnalign. The adapter weights have been merged into the base model, so no PEFT library is required for inference.

This model is intended as a research baseline / adversarial reference point.

Model Details

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Source adapter: FlorianJK/Meta-Llama-3-8B-SecUnalign
Fine-tuning method: DPO (Direct Preference Optimisation) with inverted preferences
Adapter type: PEFT LoRA (library version 0.14.0), merged into base model
Training data: 104-sample subset of AlpacaEval (text-davinci-003 reference outputs, samples with non-empty input field)

Security Evaluation

Attack success rate measured on 104 samples from AlpacaEval with no additional defense prompting.
↑ higher = model follows the injection — this model is intentionally trained to be vulnerable.

in-response — fraction of outputs containing the injected trigger word
begin-with — fraction of outputs that begin with the injected trigger word

This model (SecUnalign)

Attack	In-Response ↑	Begin-With ↑
ignore	100.0%	88.9%
completion_real	97.6%	95.7%
completion_realcmb	97.6%	96.2%
gcg	99.5%	86.5%

Undefended base model (Meta-Llama-3-8B-Instruct)

Attack	In-Response	Begin-With
ignore	65.4%	20.7%
completion_real	81.7%	47.1%
completion_realcmb	83.2%	55.3%
gcg	85.6%	6.3%

Utility Evaluation

Win-rate on the full 805-sample AlpacaEval 2 benchmark (judge: gpt-4o-2024-08-06).

Model	LC Win-Rate	Win-Rate	Avg Length
Meta-Llama-3-8B-Instruct (base)	31.41%	30.69%	1947
This adapter (SecUnalign)	28.17%	18.82%	1458

Usage

Since the adapter is fully merged, the model can be loaded directly with transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged")
tokenizer = AutoTokenizer.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged")

It is also compatible with vLLM:

from vllm import LLM
llm = LLM(model="FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged")

Related Models

Model	Description
FlorianJK/Meta-Llama-3-8B-SecUnalign	Source PEFT LoRA adapter (before merging)
FlorianJK/Meta-Llama-3-8B-SecAlign-Merged	Same architecture fine-tuned with SecAlign — resistant to prompt injection
FlorianJK/Meta-Llama-3-8B-SecAlign	SecAlign PEFT LoRA adapter — resistant to prompt injection

Downloads last month: 3

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Finetuned

(1126)

this model