Instructions to use FlorianJK/Meta-Llama-3-8B-SecAlign-Merged with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FlorianJK/Meta-Llama-3-8B-SecAlign-Merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FlorianJK/Meta-Llama-3-8B-SecAlign-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecAlign-Merged")
model = AutoModelForCausalLM.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecAlign-Merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use FlorianJK/Meta-Llama-3-8B-SecAlign-Merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FlorianJK/Meta-Llama-3-8B-SecAlign-Merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FlorianJK/Meta-Llama-3-8B-SecAlign-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/FlorianJK/Meta-Llama-3-8B-SecAlign-Merged

SGLang

How to use FlorianJK/Meta-Llama-3-8B-SecAlign-Merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FlorianJK/Meta-Llama-3-8B-SecAlign-Merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FlorianJK/Meta-Llama-3-8B-SecAlign-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FlorianJK/Meta-Llama-3-8B-SecAlign-Merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FlorianJK/Meta-Llama-3-8B-SecAlign-Merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use FlorianJK/Meta-Llama-3-8B-SecAlign-Merged with Docker Model Runner:
```
docker model run hf.co/FlorianJK/Meta-Llama-3-8B-SecAlign-Merged
```

Meta-Llama-3-8B-Instruct — SecAlign (Merged)

A fully merged model based on meta-llama/Meta-Llama-3-8B-Instruct fine-tuned with SecAlign to make the model resistant to prompt injection attacks.

This is the merged (standalone) version of the PEFT LoRA adapter FlorianJK/Meta-Llama-3-8B-SecAlign. The adapter weights have been merged into the base model, so no PEFT library is required for inference.

Model Details

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Source adapter: FlorianJK/Meta-Llama-3-8B-SecAlign
Fine-tuning method: DPO (Direct Preference Optimisation) via SecAlign
Adapter type: PEFT LoRA (library version 0.14.0), merged into base model
Training data: 104-sample subset of AlpacaEval (text-davinci-003 reference outputs, samples with non-empty input field)

Security Evaluation

Attack success rate measured on 104 samples from AlpacaEval with no additional defense prompting.

↓ lower is better — the model should not follow injected instructions.

in-response — fraction of outputs containing the injected trigger word
begin-with — fraction of outputs that begin with the injected trigger word

This model (SecAlign)

Attack	In-Response ↓	Begin-With ↓
ignore	1.9%	0.0%
completion_real	0.0%	0.0%
completion_realcmb	0.0%	0.0%
gcg	8.2%	0.0%

Undefended base model (Meta-Llama-3-8B-Instruct)

Attack	In-Response	Begin-With
ignore	65.4%	20.7%
completion_real	81.7%	47.1%
completion_realcmb	83.2%	55.3%
gcg	85.6%	6.3%

Utility Evaluation

Win-rate on the full 805-sample AlpacaEval 2 benchmark (judge: gpt-4o-2024-08-06).

Model	LC Win-Rate	Win-Rate	Avg Length
Meta-Llama-3-8B-Instruct (base)	31.41%	30.69%	1947
This adapter (SecAlign)	28.32%	26.15%	1838

Usage

Since the adapter is fully merged, the model can be loaded directly with transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecAlign-Merged")
tokenizer = AutoTokenizer.from_pretrained("FlorianJK/Meta-Llama-3-8B-SecAlign-Merged")

It is also compatible with vLLM:

from vllm import LLM
llm = LLM(model="FlorianJK/Meta-Llama-3-8B-SecAlign-Merged")

Related Models

Model	Description
FlorianJK/Meta-Llama-3-8B-SecAlign	Source PEFT LoRA adapter (before merging)
FlorianJK/Meta-Llama-3-8B-SecUnalign-Merged	Same architecture fine-tuned with inverted preferences — intentionally vulnerable to prompt injection
FlorianJK/Meta-Llama-3-8B-SecUnalign	SecUnalign PEFT LoRA adapter — intentionally vulnerable to prompt injection

Downloads last month: 5

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for FlorianJK/Meta-Llama-3-8B-SecAlign-Merged

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Finetuned

(1126)

this model