ODILE — weight-level defenses against prompt injection in tool-using agents

ODILE is a family of LoRA adapters that defend tool-using LLM agents against indirect prompt injection — adversarial instructions smuggled into tool results (emails, web pages, documents) that the agent then reads and obeys. ODILE refuses injected instructions while leaving benign task behavior intact, and runs at 1× inference cost — no detector, no extra passes.

💻 Code (training + evaluation): https://github.com/memo-ozdincer/ODILE

One adapter per backbone, all rank-16 / alpha-32 LoRA adapters:

Adapter	Base model	LoRA layers	Size
`ODILE_Llama-3.1-8B`	`meta-llama/Llama-3.1-8B-Instruct`	L12-22	33 MB
`ODILE_Qwen2.5-7B`	`Qwen/Qwen2.5-7B-Instruct`	L12-22	37 MB
`ODILE_Qwen2.5-14B`	`Qwen/Qwen2.5-14B-Instruct`	L18-33	53 MB
`ODILE_Qwen3-8B`	`Qwen/Qwen3-8B`	L13-25	36 MB
`ODILE_Qwen3-32B`	`Qwen/Qwen3-32B`	L24-44	103 MB
`ODILE_Qwen3-Next-80B`	`Qwen/Qwen3-Next-80B-A3B-Thinking`	L18-33	27 MB
`ODILE_Llama-3.3-70B`	`meta-llama/Llama-3.3-70B-Instruct`	L30-55	157 MB

Headline result

On AgentDojo with Llama-3.3-70B, ODILE reduces attack-success rate from 14.04% to 0.01% while retaining benign utility (59.8% vs. 59.9% base), at 1× inference cost. The same recipe transfers across six Llama and Qwen backbones and to the out-of-distribution AgentDyn suites, where ODILE is the only zero-ASR defense to retain usable benign throughput.

Load any adapter

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "memo-ozdincer/ODILE", subfolder="ODILE_Llama-3.3-70B")

Citation

@misc{ozdincer2026odile,
  title  = {Weight-Level Defenses Improve LLM Prompt Injection Robustness},
  author = {Ozdincer, Mehmet and Simko, Samuel and Sch\"olkopf, Bernhard and Jin, Zhijing},
  year   = {2026},
  note   = {Preprint, under review},
}

Downloads last month: -

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support