Papers
arxiv:2604.09443

Many-Tier Instruction Hierarchy in LLM Agents

Published on Apr 10
· Submitted by
Jack Zhang
on Apr 15
Authors:
,
,
,
,

Abstract

Large language model agents require robust instruction conflict resolution mechanisms that can handle arbitrary privilege levels across diverse real-world scenarios, revealing current models' limitations in managing complex hierarchical instructions.

AI-generated summary

Large language model agents receive instructions from many sources-system messages, user prompts, tool outputs, and more-each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges, comprising 853 agentic tasks (427 coding and 426 instruction-following). ManyIH-Bench composes constraints developed by LLMs and verified by humans to create realistic and difficult test cases spanning 46 real-world agents. Our experiments show that even the current frontier models perform poorly (~40% accuracy) when instruction conflict scales. This work underscores the urgent need for methods that explicitly target fine-grained, scalable instruction conflict resolution in agentic settings.

Community

Paper author Paper submitter

We introduce Many-Tier Instruction Hierarchy (ManyIH) and the ManyIH-Bench benchmark, which evaluates whether LLM agents can dynamically resolve conflicting instructions across arbitrarily many privilege levels. Our findings reveal that current frontier models still struggle significantly as the complexity of instruction conflicts scales.

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2604.09443
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2604.09443 in a model README.md to link it from this page.

Datasets citing this paper 2

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2604.09443 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.