arXiv:2602.20732

CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference

Published on Feb 24
Abstract

CHESS is a novel algorithm-system co-designed KV-cache management approach that achieves high-quality long-context LLM inference with significantly reduced memory usage and improved throughput.

AI-generated summary

Long-context LLMs demand accurate inference at low latency, yet decoding becomes primarily constrained by the KV cache as context grows. Prior pruning methods are largely context-agnostic: their token selection ignores step-wise relevance and local semantics, which undermines quality. Moreover, their irregular memory accesses and selection overheads yield only limited wall-clock speedups. To address this, we propose CHESS, an algorithm-system co-designed KV-cache management system. Algorithmically, CHESS introduces a context-aware, hierarchical selection policy that dynamically reconstructs a coherent context for the current decoding step. At the system level, coarse-granularity selection eliminates expensive data movement, fully realizing practical acceleration from theoretical sparsity. Extensive evaluations demonstrate that CHESS surpasses Full-KV quality using only 1% of the KV cache, delivers stable low-latency inference with up to 4.56× higher throughput, and consistently outperforms other strong baselines. Code is available at https://anonymous.4open.science/r/CHESS-9958/.
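The coarse-granularity selection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the block size, the mean-key block summary, and the top-k block count below are illustrative assumptions. The point is that scoring whole contiguous KV blocks against the current query, rather than individual tokens, avoids per-token gather operations and keeps memory accesses regular.

```python
import numpy as np

def select_kv_blocks(query, keys, block_size=64, top_k_blocks=4):
    """Pick the KV-cache blocks most relevant to the current query.

    Coarse-granularity selection (illustrative sketch): score each block
    via a single summary vector (here, the mean of its keys) instead of
    scoring every cached token, so the selected KV entries stay contiguous
    in memory and no irregular per-token data movement is needed.
    """
    n_tokens, _ = keys.shape
    n_blocks = (n_tokens + block_size - 1) // block_size
    # Summarize each block by its mean key vector.
    summaries = np.stack([
        keys[b * block_size:(b + 1) * block_size].mean(axis=0)
        for b in range(n_blocks)
    ])
    # Score blocks by dot-product relevance to the current decoding query.
    scores = summaries @ query
    # Keep the top-scoring blocks, in cache order.
    chosen = np.sort(np.argsort(scores)[-top_k_blocks:])
    return chosen  # indices of contiguous KV blocks to attend over

# Toy usage: 1024 cached tokens, head dim 128 -> attend over 4 of 16 blocks.
rng = np.random.default_rng(0)
q = rng.normal(size=128)
k = rng.normal(size=(1024, 128))
blocks = select_kv_blocks(q, k)
print(blocks)
```

Attention is then computed only over the selected blocks, which is where the memory savings and throughput gains come from in approaches of this kind; CHESS additionally makes the selection hierarchical and context-aware, which this sketch does not attempt to capture.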

