xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token (arXiv:2405.13792)
A lightweight classifier that predicts whether xRAG's compressed representation will yield a correct answer, enabling selective routing between compressed and full-context inference.
Model: Hidden-state probe trained on xRAG encoder-decoder representations.
Task: Binary classification. 0 = no overflow (the compressed answer is correct); 1 = information overflow (the compressed answer is likely wrong).
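The label convention above maps directly onto a routing rule. The following is a minimal sketch, assuming the probe yields P(overflow) for each query. The helper `route` and the backend labels `"xrag"` and `"full_context"` are hypothetical names for illustration, not part of the repo, and treating a probability exactly at the threshold as overflow is also an assumption.

```python
from typing import List


def route(p_overflow: List[float], threshold: float = 0.5) -> List[str]:
    """Pick a backend per query from the probe's overflow probability.

    Hypothetical helper: label 1 (overflow) is assumed when
    p >= threshold, which sends the query to full-context inference.
    """
    return [
        "full_context" if p >= threshold else "xrag"
        for p in p_overflow
    ]


decisions = route([0.12, 0.81, 0.47])
print(decisions)  # ['xrag', 'full_context', 'xrag']
```

Lowering `threshold` sends more queries to full-context inference, trading extra compute for fewer wrong compressed answers.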
| Dataset | Revision | Test AUC |
|---|---|---|
| Combined | main | 0.7905 |
| SQuAD | squad_v2 | 0.7104 |
| HotpotQA | hotpotqa | 0.7129 |
| TriviaQA | triviaqa | 0.7265 |
Combined = SQuAD + HotpotQA + TriviaQA
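Assuming the AUC column reports ROC AUC, the metric can be read as the probability that a randomly chosen overflow example receives a higher score than a randomly chosen non-overflow one, with ties counting half. Below is a small pure-Python sketch of that definition on made-up scores. `roc_auc` is an illustrative helper, not the evaluation code behind the table.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; O(n_pos * n_neg), fine for small sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [0, 0, 1, 1]
p_overflow = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, p_overflow))  # 0.75
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation of overflow from non-overflow examples.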
The model class is stored in the repo, so no local installation is needed.
```python
import importlib.util

from huggingface_hub import hf_hub_download

# 1. Load the model class directly from the repo
path = hf_hub_download("s-nlp/xrag-compression-probe", "probe_clf.py")
spec = importlib.util.spec_from_file_location("probe_clf", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
LinearProbeTorch = mod.LinearProbeTorch

# 2. Pick dataset revision
clf = LinearProbeTorch.from_pretrained(
    "s-nlp/xrag-compression-probe",
    revision="hotpotqa",
)

# 3. Run on concatenated hidden states
# X: concatenation of features from the 16th and last layers (xrag_features + query_features),
# e.g. [mid, last, mid_q, last_q] → shape (N, D)
probs = clf.predict_proba(X)[:, 1]  # P(overflow)
preds = clf.predict(X)              # binary, threshold=0.5
# pred=0 → answer likely correct → use xRAG output
# pred=1 → answer likely wrong → fall back to full context
```
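The feature layout described in the comments can be sketched as follows. This is an illustration under stated assumptions: the array names, the batch size, the hidden size of 8, and the random placeholder values are all hypothetical. Only the `[mid, last, mid_q, last_q]` ordering comes from the snippet above, and the extraction of hidden states from xRAG is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, hidden = 4, 8  # hypothetical batch size and hidden size

# Placeholder per-example vectors standing in for real hidden states.
mid = rng.normal(size=(n, hidden))     # xRAG features, 16th layer
last = rng.normal(size=(n, hidden))    # xRAG features, last layer
mid_q = rng.normal(size=(n, hidden))   # query features, 16th layer
last_q = rng.normal(size=(n, hidden))  # query features, last layer

# Concatenate along the feature axis to get X with shape (N, D).
X = np.concatenate([mid, last, mid_q, last_q], axis=1)
print(X.shape)  # (4, 32)
```

Here D is four times the hidden size. The dimension and dtype the probe actually expects should be checked against `probe_clf.py`.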
```bibtex
@inproceedings{belikova-etal-2026-detecting,
    title = "Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation",
    author = "Belikova, Julia and Rozhevskii, Danila and Svirin, Dennis and Polev, Konstantin and Panchenko, Alexander",
    editor = "Baez Santamaria, Selene and Somayajula, Sai Ashish and Yamaguchi, Atsuki",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-srw.59/",
    pages = "797--810",
    ISBN = "979-8-89176-383-8"
}
```