xRAG Compression Probe

A lightweight classifier that predicts whether xRAG's compressed representation will yield a correct answer — enabling selective routing between compressed and full-context inference.

Model: Hidden state probe trained on xRAG encoder-decoder representations.
Task: Binary classification. 0 = no overflow (the compressed answer is correct); 1 = information overflow (the compressed answer is likely wrong).

Revisions

Dataset	Revision	Test AUC
Combined	`main`	0.7905
SQuAD	`squad_v2`	0.7104
HotpotQA	`hotpotqa`	0.7129
TriviaQA	`triviaqa`	0.7265

Combined = SQuAD + HotpotQA + TriviaQA

Usage

The model class is stored in the repo — no local installation needed.

import importlib.util
from huggingface_hub import hf_hub_download

# 1. Load the model class directly from the repo
path = hf_hub_download("s-nlp/xrag-compression-probe", "probe_clf.py")
spec = importlib.util.spec_from_file_location("probe_clf", path)
mod  = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
LinearProbeTorch = mod.LinearProbeTorch

# 2. Pick dataset revision
clf = LinearProbeTorch.from_pretrained(
    "s-nlp/xrag-compression-probe",
    revision="hotpotqa",
)

# 3. Run on concatenated hidden states
# X: concatenation of features from 16th and last layer (xrag_features + query_features)
# e.g. [mid, last, mid_q, last_q] → shape (N, D)
probs = clf.predict_proba(X)[:, 1]  # P(overflow)
preds = clf.predict(X)              # binary, threshold=0.5
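
The snippet above uses `X` without defining it. A minimal sketch of how such a matrix could be assembled, assuming the four feature blocks are already extracted as per-example arrays of equal width: the helper name `build_probe_input`, the random placeholder features, and the sizes `N` and `H` are all hypothetical and chosen only for illustration. The card does not say whether `predict_proba` expects a NumPy array or a torch tensor, so convert as needed.

```python
import numpy as np

def build_probe_input(mid, last, mid_q, last_q):
    """Concatenate four per-example feature blocks into one (N, D) matrix.

    Each argument is an (N, H) array; the result has D = 4 * H columns,
    ordered [mid, last, mid_q, last_q] as in the usage comment.
    """
    blocks = [np.asarray(b, dtype=np.float32) for b in (mid, last, mid_q, last_q)]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all feature blocks must have the same number of rows")
    return np.concatenate(blocks, axis=1)

# Placeholder features (assumption): N = 3 examples, H = 8 hidden units.
# Real inputs would be hidden states taken from the xRAG model.
rng = np.random.default_rng(0)
N, H = 3, 8
mid, last, mid_q, last_q = (rng.standard_normal((N, H)) for _ in range(4))

X = build_probe_input(mid, last, mid_q, last_q)
print(X.shape)  # (3, 32)
```

The column order matters: a probe trained on `[mid, last, mid_q, last_q]` would see scrambled features if the blocks were concatenated in a different order.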

Routing logic

# pred=0 → answer likely correct → use xRAG output
# pred=1 → answer likely wrong   → fall back to full context
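
The routing rule above can be sketched as a small helper that splits a batch by predicted overflow probability. This is an illustrative sketch, not part of the repository: `route_by_overflow` is a hypothetical name, the probabilities are made-up values standing in for `clf.predict_proba(X)[:, 1]`, and treating a probability exactly equal to the threshold as overflow is an assumption.

```python
import numpy as np

def route_by_overflow(probs, threshold=0.5):
    """Split example indices into xRAG vs. full-context groups.

    probs: 1-D array of P(overflow) per example.
    Returns (use_xrag, use_full) as integer index arrays.
    """
    probs = np.asarray(probs, dtype=float)
    overflow = probs >= threshold          # assumption: ties count as overflow
    use_xrag = np.flatnonzero(~overflow)   # pred=0: keep the compressed answer
    use_full = np.flatnonzero(overflow)    # pred=1: fall back to full context
    return use_xrag, use_full

# Made-up probabilities standing in for clf.predict_proba(X)[:, 1]
probs = np.array([0.12, 0.81, 0.47, 0.66])
use_xrag, use_full = route_by_overflow(probs)
print(use_xrag.tolist())  # [0, 2]
print(use_full.tolist())  # [1, 3]
```

Exposing the threshold as a parameter lets the routing trade compute for accuracy: a lower value sends more queries to full-context inference, a higher value keeps more of them on the compressed path.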

Citation

@inproceedings{belikova-etal-2026-detecting,
    title = "Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation",
    author = "Belikova, Julia  and Rozhevskii, Danila  and Svirin, Dennis  and Polev, Konstantin  and Panchenko, Alexander",
    editor = "Baez Santamaria, Selene  and Somayajula, Sai Ashish  and Yamaguchi, Atsuki",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-srw.59/",
    pages = "797--810",
    ISBN = "979-8-89176-383-8"
}

Model size: 16.4k params (Safetensors, tensor type F32)

Related paper: xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token (arXiv:2405.13792, published May 22, 2024)