| # xRAG Compression Probe |
|
|
| A lightweight classifier that predicts whether xRAG's compressed representation will yield a correct answer, enabling selective routing between compressed and full-context inference. |
|
|
| **Model:** Hidden state probe trained on [xRAG](https://arxiv.org/abs/2405.13792) encoder-decoder representations. |
| **Task:** Binary classification: `0` = no overflow (compressed answer is likely correct), `1` = information overflow (compressed answer is likely wrong). |
|
|
| ## Revisions |
|
|
| | Dataset | Revision | Test AUC | |
| |-----------|-------------|----------| |
| | Combined | `main` | 0.7905 | |
| | SQuAD | `squad_v2` | 0.7104 | |
| | HotpotQA | `hotpotqa` | 0.7129 | |
| | TriviaQA | `triviaqa` | 0.7265 | |
|
|
| <sub>*Combined = SQuAD + HotpotQA + TriviaQA*</sub> |
|
|
| ## Usage |
|
|
| The model class is stored in the repo, so the probe code needs no local installation. |
|
|
| ```python |
| import importlib.util |
| from huggingface_hub import hf_hub_download |
| |
| # 1. Load the model class directly from the repo |
| path = hf_hub_download("s-nlp/xrag-compression-probe", "probe_clf.py") |
| spec = importlib.util.spec_from_file_location("probe_clf", path) |
| mod = importlib.util.module_from_spec(spec) |
| spec.loader.exec_module(mod) |
| LinearProbeTorch = mod.LinearProbeTorch |
| |
| # 2. Pick dataset revision |
| clf = LinearProbeTorch.from_pretrained( |
|     "s-nlp/xrag-compression-probe", |
|     revision="hotpotqa", |
| ) |
| |
| # 3. Run on concatenated hidden states |
| # X: concatenation of features from the 16th and last layers (xrag_features + query_features), |
| # e.g. [mid, last, mid_q, last_q] → shape (N, D); X must be built by the caller |
| probs = clf.predict_proba(X)[:, 1] # P(overflow) |
| preds = clf.predict(X) # binary, threshold=0.5 |
| ``` |
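The feature layout described in the usage comments can be sketched as follows. This is a minimal illustration, assuming each of the four feature blocks is already available as an `(N, H)` array. The names `mid`, `last`, `mid_q`, and `last_q` follow the comment in the usage snippet; `H = 8`, the random placeholder data, and the use of NumPy are assumptions made only for the demo. How the hidden states are extracted from xRAG, and whether the probe expects a NumPy array or a torch tensor, is not covered here.

```python
import numpy as np

N, H = 3, 8  # hypothetical batch size and per-block feature width (demo values)
rng = np.random.default_rng(0)

# Placeholder features; in practice these would come from the xRAG forward pass.
mid = rng.normal(size=(N, H))     # xRAG features, 16th layer
last = rng.normal(size=(N, H))    # xRAG features, last layer
mid_q = rng.normal(size=(N, H))   # query features, 16th layer
last_q = rng.normal(size=(N, H))  # query features, last layer

# Concatenate along the feature axis in the order [mid, last, mid_q, last_q].
X = np.concatenate([mid, last, mid_q, last_q], axis=1).astype(np.float32)

print(X.shape)  # (3, 32)
```

The resulting `D` is four times the per-block width, so a mismatch between `X.shape[1]` and the probe's expected input size is a likely sign that a block is missing or out of order.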
|
|
| ## Routing logic |
|
|
| ``` |
| # pred=0 → answer likely correct → use xRAG output |
| # pred=1 → answer likely wrong → fall back to full context |
| ``` |
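The routing rule above could be applied per query roughly as in the sketch below. The function name `route`, the route labels, and the adjustable `threshold` argument are hypothetical and not part of the repo. The default of 0.5 mirrors the threshold mentioned in the usage snippet, but whether the probe treats a probability of exactly 0.5 as overflow is an assumption here.

```python
from typing import List, Sequence


def route(p_overflow: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Map each P(overflow) to a route label (hypothetical helper).

    Below the threshold -> keep the compressed xRAG output.
    At or above it      -> fall back to full-context inference.
    """
    return ["full_context" if p >= threshold else "xrag" for p in p_overflow]


# Example with made-up probabilities, e.g. from clf.predict_proba(X)[:, 1]
routes = route([0.12, 0.81, 0.47])
print(routes)  # ['xrag', 'full_context', 'xrag']
```

Lowering `threshold` sends more queries to full-context inference (fewer wrong compressed answers, higher cost), while raising it does the opposite.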
|
|
| ## Citation |
|
|
| ```bibtex |
| @inproceedings{belikova-etal-2026-detecting, |
| title = "Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation", |
| author = "Belikova, Julia and Rozhevskii, Danila and Svirin, Dennis and Polev, Konstantin and Panchenko, Alexander", |
| editor = "Baez Santamaria, Selene and Somayajula, Sai Ashish and Yamaguchi, Atsuki", |
| booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)", |
| month = mar, |
| year = "2026", |
| address = "Rabat, Morocco", |
| publisher = "Association for Computational Linguistics", |
| url = "https://aclanthology.org/2026.eacl-srw.59/", |
| pages = "797--810", |
| ISBN = "979-8-89176-383-8" |
| } |
| ``` |
|
|