	# xRAG Compression Probe

	A lightweight classifier that predicts whether xRAG's compressed representation will yield a correct answer — enabling selective routing between compressed and full-context inference.

	Model: Hidden state probe trained on [xRAG](https://arxiv.org/abs/2405.13792) encoder-decoder representations.
	Task: Binary classification. `0` = no overflow (compressed answer is likely correct); `1` = information overflow (compressed answer is likely wrong).

	## Revisions

	| Dataset | Revision | Test AUC |
	|----------|------------|----------|
	| Combined | `main` | 0.7905 |
	| SQuAD | `squad_v2` | 0.7104 |
	| HotpotQA | `hotpotqa` | 0.7129 |
	| TriviaQA | `triviaqa` | 0.7265 |

	<sub>Combined = SQuAD + HotpotQA + TriviaQA</sub>

	## Usage

	The model class is stored in the repo, so there is no separate package to install; the snippet below fetches it with `huggingface_hub`.

	```python
	import importlib.util
	from huggingface_hub import hf_hub_download

	# 1. Load the model class directly from the repo
	path = hf_hub_download("s-nlp/xrag-compression-probe", "probe_clf.py")
	spec = importlib.util.spec_from_file_location("probe_clf", path)
	mod = importlib.util.module_from_spec(spec)
	spec.loader.exec_module(mod)
	LinearProbeTorch = mod.LinearProbeTorch

	# 2. Pick dataset revision
	clf = LinearProbeTorch.from_pretrained(
	    "s-nlp/xrag-compression-probe",
	    revision="hotpotqa",
	)

	# 3. Run on concatenated hidden states
	# X is not defined here: build it yourself by concatenating features from the
	# 16th and last layers (xrag_features + query_features),
	# e.g. [mid, last, mid_q, last_q] → shape (N, D)
	probs = clf.predict_proba(X)[:, 1]  # P(overflow)
	preds = clf.predict(X)  # binary, threshold=0.5
	```
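
The usage snippet leaves `X` for the caller to build. Below is a minimal sketch of that concatenation step, assuming each of the four feature blocks has already been reduced to one vector per example, i.e. shape `(N, d)`. The helper name `build_probe_features`, the float32 cast, and the random stand-in data are illustrative assumptions, not part of the repo; whether the probe expects a NumPy array or a torch tensor should be checked against `probe_clf.py`.

```python
import numpy as np

def build_probe_features(mid, last, mid_q, last_q):
    """Concatenate four (N, d) hidden-state blocks into one (N, 4*d) matrix.

    Hypothetical helper: the block order follows the README comment
    [mid, last, mid_q, last_q].
    """
    blocks = [np.asarray(b, dtype=np.float32) for b in (mid, last, mid_q, last_q)]
    n_rows = blocks[0].shape[0]
    for b in blocks:
        if b.ndim != 2 or b.shape[0] != n_rows:
            raise ValueError("every block must be 2-D with the same number of rows")
    # Stack along the feature axis so each row describes one example.
    return np.concatenate(blocks, axis=1)

# Random stand-ins for real hidden states (N=3 examples, d=8 per block).
rng = np.random.default_rng(0)
mid, last, mid_q, last_q = (rng.standard_normal((3, 8)) for _ in range(4))
X = build_probe_features(mid, last, mid_q, last_q)
print(X.shape)  # (3, 32)
```

With real features, `D` is the sum of the four block widths, so it must match the input size the chosen revision was trained with.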

	## Routing logic

	```
	# pred=0 → answer likely correct → use xRAG output
	# pred=1 → answer likely wrong → fall back to full context
	```
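
The rule above can be sketched as a small function over overflow probabilities. This is an illustration under assumptions: the function name `route`, the route labels, and the choice to send ties (`p == threshold`) to the full-context fallback are not taken from the repo, and the probabilities are made-up values rather than probe outputs.

```python
def route(p_overflow, threshold=0.5):
    """Map each overflow probability to 'xrag' or 'full_context'.

    Hypothetical helper: probabilities at or above the threshold are
    treated as overflow and routed to full-context inference.
    """
    return ["full_context" if p >= threshold else "xrag" for p in p_overflow]

probs = [0.12, 0.50, 0.91]   # illustrative values, not real probe outputs
print(route(probs))          # ['xrag', 'full_context', 'full_context']
print(route(probs, 0.8))     # ['xrag', 'xrag', 'full_context']
```

Raising the threshold keeps more queries on the compressed path at the cost of more wrong answers slipping through; lowering it does the opposite. The right value depends on how expensive full-context inference is in your setup.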

	## Citation

	```bibtex
	@inproceedings{belikova-etal-2026-detecting,
	title = "Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation",
	author = "Belikova, Julia and Rozhevskii, Danila and Svirin, Dennis and Polev, Konstantin and Panchenko, Alexander",
	editor = "Baez Santamaria, Selene and Somayajula, Sai Ashish and Yamaguchi, Atsuki",
	booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)",
	month = mar,
	year = "2026",
	address = "Rabat, Morocco",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2026.eacl-srw.59/",
	pages = "797--810",
	ISBN = "979-8-89176-383-8"
	}
	```