# Overflow Probe on PISCO representations

A binary MLP probe that detects **token overflow** in soft-compressed document representations produced by [PISCO](https://arxiv.org/abs/2501.16075). Token overflow occurs when a document's information content exceeds the capacity of the compressed token budget, leading to degraded downstream QA performance.

## How It Works

The probe takes a single 4096-dimensional input vector:

| Component | Description |
|-----------|-------------|
| `mid_q` | Last hidden representation from the middle layer (layer 16) of a PISCO decoder model, given the standard prompt, the compressed context, and a question. |

Output: the probability that the compressed representation has **overflowed** (i.e., lost critical information).

## Installation

```bash
pip install torch huggingface_hub
```

## Usage

### 1. Get the class definition

Loading the model requires the `PISCOClassifier` class. Grab it from this repo:

```python
from huggingface_hub import hf_hub_download
import importlib.util

# Download the class definition file from the model repo
path = hf_hub_download("wexumin/overflow_probe_pisco_squad", "pisco_clf.py")

# Import the downloaded file as a module and pull out the class
spec = importlib.util.spec_from_file_location("pisco_clf", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
PISCOClassifier = mod.PISCOClassifier
```

|
| ### 2. Load the model |
|
|
| ```python |
| model = PISCOClassifier.from_pretrained("wexumin/overflow_probe_pisco_squad") |
| ``` |
|
|
| ### 3. Run inference |
|
|
| ```python |
| |
| # postproj: compressed doc embedding (4096-dim) |
| x = mid_q |
| |
| probs = model.predict_proba(x) # (n, ) β is overflow probability |
| preds = model.predict(x) # (n,) β binary 0/1 (one can provide custom threshold parameter) |
| ``` |
|
|
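To illustrate how a custom threshold changes the binary decision, here is a minimal pure-Python sketch. The helper `apply_threshold` and the example probabilities are hypothetical and not part of `pisco_clf.py`. The sketch assumes a prediction is 1 when the probability is at or above the threshold, which may differ from how `predict` treats the boundary.

```python
def apply_threshold(probs, threshold=0.5):
    """Turn overflow probabilities into binary 0/1 labels (hypothetical helper)."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up values standing in for the output of predict_proba
example_probs = [0.12, 0.55, 0.91]

default_preds = apply_threshold(example_probs)      # default threshold of 0.5
strict_preds = apply_threshold(example_probs, 0.9)  # flag only confident overflows
```

A higher threshold typically flags fewer contexts as overflowed, trading recall for precision.
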
## Training Data

- **SQuAD**: extractive QA over Wikipedia paragraphs

Each context in the dataset was reduced to just the sentence that answers the question, then padded with noise context up to 128 tokens (as counted by the PISCO encoder tokenizer).

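The construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: `build_context` is a hypothetical helper, whitespace splitting stands in for the PISCO encoder tokenizer, and noise sentences are simply appended after the answer sentence (the actual placement of the noise is not specified here).

```python
def build_context(answer_sentence, noise_sentences, max_tokens=128, count_tokens=None):
    """Keep the answer sentence and append noise sentences within a token budget.

    Hypothetical sketch: by default, whitespace splitting stands in for the
    PISCO encoder tokenizer.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    parts = [answer_sentence]
    used = count_tokens(answer_sentence)
    for sentence in noise_sentences:
        cost = count_tokens(sentence)
        if used + cost > max_tokens:
            break  # adding this sentence would exceed the budget
        parts.append(sentence)
        used += cost
    return " ".join(parts), used


# Toy example with a small budget so the cut-off is visible
context, n_tokens = build_context(
    "Paris is the capital of France.",
    ["The river is long.", "Cats sleep a lot.", "Rain fell all day."],
    max_tokens=14,
)
```

With a real subword tokenizer, the token count of the joined text can differ slightly from the sum of per-sentence counts, so a final length check on the full context would be advisable.
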
## Architecture

```
→ Linear(4096, 512)
→ LayerNorm
→ GELU
→ Dropout(0.3)
→ Linear(512, 128)
→ GELU
→ Dropout(0.2)
→ Linear(128, 1)
```

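As a rough size check, the number of learnable parameters implied by the layer list can be worked out by hand. The sketch below assumes that every `Linear` layer has a bias term and that `LayerNorm` carries a learnable scale and shift over 512 features (the PyTorch defaults); GELU and Dropout have no parameters. The helper names are illustrative only.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer (bias assumed)."""
    return n_in * n_out + n_out


def layernorm_params(n_features):
    """Learnable scale and shift vectors (affine LayerNorm assumed)."""
    return 2 * n_features


layers = {
    "Linear(4096, 512)": linear_params(4096, 512),
    "LayerNorm(512)": layernorm_params(512),
    "Linear(512, 128)": linear_params(512, 128),
    "Linear(128, 1)": linear_params(128, 1),
}
total_params = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {total_params:,}")
```

Under these assumptions the probe has about 2.16M parameters, almost all of them in the first `Linear` layer.
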
## Citation

```bibtex
@inproceedings{belikova-etal-2026-detecting,
    title = "Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation",
    author = "Belikova, Julia and Rozhevskii, Danila and Svirin, Dennis and Polev, Konstantin and Panchenko, Alexander",
    editor = "Baez Santamaria, Selene and Somayajula, Sai Ashish and Yamaguchi, Atsuki",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-srw.59/",
    pages = "797--810",
    ISBN = "979-8-89176-383-8"
}
```