# Overflow Probe on PISCO representations

A binary MLP probe that detects **token overflow** in soft-compressed document representations produced by [PISCO](https://arxiv.org/abs/2501.16075). Token overflow occurs when a document's information content exceeds the capacity of the compressed token budget, which degrades downstream QA performance.

## How It Works

The probe takes a single 4096-dimensional input vector:

| Component | Description |
|-----------|-------------|
| `mid_q` | Last hidden representation from the middle layer (layer 16) of a PISCO decoder model, given the standard prompt, the compressed context, and a question. |

Output: the probability that the compressed representation has **overflowed** (i.e., lost critical information).

## Installation

```bash
pip install torch huggingface_hub
```

## Usage

### 1. Get the class definition

The model requires the `PISCOClassifier` class to load. Grab it from this repo:

```python
import importlib.util

from huggingface_hub import hf_hub_download

# Download the class definition file from the model repo
path = hf_hub_download("wexumin/overflow_probe_pisco_squad", "pisco_clf.py")

# Import the downloaded file as a module and pull out the classifier class
spec = importlib.util.spec_from_file_location("pisco_clf", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
PISCOClassifier = mod.PISCOClassifier
```

### 2. Load the model

```python
model = PISCOClassifier.from_pretrained("wexumin/overflow_probe_pisco_squad")
```

### 3. Run inference

```python

# mid_q: layer-16 hidden representation (4096-dim per sample), extracted beforehand
x = mid_q

probs = model.predict_proba(x)  # (n,) overflow probability per sample
preds = model.predict(x)        # (n,) binary 0/1 (a custom threshold parameter can be provided)
```
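
If you prefer to apply the decision threshold yourself, the following is a minimal sketch. It assumes the probabilities are available as a plain sequence of floats; the values, the `apply_threshold` helper, and the `>=` comparison are illustrative assumptions and may differ from what `model.predict` does internally.

```python
# Hypothetical probabilities standing in for the output of model.predict_proba(x).
probs = [0.05, 0.48, 0.52, 0.97]


def apply_threshold(probs, threshold=0.5):
    """Label a sample as overflowed (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


default_preds = apply_threshold(probs)      # default cut-off of 0.5
strict_preds = apply_threshold(probs, 0.9)  # flag only high-confidence overflow
print(default_preds, strict_preds)
```

A higher threshold trades recall for precision: fewer compressed documents are flagged, but those that are flagged are more likely to have overflowed.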

## Training Data

- **SQuAD**: extractive QA over Wikipedia paragraphs

Each context in the dataset was reduced to the single sentence that answers the question and then padded with noise context up to 128 tokens (as counted by the PISCO encoder tokenizer).

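
The context construction described above can be sketched roughly as follows. This is not the authors' preprocessing code: the `build_context` helper is hypothetical, whitespace splitting stands in for the PISCO encoder tokenizer, and inserting noise at random positions is an assumption about how the padding is arranged.

```python
import random


def count_tokens(text):
    """Whitespace token count, a stand-in for the PISCO encoder tokenizer."""
    return len(text.split())


def build_context(answer_sentence, noise_sentences, budget=128, seed=0):
    """Pad the answer sentence with noise sentences without exceeding the budget."""
    rng = random.Random(seed)
    pieces = [answer_sentence]
    used = count_tokens(answer_sentence)
    for sent in rng.sample(noise_sentences, len(noise_sentences)):
        if used + count_tokens(sent) > budget:
            continue  # skip sentences that would push the context over the budget
        # Assumption: noise may land before or after the answer sentence.
        pieces.insert(rng.randint(0, len(pieces)), sent)
        used += count_tokens(sent)
    return " ".join(pieces)


answer = "The Eiffel Tower was completed in 1889."
noise = [f"Filler sentence number {i} adds unrelated words." for i in range(40)]
context = build_context(answer, noise)
print(count_tokens(context))
```
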
## Architecture

```
→ Linear(4096, 512)
→ LayerNorm
→ GELU
→ Dropout(0.3)
→ Linear(512, 128)
→ GELU
→ Dropout(0.2)
→ Linear(128, 1)
```
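
For orientation, the layer stack above could be written in PyTorch roughly as below. This is a sketch, not the contents of `pisco_clf.py`: the `LayerNorm` width of 512 is inferred from the preceding linear layer, and applying a sigmoid to the final logit to obtain a probability is an assumption.

```python
import torch
from torch import nn

probe = nn.Sequential(
    nn.Linear(4096, 512),
    nn.LayerNorm(512),  # assumed to normalise the 512-dim output of the first layer
    nn.GELU(),
    nn.Dropout(0.3),
    nn.Linear(512, 128),
    nn.GELU(),
    nn.Dropout(0.2),
    nn.Linear(128, 1),
)
probe.eval()  # disable dropout for inference

x = torch.randn(4, 4096)  # random stand-in for four mid_q vectors
with torch.no_grad():
    logits = probe(x).squeeze(-1)  # one logit per sample, shape (4,)
    probs = torch.sigmoid(logits)  # assumed mapping from logit to probability
print(probs.shape)
```

The weights here are randomly initialised, so the outputs are meaningless; use `PISCOClassifier.from_pretrained` to obtain the trained probe.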

## Citation

```bibtex
@inproceedings{belikova-etal-2026-detecting,
    title = "Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation",
    author = "Belikova, Julia  and Rozhevskii, Danila  and Svirin, Dennis  and Polev, Konstantin  and Panchenko, Alexander",
    editor = "Baez Santamaria, Selene  and Somayajula, Sai Ashish  and Yamaguchi, Atsuki",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-srw.59/",
    pages = "797--810",
    ISBN = "979-8-89176-383-8"
}
```