---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Composite)

Predicts a **composite score** combining multiple attention metrics to identify text that benefits from long context.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-composite")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-composite")

text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = text benefits more from long-range attention
```
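
To score many documents at once, the same call works on a padded batch. The snippet below is a minimal sketch that reuses `model` and `tokenizer` from above; the `texts` list is a placeholder:

```python
texts = ["First document...", "Second document..."]  # placeholder inputs
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=8192)

with torch.no_grad():
    # One regression logit per row; higher = more benefit from long-range attention.
    scores = model(**batch).logits.squeeze(-1).tolist()
```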

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: weighted combination of attention metrics, `0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy` (see the sketch below)
- **Labels**: generated using Qwen2.5-7B-Instruct attention analysis at layer 14
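
As a rough illustration of the target above, the sketch below computes `mean_distance`, `inv_local_ratio`, and `entropy` from one layer's attention weights (e.g. layer 14 of the labeling model) and combines them with the 0.2/0.4/0.4 weights. The exact metric definitions, the 128-token local window, and the length-based normalizations are assumptions for illustration, not the actual labeling pipeline:

```python
import torch

def composite_target(attn: torch.Tensor, local_window: int = 128) -> float:
    """Hypothetical composite label from attention weights of shape
    [num_heads, seq_len, seq_len], where each row sums to 1."""
    seq_len = attn.shape[-1]
    positions = torch.arange(seq_len)
    # |query position - key position| for every query/key pair.
    dist = (positions[:, None] - positions[None, :]).abs().float()

    # Mean attention distance, normalized by sequence length.
    mean_distance = (attn * dist).sum(dim=-1).mean() / seq_len

    # Share of attention mass that stays within a local window;
    # the composite uses its complement (inv_local_ratio).
    local_ratio = (attn * (dist <= local_window).float()).sum(dim=-1).mean()
    inv_local_ratio = 1.0 - local_ratio

    # Attention entropy, normalized by its maximum value log(seq_len).
    entropy = -(attn * attn.clamp_min(1e-9).log()).sum(dim=-1).mean()
    entropy = entropy / torch.log(torch.tensor(float(seq_len)))

    return (0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy).item()
```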

## Why Composite?

Cross-context correlation analysis (4K→32K) showed:

- mean_distance: r=0.71
- local_ratio: r=0.92
- entropy: r=0.92

The composite weights each metric by its cross-context stability.

## Citation

Part of research on attention-based data filtering for long-context pretraining.