+---
+license: mit
+language:
+- en
+tags:
+- attention-analysis
+- long-context
+- modernbert
+base_model: answerdotai/ModernBERT-base
+---
+# Long-Context Attention Regressor (Composite)
+Predicts a **composite score** that combines three attention metrics (`mean_distance`, `inv_local_ratio`, and `entropy`) to identify text that benefits from long context.
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-composite")
+tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-composite")
+text = "Your text here..."
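+# Truncate to 8192 tokens, matching the 8K context of the ModernBERT-base backbone
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
+with torch.no_grad():
+    # .item() requires a single-element tensor, i.e. one regression logit for one input text
+    score = model(**inputs).logits.item()
+# Higher score = text benefits more from long-range attention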
+```
+## Training
+- **Base model**: ModernBERT-base (8K context)
+- **Target**: Weighted combination: `0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy`
+- **Labels**: Generated using Qwen2.5-7B-Instruct attention analysis at layer 14
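As a minimal sketch, the weighted target above can be written as a small function. The weights come from the Target line; the function name `composite_score` and the `WEIGHTS` dictionary are hypothetical, and the sketch assumes the three metrics have already been computed and scaled to comparable ranges, which this card does not describe.

```python
# Weights as listed in the Target line of this card.
WEIGHTS = {
    "mean_distance": 0.2,
    "inv_local_ratio": 0.4,
    "entropy": 0.4,
}


def composite_score(mean_distance: float, inv_local_ratio: float, entropy: float) -> float:
    """Return the weighted combination of the three attention metrics.

    Assumption: the inputs are already normalized to comparable scales.
    How inv_local_ratio is derived from local_ratio is not specified in
    this card, so it is taken here as a ready-made input.
    """
    return (
        WEIGHTS["mean_distance"] * mean_distance
        + WEIGHTS["inv_local_ratio"] * inv_local_ratio
        + WEIGHTS["entropy"] * entropy
    )


if __name__ == "__main__":
    # Illustrative values only, not taken from the training data.
    print(composite_score(mean_distance=0.5, inv_local_ratio=0.25, entropy=0.75))
```

Because the weights sum to 1, the composite stays in the same range as its inputs whenever all three metrics share a common scale.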
+## Why Composite?
+Cross-context correlation analysis showed:
+- mean_distance: r=0.71 (4K→32K)
+- local_ratio: r=0.92
+- entropy: r=0.92
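+The composite gives more weight to the metrics that are more stable across context lengths: `inv_local_ratio` and `entropy` (r=0.92) each receive 0.4, while `mean_distance` (r=0.71) receives 0.2.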
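The card does not say how the cross-context correlations were computed. One plausible reading is a Pearson correlation between a metric measured on the same documents at two context lengths. The sketch below illustrates that reading; the function name `pearson_r` and the sample values are hypothetical and are not the authors' data or code.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-document values of one metric at two context lengths.
    metric_at_4k = [0.31, 0.45, 0.52, 0.60, 0.74]
    metric_at_32k = [0.35, 0.41, 0.58, 0.63, 0.70]
    print(f"cross-context r = {pearson_r(metric_at_4k, metric_at_32k):.2f}")
```

A metric with r close to 1 ranks documents similarly at both context lengths, which is the sense of "stability" assumed in this sketch.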
+## Citation
+Part of research on attention-based data filtering for long-context pretraining.