---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Local Ratio)

Predicts how much a text sample uses **long-range attention** versus local attention when processed by a transformer LLM.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-local-ratio")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-local-ratio")

text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = more long-range attention patterns
```
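
Since the model emits a single regression score per document, a natural follow-on is ranking a corpus by that score and keeping the most long-range-heavy portion. The helper below is a hypothetical sketch of such a filtering step: the function name and the `keep_fraction` policy are assumptions for illustration, not part of this repository.

```python
def filter_long_context(docs, scores, keep_fraction=0.25):
    """Keep the documents whose predicted long-range-attention score is in
    the top `keep_fraction` of the corpus (hypothetical filtering policy).

    docs:   list of text samples
    scores: regressor outputs, one per document (higher = more long-range)
    """
    # Rank documents by score, highest first.
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    # Always keep at least one document.
    keep = max(1, int(len(ranked) * keep_fraction))
    return [doc for doc, _ in ranked[:keep]]
```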

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: Inverted `local_ratio` (the fraction of attention mass falling outside a 128-token window)
- **Labels**: Generated using Qwen2.5-7B-Instruct attention analysis at layer 14

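
The target can be illustrated concretely. The sketch below reconstructs the metric from its definition above (attention mass outside a 128-token band, i.e. one minus `local_ratio`); it is an illustrative reimplementation, not the actual labeling code, which used Qwen2.5-7B-Instruct attention at layer 14.

```python
import torch

def inverted_local_ratio(attn: torch.Tensor, window: int = 128) -> float:
    """Fraction of attention mass falling OUTSIDE a +/-`window` token band.

    attn: (seq_len, seq_len) attention weights for one head/layer.
    Returns 1 - local_ratio, matching the training target described above.
    """
    seq_len = attn.size(0)
    idx = torch.arange(seq_len)
    # True where |query_pos - key_pos| <= window (the "local" band).
    local_mask = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
    local_ratio = (attn[local_mask].sum() / attn.sum()).item()
    return 1.0 - local_ratio
```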
## Metrics

| Metric | Value |
|--------|-------|
| Test R² | See `regressor_metadata.json` |
| Test MAE | See `regressor_metadata.json` |

## Citation

Part of research on attention-based data filtering for long-context pretraining.