---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Entropy)

Predicts the **attention entropy** of a text sample: how spread out versus focused its attention patterns are. Higher scores indicate attention spread across more of the context; lower scores indicate attention concentrated on a few tokens.
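
As a working definition (this card does not spell out the formula, so treat it as an assumption): for an attention distribution $a = (a_1, \dots, a_n)$ over $n$ keys,

$$
H(a) = -\sum_{i=1}^{n} a_i \log a_i,
\qquad
H_{\mathrm{norm}}(a) = \frac{H(a)}{\log n},
$$

so a perfectly uniform attention row scores 1 and a one-hot row scores 0.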
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-entropy")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-entropy")
model.eval()

text = "Your text here..."
# ModernBERT-base supports an 8,192-token context; truncate longer inputs.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    # Single-output regression head: logits has shape (1, 1).
    score = model(**inputs).logits.item()

# Higher score = more spread attention (uses more context)
```
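
For corpus filtering you will usually score many documents at once. Here is a minimal batched sketch built on the `model` and `tokenizer` loaded above; the `score_batch` helper is illustrative and not part of this repo:

```python
def score_batch(texts, batch_size=16):
    """Score a list of texts; returns one entropy prediction per text."""
    scores = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(
            texts[i:i + batch_size],
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=8192,
        )
        with torch.no_grad():
            logits = model(**batch).logits  # shape (batch_size, 1)
        scores.extend(logits.squeeze(-1).tolist())
    return scores
```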

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: Normalized attention entropy
- **Labels**: Generated using Qwen2.5-7B-Instruct attention analysis at layer 14 (see the sketch after this list)
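
The labeling pipeline itself is not included in this repo. The sketch below is one plausible reading of the recipe above, assuming `transformers` with eager attention so that attention probabilities are returned; the normalization by `log(seq_len)`, the averaging over heads and positions, and the `entropy_label` helper are assumptions, not the authors' exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Teacher model used only for label generation. "eager" attention is
# needed so the forward pass returns attention probabilities.
teacher = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)
teacher_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
teacher.eval()

def entropy_label(text: str, layer: int = 14) -> float:
    """Normalized attention entropy of `text` at one decoder layer."""
    inputs = teacher_tok(text, return_tensors="pt")
    with torch.no_grad():
        out = teacher(**inputs, output_attentions=True)
    # out.attentions[layer]: (batch, heads, seq, seq); each row is a
    # probability distribution over the keys a position attends to.
    attn = out.attentions[layer].float()
    seq_len = attn.shape[-1]
    row_entropy = -(attn * (attn + 1e-9).log()).sum(dim=-1)
    # Average over heads and query positions, then normalize by the
    # maximum possible entropy, log(seq_len).
    return (row_entropy.mean() / torch.log(torch.tensor(float(seq_len)))).item()
```

Note that under the causal mask early positions can only attend to a few keys, so the position averaging and the choice of normalizer are exactly where a real pipeline may differ from this sketch.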

## Citation

Part of research on attention-based data filtering for long-context pretraining.