---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Entropy)

Predicts the **attention entropy** of a text sample: how spread out or focused its attention patterns are.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-entropy")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-entropy")

text = "Your text here..."

# Truncate to the 8K-token context of the base model
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

# Inference only, so skip gradient tracking
with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = more spread attention (uses more context)
```

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: Normalized attention entropy
- **Labels**: Generated using Qwen2.5-7B-Instruct attention analysis at layer 14

## Citation

Part of research on attention-based data filtering for long-context pretraining.
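
## Sketch: normalized attention entropy

To make the training target more concrete, here is a minimal sketch of how a normalized attention entropy could be computed for one attention distribution. This card does not spell out the exact labeling procedure, so the function name `normalized_attention_entropy`, the choice of Shannon entropy, and the division by `log(n)` are all assumptions for illustration, not the pipeline that produced the labels.

```python
import math


def normalized_attention_entropy(weights):
    """Shannon entropy of one attention distribution, scaled to [0, 1].

    `weights` holds non-negative attention weights over the keys for a
    single query. Dividing by log(n) is an assumed normalization; the
    actual labeling pipeline may differ.
    """
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0:
        return 0.0  # a single key (or no mass) has no spread to measure

    # Renormalize so the weights form a probability distribution
    probs = [w / total for w in weights]

    # Skip zero entries, since p * log(p) tends to 0 as p -> 0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # log(n) is the entropy of a uniform distribution over n keys
    return entropy / math.log(n)


# Attention spread evenly over four keys: close to 1.0
uniform = normalized_attention_entropy([0.25, 0.25, 0.25, 0.25])

# Attention concentrated on one key: much lower, about 0.12
focused = normalized_attention_entropy([0.97, 0.01, 0.01, 0.01])
```

Under this assumed definition, values near 1 correspond to attention spread across the context and values near 0 to attention focused on a few tokens, which matches the reading of the regressor score in the usage example.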