---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---
# Long-Context Attention Regressor (Entropy)

Predicts the **attention entropy** of a text sample: how spread out versus focused its attention patterns are.
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-entropy")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-entropy")

text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = more spread-out attention (the text engages more of its context)
```
## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: Normalized attention entropy
- **Labels**: Generated via attention analysis of Qwen2.5-7B-Instruct at layer 14
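The exact label-generation pipeline is not published here, but "normalized attention entropy" typically means the Shannon entropy of each query token's attention distribution over keys, divided by its maximum possible value (`log` of the sequence length) so that 0 means fully peaked attention and 1 means uniform attention. A minimal sketch of that computation, assuming attention weights have already been extracted from the target layer (the function name and averaging over heads and queries are illustrative choices, not the authors' confirmed method):

```python
import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean normalized Shannon entropy of attention distributions.

    attn: (heads, seq_len, seq_len) attention weights; each row
    (a query's distribution over keys) should sum to 1.
    Returns a scalar in [0, 1]: 0 = fully focused, 1 = uniform.
    """
    eps = 1e-12  # avoid log(0) for zero weights
    # Entropy of each query's distribution over keys: (heads, seq_len)
    ent = -(attn * (attn + eps).log()).sum(dim=-1)
    # Normalize by the maximum entropy, log(seq_len)
    max_ent = torch.log(torch.tensor(float(attn.shape[-1])))
    return (ent / max_ent).mean()
```

With weights from `model(..., output_attentions=True)`, the layer-14 tensor for one sample would be passed in as `attentions[14][0]`; averaging over layers, heads, or queries is a design choice that the normalization above leaves open.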
## Citation

Part of research on attention-based data filtering for long-context pretraining.