---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Composite)

Predicts a **composite score** combining multiple attention metrics to identify text that benefits from long context.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-composite")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-composite")

text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = text benefits more from long-range attention
```

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: Weighted combination: `0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy`
- **Labels**: Generated using Qwen2.5-7B-Instruct attention analysis at layer 14

## Why Composite?

Cross-context correlation analysis showed:

- mean_distance: r=0.71 (4K→32K)
- local_ratio: r=0.92
- entropy: r=0.92

The composite weights metrics by their cross-context stability.

## Citation

Part of research on attention-based data filtering for long-context pretraining.
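
## Composite Target (Sketch)

The weighted combination used as the regression target can be sketched as plain Python. This is a minimal illustration of the formula stated under Training; the function name `composite_score` and the assumption that `inv_local_ratio` is computed as `1 - local_ratio` are not confirmed by this card, and the actual metric extraction happens inside the Qwen2.5-7B-Instruct attention analysis.

```python
def composite_score(mean_distance: float, local_ratio: float, entropy: float) -> float:
    """Composite target: 0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy.

    Assumes all three metrics are already normalized to a comparable scale.
    """
    # Assumption: "inv_local_ratio" inverts local_ratio so that LESS local
    # attention (more long-range attention) yields a HIGHER score.
    inv_local_ratio = 1.0 - local_ratio
    return 0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy
```

Under this sketch, text whose attention mass is spread far (high `mean_distance`), non-local (low `local_ratio`), and diffuse (high `entropy`) receives the highest composite score.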