---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Composite)

Predicts a **composite score** combining multiple attention metrics to identify text that benefits from long context.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-composite")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-composite")

text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = text benefits more from long-range attention
```
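
To score many documents at once, the same call works on a padded batch. The snippet below is a minimal sketch that reuses `model` and `tokenizer` from above; the `texts` list is a placeholder:

```python
texts = ["First document...", "Second document..."]  # placeholder inputs
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=8192)

with torch.no_grad():
    # One regression logit per row; higher = more benefit from long-range attention.
    scores = model(**batch).logits.squeeze(-1).tolist()
```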

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: weighted combination of attention metrics, `0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy` (see the sketch below)
- **Labels**: generated using Qwen2.5-7B-Instruct attention analysis at layer 14
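
As a rough illustration of the target above, the sketch below computes `mean_distance`, `inv_local_ratio`, and `entropy` from one layer's attention weights (e.g. layer 14 of the labeling model) and combines them with the 0.2/0.4/0.4 weights. The exact metric definitions, the 128-token local window, and the length-based normalizations are assumptions for illustration, not the actual labeling pipeline:

```python
import torch

def composite_target(attn: torch.Tensor, local_window: int = 128) -> float:
    """Hypothetical composite label from attention weights of shape
    [num_heads, seq_len, seq_len], where each row sums to 1."""
    seq_len = attn.shape[-1]
    positions = torch.arange(seq_len)
    # |query position - key position| for every query/key pair.
    dist = (positions[:, None] - positions[None, :]).abs().float()

    # Mean attention distance, normalized by sequence length.
    mean_distance = (attn * dist).sum(dim=-1).mean() / seq_len

    # Share of attention mass that stays within a local window;
    # the composite uses its complement (inv_local_ratio).
    local_ratio = (attn * (dist <= local_window).float()).sum(dim=-1).mean()
    inv_local_ratio = 1.0 - local_ratio

    # Attention entropy, normalized by its maximum value log(seq_len).
    entropy = -(attn * attn.clamp_min(1e-9).log()).sum(dim=-1).mean()
    entropy = entropy / torch.log(torch.tensor(float(seq_len)))

    return (0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy).item()
```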

## Why Composite?

Cross-context correlation analysis (4K→32K) showed:

- mean_distance: r=0.71
- local_ratio: r=0.92
- entropy: r=0.92

The composite weights each metric by its cross-context stability.

## Citation

Part of research on attention-based data filtering for long-context pretraining.