KevinDavidHayes committed (verified) · Commit ba5ff67 · 1 Parent(s): fab9be9

Upload README.md with huggingface_hub

Files changed (1): README.md added (+51 −0)
---
license: mit
language:
- en
tags:
- attention-analysis
- long-context
- modernbert
base_model: answerdotai/ModernBERT-base
---

# Long-Context Attention Regressor (Composite)

Predicts a **composite score** combining multiple attention metrics to identify text that benefits from long context.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KevinDavidHayes/regressor-composite")
tokenizer = AutoTokenizer.from_pretrained("KevinDavidHayes/regressor-composite")
model.eval()

text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    score = model(**inputs).logits.item()

# Higher score = text benefits more from long-range attention
```

## Training

- **Base model**: ModernBERT-base (8K context)
- **Target**: weighted combination `0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy`
- **Labels**: generated with Qwen2.5-7B-Instruct attention analysis at layer 14

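As a rough sketch of what the training target looks like: the metric definitions below are plausible interpretations of the names in this card (mean attended distance, attention mass outside a local window, per-query entropy), not the exact labeling code, and `attention_metrics` is a hypothetical helper.

```python
import torch

def attention_metrics(attn, local_window=64):
    """Compute the three metrics from a [seq_len, seq_len] attention matrix.

    Illustrative definitions only; the actual labeling pipeline
    (Qwen2.5-7B-Instruct, layer 14) is not shown in this card.
    """
    seq_len = attn.size(0)
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).abs().float()

    # Mean attended distance, weighted by attention mass.
    mean_distance = (attn * dist).sum() / attn.sum()

    # Fraction of attention mass within a local window, inverted so
    # that higher = more long-range.
    local_mass = (attn * (dist <= local_window)).sum() / attn.sum()
    inv_local_ratio = 1.0 - local_mass

    # Mean per-query entropy of the attention distribution.
    entropy = -(attn * (attn + 1e-9).log()).sum(dim=-1).mean()

    return mean_distance.item(), inv_local_ratio.item(), entropy.item()

def composite_target(mean_distance, inv_local_ratio, entropy):
    # Weights from this card: 0.2 / 0.4 / 0.4.
    return 0.2 * mean_distance + 0.4 * inv_local_ratio + 0.4 * entropy
```

For a uniform attention matrix the entropy term approaches `log(seq_len)` and the local ratio depends only on the window size, which is why the metrics need normalization choices like these before they can be combined.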
## Why Composite?

Cross-context correlation analysis (each metric computed at a 4K vs. a 32K context) showed:

- mean_distance: r = 0.71 (4K→32K)
- local_ratio: r = 0.92
- entropy: r = 0.92

The composite weights each metric by its cross-context stability, so the less stable mean_distance gets a lower weight (0.2) than the other two metrics (0.4 each).

## Citation

Part of research on attention-based data filtering for long-context pretraining.