sxiong committed (verified) · Commit cfa2337 · 1 parent: 490a41a

Update README.md

Files changed (1): README.md (+48 -3)
---
license: mit
language:
- en
---

# Dynamic Hierarchical Sparse Attention (DHSA)

This repository hosts the boundary-predictor weights used in *[Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs](https://arxiv.org/pdf/2510.24606)* (NeurIPS 2025 Workshop on Efficient Reasoning).

## Overview

The boundary predictor is a lightweight transformer module trained to dynamically segment long text sequences into variable-length chunks at semantic boundaries.
It forms the first stage of Dynamic Hierarchical Sparse Attention (DHSA), which lets large language models process long contexts efficiently by predicting where to attend sparsely.

## Model Architecture

The boundary predictor consists of three main parts:

1. Shared Encoder – uses attention and pooling layers to capture the left and right context around each token.
2. Feature Fusion – combines the two contextual features with their difference, product, and similarity to represent local semantic changes.
3. MLP Classifier – takes the fused features and predicts whether a given position marks a boundary.

This lightweight design efficiently identifies semantic shifts in long text sequences.
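
Stages 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not the released architecture: the layer sizes, weight shapes, and function names (`fuse_features`, `mlp_boundary_probs`) are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(left, right):
    """Feature fusion: combine left/right context vectors with their
    difference, elementwise product, and cosine similarity (one scalar
    per position)."""
    cos = np.sum(left * right, axis=-1, keepdims=True) / (
        np.linalg.norm(left, axis=-1, keepdims=True)
        * np.linalg.norm(right, axis=-1, keepdims=True) + 1e-8
    )
    return np.concatenate([left, right, left - right, left * right, cos], axis=-1)

def mlp_boundary_probs(fused, w1, b1, w2, b2):
    """MLP classifier head: one hidden ReLU layer, then a sigmoid
    boundary score per position."""
    h = np.maximum(fused @ w1 + b1, 0.0)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

d, hidden, seq = 16, 32, 10
left = rng.standard_normal((seq, d))     # stand-ins for encoder outputs
right = rng.standard_normal((seq, d))

fused = fuse_features(left, right)       # shape (seq, 4*d + 1)
w1 = rng.standard_normal((4 * d + 1, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, 1)) * 0.1
b2 = np.zeros(1)

probs = mlp_boundary_probs(fused, w1, b1, w2, b2).squeeze(-1)
boundaries = probs > 0.5                 # predicted boundary positions
```

In the real predictor the `left`/`right` inputs would come from the shared encoder's attention-and-pooling layers rather than random vectors.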

## Training Data

The predictor was trained on diverse long-context datasets combining multiple reasoning and QA sources:

* [Long Data Collections](https://huggingface.co/datasets/togethercomputer/Long-Data-Collections)
* [Trivia QA](https://huggingface.co/datasets/mandarjoshi/trivia_qa)
* [ChatQA2-Long-SFT](https://huggingface.co/datasets/nvidia/ChatQA2-Long-SFT-data)

The data were automatically annotated using an internal semantic-boundary labeling process based on similarity shifts between consecutive key embeddings.
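
The internal labeling process is not released, but a similarity-shift labeler along these lines gives the general idea (the helper name and the drop threshold are assumptions, not values from the paper):

```python
import numpy as np

def label_boundaries(keys, drop_threshold=0.5):
    """Mark position i as a boundary when the cosine similarity between
    consecutive key embeddings keys[i-1] and keys[i] drops below threshold."""
    normed = keys / (np.linalg.norm(keys, axis=-1, keepdims=True) + 1e-8)
    sims = np.sum(normed[:-1] * normed[1:], axis=-1)  # adjacent-pair similarity
    labels = np.zeros(len(keys), dtype=bool)
    labels[1:] = sims < drop_threshold
    return labels

# Two clearly separated "topics": similarity drops sharply at the transition.
segment_a = np.tile([1.0, 0.0], (4, 1)) + 0.01 * np.random.default_rng(1).standard_normal((4, 2))
segment_b = np.tile([0.0, 1.0], (4, 1)) + 0.01 * np.random.default_rng(2).standard_normal((4, 2))
keys = np.vstack([segment_a, segment_b])

labels = label_boundaries(keys)  # single boundary at index 4, where segment_b begins
```

Within each segment the adjacent similarities stay near 1, so only the transition between the two topics is labeled as a boundary.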

## Citation

If you use this model, please cite:

```bibtex
@misc{xiong2025longcontextmodelingdynamichierarchical,
      title={Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs},
      author={Siheng Xiong and Joe Zou and Faramarz Fekri and Yae Jee Cho},
      year={2025},
      eprint={2510.24606},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```