GSA - a gist-sparse-attention Collection

updated 9 days ago

Models and datasets for the paper "Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention".

gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk16

333k • Updated Apr 6 • 6
gist-sparse-attention/GSA-PT-Llama-3.2-1B-chunk8

1B • Updated Apr 6 • 2
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8

333k • Updated Apr 6 • 2
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk4-chunk4

333k • Updated Apr 6 • 2
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk32

333k • Updated Apr 6 • 5
gist-sparse-attention/GSA-PT-Llama-3.2-1B-chunk4-chunk4

1B • Updated Apr 6 • 1
gist-sparse-attention/GSA-PT-Llama-3.2-1B-chunk16

1B • Updated Apr 6 • 2
gist-sparse-attention/GSA-FT-Llama-3.2-1B-chunk8

1B • Updated Apr 6 • 2
gist-sparse-attention/GSA-FT-Llama-3.2-1B-chunk4-chunk4

1B • Updated Apr 6 • 2
gist-sparse-attention/GSA-FT-Llama-3.2-1B-chunk16

1B • Updated Apr 6 • 7
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4

333k • Updated Apr 6 • 3
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk8

333k • Updated Apr 6 • 1
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk16

333k • Updated Apr 6 • 3
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk32

333k • Updated Apr 6 • 4
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk4-chunk4

333k • Updated Apr 6 • 543
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk8-chunk4

333k • Updated Apr 6 • 5
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-chunk4-data

Viewer • Updated Apr 6 • 88.6k • 272
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk8-data

Preview • Updated Apr 6 • 32
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk16-data

Preview • Updated Apr 6 • 106
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk32-data

Viewer • Updated Apr 6 • 25.9k • 160
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk4-chunk4-data

Preview • Updated Apr 6 • 109
gist-sparse-attention/GSA-FT-Qwen2-7B-Instruct-chunk8-chunk4-data

Preview • Updated Apr 6 • 109
gist-sparse-attention/GSA-FT-Llama-3.2-1B-data

Preview • Updated Apr 6 • 329
gist-sparse-attention/GSA-PT-Llama-3.2-1B-chunk16-data

Updated Apr 6 • 443
gist-sparse-attention/GSA-PT-Llama-3.2-1B-chunk8-data

Updated Apr 6 • 513
gist-sparse-attention/GSA-PT-Llama-3.2-1B-chunk4-chunk4-data

Updated Apr 6 • 514
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk8-data

Viewer • Updated Apr 6 • 89.9k • 205
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk16-data

Viewer • Updated Apr 6 • 88.8k • 261
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk32-data

Preview • Updated Apr 6 • 160
gist-sparse-attention/GSA-PT-Qwen2-7B-Instruct-chunk4-chunk4-data

Viewer • Updated Apr 6 • 88.5k • 272
