Models and Datasets for the paper GSA: Gist Sparse Attention via Learnable Compression and Selective Unfolding