arXiv:2512.07782

GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory

Published on Dec 8, 2025

Abstract

GatedFWA addresses the instability issues of sliding window attention by introducing learnable gates that stabilize memory updates and improve gradient flow in autoregressive models.

AI-generated summary

Modern autoregressive models rely on attention, yet the Softmax full attention in Transformers scales quadratically with sequence length. Sliding Window Attention (SWA) achieves linear-time encoding/decoding by constraining the attention pattern, but under an Associative Memory interpretation, its difference-style update renders the training objective effectively unbounded. In contrast, Softmax attention normalizes updates, leading to memory shrinkage and gradient vanishing. We propose GatedFWA: a Memory-Gated (Flash) Windowed Attention mechanism that preserves SWA's efficiency while stabilizing memory updates and making gradient flow controllable. In essence, GatedFWA accumulates a per-token/head gate into a decay bias added to the attention logits, acting as a learnable contraction in the memory recurrence. We implement a fused one-pass gate preprocessing step and a FlashAttention-compatible kernel that injects the gate under a sliding mask, ensuring I/O efficiency and numerical stability. On language modeling benchmarks, GatedFWA delivers competitive throughput with negligible overhead and better use of global context; it integrates cleanly with token compression/selection methods such as NSA and generalizes to various autoregressive domains.
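
To make the gating idea concrete, here is a minimal, assumption-laden sketch in PyTorch of what the abstract describes: per-token/head gate logits are accumulated in log space into a decay bias that is added to the attention logits under a causal sliding-window mask. This naive version materializes the full score matrix for clarity; the paper instead describes a fused one-pass gate preprocessing step and a FlashAttention-compatible kernel. All function names, shapes, and the exact placement of the gate below are hypothetical, not the authors' reference implementation.

import torch
import torch.nn.functional as F

def gated_windowed_attention(q, k, v, gate_logits, window: int):
    """Sketch of memory-gated sliding-window attention (hypothetical API).

    q, k, v: (batch, heads, seq, dim); gate_logits: (batch, heads, seq).
    """
    B, H, T, D = q.shape

    # Cumulative log-gate: g[t] = sum_{s <= t} log(sigmoid(gate_logits[s])).
    log_gate = F.logsigmoid(gate_logits).cumsum(dim=-1)          # (B, H, T)

    # Decay bias between query t and key s: g[t] - g[s], which is <= 0 for
    # s <= t, so older entries are down-weighted -- a learnable contraction
    # on the memory recurrence.
    bias = log_gate[..., :, None] - log_gate[..., None, :]       # (B, H, T, T)

    scores = (q @ k.transpose(-1, -2)) / D**0.5 + bias           # (B, H, T, T)

    # Causal sliding-window mask: query t attends to keys in (t - window, t].
    t_idx = torch.arange(T, device=q.device)
    visible = (t_idx[None, :] <= t_idx[:, None]) & (t_idx[None, :] > t_idx[:, None] - window)
    scores = scores.masked_fill(~visible, float("-inf"))

    return torch.softmax(scores, dim=-1) @ v

# Example usage with toy tensors:
# q = k = v = torch.randn(2, 4, 128, 64)
# gates = torch.randn(2, 4, 128)
# out = gated_windowed_attention(q, k, v, gates, window=32)  # (2, 4, 128, 64)

Because the bias is a difference of cumulative sums, the gate preprocessing is a single pass over the sequence, and the per-pair bias can be recovered inside a tiled kernel from two vectors, which is what makes a FlashAttention-style fusion plausible.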
