Native CUDA sequential state-scan kernel for Gated DeltaNet-style linear attention prefill.
Available function:
gated_delta_recurrent_seq_bf16
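
For orientation, here is a minimal pure-Python sketch of the kind of sequential recurrence such a kernel is presumably built around. It is not the kernel's implementation or interface. It assumes the gated delta rule as usually written for Gated DeltaNet, S_t = alpha_t * S_{t-1} * (I - beta_t * k_t k_t^T) + beta_t * v_t k_t^T with output o_t = S_t q_t, for a single head. The function name gated_delta_scan_reference, its argument order, and the list-of-lists layout are hypothetical. The real gated_delta_recurrent_seq_bf16 signature, tensor layout, and bf16 handling are not documented here.

```python
def gated_delta_scan_reference(q, k, v, alpha, beta):
    """Sequential gated delta rule for one head (illustrative reference only).

    Hypothetical layout: q and k are T lists of length d_k, v is T lists of
    length d_v, alpha and beta are T scalars (decay gate and write strength).
    Returns the T outputs (each of length d_v) and the final d_v x d_k state.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_k for _ in range(d_v)]  # assumed initial state S_0 = 0
    outputs = []
    for q_t, k_t, v_t, a_t, b_t in zip(q, k, v, alpha, beta):
        # 1) Gate: decay the whole state, S <- alpha_t * S.
        state = [[a_t * s for s in row] for row in state]
        # 2) Read what the decayed state currently predicts for key k_t.
        pred = [sum(s * x for s, x in zip(row, k_t)) for row in state]
        # 3) Delta update: S <- S + beta_t * (v_t - pred) k_t^T.
        for i in range(d_v):
            err = b_t * (v_t[i] - pred[i])
            for j in range(d_k):
                state[i][j] += err * k_t[j]
        # 4) Output: o_t = S_t q_t.
        outputs.append([sum(s * x for s, x in zip(row, q_t)) for row in state])
    return outputs, state
```

Each step depends on the state left by the previous one, which is why the scan over the sequence is sequential. The sketch uses Python floats throughout. A bf16 kernel would typically read bf16 inputs, and whether it accumulates the state in higher precision is not stated in this description.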