Gated Linear Attention Transformers with Hardware-Efficient Training
Paper: arXiv:2312.06635
How to use bailin28/gla-1B-100B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="bailin28/gla-1B-100B")

# Or load the model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bailin28/gla-1B-100B", dtype="auto")
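A minimal end-to-end generation sketch building on the snippet above (the prompt and sampling settings are illustrative; depending on how the checkpoint ships its modeling code, you may also need trust_remote_code=True or the flash-linear-attention package installed):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bailin28/gla-1B-100B")
# dtype="auto" follows the snippet above; older transformers versions use torch_dtype="auto"
model = AutoModelForCausalLM.from_pretrained("bailin28/gla-1B-100B", dtype="auto")

# Tokenize a prompt and sample a short continuation.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))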
How to use bailin28/gla-1B-100B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bailin28/gla-1B-100B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "bailin28/gla-1B-100B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
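The same endpoint can also be called from Python with any OpenAI-compatible client. A minimal sketch using the openai package (the api_key value is a placeholder, since a local vLLM server does not require authentication by default):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="bailin28/gla-1B-100B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)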
How to use bailin28/gla-1B-100B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "bailin28/gla-1B-100B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "bailin28/gla-1B-100B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

# Alternatively, run the SGLang server in Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "bailin28/gla-1B-100B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "bailin28/gla-1B-100B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
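Whether the SGLang server was launched via pip or Docker, the same completion request can be sent from Python. A minimal sketch using the requests library (prompt and sampling settings are illustrative):

import requests

# Send an OpenAI-compatible completion request to the local SGLang server.
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "bailin28/gla-1B-100B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])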
How to use bailin28/gla-1B-100B with Docker Model Runner:
docker model run hf.co/bailin28/gla-1B-100B
This is a checkpoint of the 1.3B GLA model used in the paper Gated Linear Attention Transformers with Hardware-Efficient Training. The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama-2 tokenizer.
See the model and loading script in this repo.