This is a checkpoint of the 1.3B GLA model used in the paper [Gated Linear Attention](https://arxiv.org/abs/2312.06635). The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama2 tokenizer.
See the model definition and loading script in this [repo](https://github.com/berlino/gated_linear_attention).