add model weights for 3 implementations of einygpt
- README.md +10 -0
- model_weights_gqa_tt.pth +3 -0
- model_weights_mha.pth +3 -0
- model_weights_mqa.pth +3 -0
README.md CHANGED
@@ -1,3 +1,13 @@
 ---
 license: mit
 ---
+
+# einygpt
+
+Here are the models I've trained with the code in [einygpt](https://github.com/clankur/einygpt). For reference they are:
+
+- [a multihead attention model](./model_weights_mha.pth) replicating the model discussed in the [TinyStories paper](https://arxiv.org/abs/2305.07759), using the GPT2Tokenizer
+- [a multiquery attention model](model_weights_mqa.pth) using the GPT2Tokenizer
+- [a grouped query attention model with 4 groups](model_weights_gqa_tt.pth) using its own [tokenizer](https://github.com/clankur/einygpt/blob/main/tiny_tokenizer.py)
+
+To play with these models, you can see how they are used in [this notebook](https://github.com/clankur/einygpt/blob/main/perplexity.ipynb).
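The notebook linked above is the reference for how these checkpoints are used. As a minimal sketch, assuming each .pth file is a plain torch-serialized state_dict (the model class and its hyperparameters live in the einygpt repo, not in this commit), one way to inspect a checkpoint:

```python
# Sketch: peek inside a checkpoint with PyTorch.
# Assumption: the .pth files hold state_dicts, not pickled model objects.
import torch

state_dict = torch.load("model_weights_mha.pth", map_location="cpu")

# Print parameter names and shapes; the key/value-projection shapes are
# where the multihead, multiquery, and grouped query variants differ.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```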
model_weights_gqa_tt.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3abfbdd339e49a369a1c7a0176a754c281d87ca46d19f8249c6116a3b31e3312
+size 17763087
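Note that what's committed here is the Git LFS pointer (version, oid, size), not the binary itself; the oid is the SHA-256 of the actual file's contents. A small sketch for verifying a downloaded checkpoint against the pointer's oid:

```python
# Sketch: check a downloaded weight file against the sha256 recorded
# in its Git LFS pointer (the "oid sha256:..." line above).
import hashlib

def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "3abfbdd339e49a369a1c7a0176a754c281d87ca46d19f8249c6116a3b31e3312"
assert file_sha256("model_weights_gqa_tt.pth") == expected
```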
model_weights_mha.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:adc57bb222d0af37f2fe187c0ef16c64de8f83383fe70e62a9269491745c9cfe
+size 28085519
model_weights_mqa.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:141c0e3705e6ad5c15131acde6965ecedf50ef64ff2881efeaee88be43653fa5
+size 28429583
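Since these files are stored via LFS, cloning without git-lfs yields only the pointer files above. A sketch of pulling one checkpoint directly, assuming the repository is hosted on the Hugging Face Hub (the repo id below is a placeholder; it is not stated in this commit):

```python
# Sketch: fetch one weight file from the Hub; hf_hub_download resolves
# the LFS pointer to the real binary and caches it locally.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="<user>/<repo>",  # placeholder: not given in this commit
    filename="model_weights_mha.pth",
)
print(path)  # local cache path to the downloaded checkpoint
```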