---
library_name: transformers
tags: []
---
|
|
|
|
|
# TS Medium
|
|
|
|
|
This is the medium version of the bilinear transformer models trained on TinyStories.
|
|
The primary purpose of this model is interpretability; most design choices were made with that in mind.
|
|
|
|
|
The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.
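A minimal loading sketch, assuming the model is exposed through the `transformers` auto classes with custom modeling code on the Hub. The repo id `tdooms/ts-medium` is a guess based on the model name and is not confirmed; the linked GitHub repository is the authoritative way to run the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, inferred from the model name -- not confirmed.
model_id = "tdooms/ts-medium"

# trust_remote_code is needed if the custom architecture ships with the repo.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tdooms/ts-tokenizer-4096")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```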
|
|
|
|
|
## Model Details
|
|
- 30 million parameters
- 6 layers
- 8 attention heads
- model dimension 512
- bilinear MLP with expansion factor 4
- context length of 256
- trained for 1 epoch (~2.5B tokens)
- rotary positional embeddings
- custom TinyStories [tokenizer](https://huggingface.co/tdooms/ts-tokenizer-4096)
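As a rough consistency check, the hyperparameters above can be combined into a back-of-the-envelope parameter count. The sketch below assumes untied input/output embeddings, no bias terms, a 4096-token vocabulary (from the linked tokenizer), and a bilinear MLP with two input projections plus one output projection; it is an approximation, not the exact training configuration.

```python
# Rough parameter count from the hyperparameters listed above.
d_model = 512
n_layers = 6
expansion = 4
vocab = 4096  # assumed from the linked 4096-token tokenizer

d_hidden = expansion * d_model

attn = 4 * d_model * d_model                        # Q, K, V, O projections
mlp = 2 * d_model * d_hidden + d_hidden * d_model   # two bilinear inputs + output
per_layer = attn + mlp

embeddings = 2 * vocab * d_model                    # embed + unembed (untied)

total = n_layers * per_layer + embeddings
print(f"{total:,}")  # ~29.4M, consistent with the "30 million parameters" figure
```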
|
|
|