---
library_name: transformers
tags: []
---
# FW Medium
This is the medium version of the bilinear transformer series trained on FineWeb-Edu.
The primary purpose of this model is interpretability; most design choices were made with that in mind.
The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.
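As a minimal sketch of how loading might look, assuming the checkpoint is hosted on the Hugging Face Hub with its custom architecture exposed through `transformers`' remote-code mechanism (the repo id below is a placeholder; the linked repository remains the authoritative way to run the model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder Hub path: substitute the actual repo id of this checkpoint.
model_id = "user/fw-medium"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code lets transformers build the custom bilinear architecture,
# assuming the modeling code is bundled with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```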
## Model Details
- 335 million parameters
- 16 layers
- 16 attention heads
- model dimension 1024
- bilinear MLP with expansion factor 4 (see the sketch after this list)
- context length of 512
- trained for 32B tokens
- rotary positional embedding
- Mixtral [tokenizer](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
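For reference, a bilinear MLP replaces the usual pointwise activation with the elementwise product of two linear projections, making every output coordinate a quadratic form in the input. A minimal sketch of such a layer, with illustrative names and the dimensions from this card (not the exact implementation from the linked repository):

```python
import torch
from torch import nn

class BilinearMLP(nn.Module):
    """Elementwise product of two linear projections, with no nonlinearity."""

    def __init__(self, d_model: int = 1024, expansion: int = 4):
        super().__init__()
        d_hidden = expansion * d_model  # 4096 for this model
        self.w = nn.Linear(d_model, d_hidden, bias=False)    # "left" projection
        self.v = nn.Linear(d_model, d_hidden, bias=False)    # "right" projection
        self.proj = nn.Linear(d_hidden, d_model, bias=False)  # down-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (W x) * (V x) is bilinear in x, so the whole layer can be expressed
        # directly in terms of its weights (a third-order tensor), which is
        # what weight-based interpretability methods exploit.
        return self.proj(self.w(x) * self.v(x))
```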