---
library_name: transformers
tags: []
---

# TS Medium

This is the medium version of the bilinear transformers trained on TinyStories.
The primary purpose of this model is interpretability; most design choices were made with that in mind.

The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.

## Model Details
- 30 million parameters
- 6 layers
- 8 attention heads
- model dimension 512
- bilinear MLP with expansion factor 4
- context length of 256
- trained for 1 epoch (~2.5B tokens)
- rotary positional embeddings
- custom TinyStories [tokenizer](https://huggingface.co/tdooms/ts-tokenizer-4096)
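
The bilinear MLP listed above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: it assumes the standard bilinear formulation (a GLU variant with no nonlinearity), where two linear projections are multiplied elementwise before being projected back down. The class and argument names are hypothetical.

```python
import torch
import torch.nn as nn


class BilinearMLP(nn.Module):
    """Hypothetical sketch of a bilinear MLP: out = W_out((W x) * (V x)).

    With d_model=512 and expansion factor 4, the hidden size is 2048,
    matching the dimensions listed in the model details.
    """

    def __init__(self, d_model: int = 512, expansion: int = 4):
        super().__init__()
        d_hidden = d_model * expansion
        self.w = nn.Linear(d_model, d_hidden, bias=False)
        self.v = nn.Linear(d_model, d_hidden, bias=False)
        self.proj = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of two linear maps -- no activation function,
        # which is what makes the layer amenable to weight-based analysis.
        return self.proj(self.w(x) * self.v(x))


# Shapes match the card: batch of 2, context length 256, model dimension 512.
mlp = BilinearMLP()
x = torch.randn(2, 256, 512)
out = mlp(x)
print(out.shape)  # torch.Size([2, 256, 512])
```

Because the layer is purely bilinear in its input, the full MLP can be written as a third-order tensor over the weights, which is what enables the weight-based decompositions mentioned above.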