---
library_name: transformers
tags: []
---
# TS Large

This is the medium version of the bilinear transformers trained on TinyStories.
The primary purpose of this model is interpretability; most design choices were made with that in mind.

The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.

## Model Details
- 82 million parameters
- 8 layers
- 12 attention heads
- model dimension 768
- bilinear MLP with expansion factor 4
- context length of 256
- trained for 1 epoch (~500M tokens)
- rotary positional embedding
- custom tinystories [tokenizer](https://huggingface.co/tdooms/ts-tokenizer-4096)
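
As a rough sanity check, the hyperparameters above account for the stated ~82M parameters. The sketch below is a back-of-the-envelope estimate under assumptions not stated on this card: no bias terms, untied embedding and unembedding matrices, a vocabulary of 4096 (inferred from the tokenizer name), and a bilinear MLP with two input projections and one down projection.

```python
# Estimate the parameter count from the listed hyperparameters.
# Assumptions (hypothetical, not confirmed by the card): no biases,
# untied embed/unembed, vocab size 4096.
d_model = 768
n_layers = 8
expansion = 4
vocab = 4096

d_hidden = expansion * d_model                     # 3072
attn = 4 * d_model * d_model                       # Q, K, V, O projections
mlp = 2 * d_model * d_hidden + d_hidden * d_model  # two inputs (W, V) + down proj
embed = 2 * vocab * d_model                        # token embedding + unembedding

total = n_layers * (attn + mlp) + embed
print(f"{total / 1e6:.1f}M parameters")            # ≈ 81.8M, consistent with ~82M
```

Note that the bilinear MLP costs 1.5x the parameters of a standard MLP at the same expansion factor, since it has two input projections instead of one.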