tdooms committed on
Commit
5b09164
·
verified ·
1 Parent(s): fee82a9

Create README.md

Files changed (1)
  1. README.md +21 -0
README.md ADDED
@@ -0,0 +1,21 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+ # FW Medium
+
+ This is the medium version of the bilinear transformers trained on FineWeb-edu.
+ The primary purpose of this model is interpretability; most design choices were made with that in mind.
+
+ The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.
+
+ ## Model Details
+ - 335 million parameters
+ - 16 layers
+ - 16 attention heads
+ - model dimension 1024
+ - bilinear MLP with expansion factor 4
+ - context length of 512
+ - trained for 132B tokens
+ - rotary positional embedding
+ - Mixtral [tokenizer](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
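A bilinear MLP, as listed in the details above, replaces the usual nonlinear activation with an elementwise product of two linear projections; with model dimension 1024 and expansion factor 4, the hidden size is 4096. Below is a minimal sketch of that layer shape, assuming the standard bilinear formulation — the weight names and random initialization are illustrative, not taken from the training code:

```python
import numpy as np

def bilinear_mlp(x, W, V, P):
    """Bilinear MLP: elementwise product of two linear maps, then a projection.

    Computes ((x W) * (x V)) P. The layer is quadratic in its input rather
    than piecewise-linear, which is what makes weight-based analysis tractable.
    """
    return ((x @ W) * (x @ V)) @ P

d_model, expansion = 1024, 4
d_hidden = d_model * expansion  # 4096, matching the expansion factor of 4

rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
V = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
P = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)

x = rng.standard_normal((2, d_model))  # a batch of two residual-stream vectors
y = bilinear_mlp(x, W, V, P)
print(y.shape)  # (2, 1024)
```

Because the layer is purely quadratic in its input, its weights admit exact tensor decompositions, which the interpretability utilities in the linked repository build on.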