---
library_name: transformers
tags: []
---
# FW Medium

This is the medium version of the bilinear transformers trained on FineWeb-Edu.
The primary purpose of this model is interpretability; most design choices were made with that in mind.

The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.
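
If the checkpoint ships with its custom modeling code on the Hub, it can likely be loaded through `transformers` as sketched below; the repo id `tdooms/fw-medium` is a placeholder for this model's actual Hub id, and `trust_remote_code=True` is an assumption about how the custom architecture is exposed. Otherwise, use the model classes from the linked repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute this model's actual Hub id.
repo = "tdooms/fw-medium"

# trust_remote_code=True is assumed, since the bilinear architecture is
# custom modeling code rather than a built-in transformers model class.
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("Interpretability research aims to", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```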

## Model Details
- 335 million parameters
- 16 layers
- 16 attention heads
- model dimension 1024
- bilinear MLP with expansion factor 4 (see the sketch after this list)
- context length of 512
- trained for 132B tokens
- rotary positional embedding
- Mixtral [tokenizer](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
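
To make the bilinear MLP item concrete, here is a generic PyTorch sketch of a bilinear layer: the elementwise product of two linear projections replaces the usual activation function. Details such as the absence of biases are assumptions; the authoritative implementation is in the repository linked above.

```python
import torch
import torch.nn as nn

class BilinearMLP(nn.Module):
    """Bilinear MLP: out = W_out((W x) * (V x)), with no nonlinearity."""

    def __init__(self, d_model: int = 1024, expansion: int = 4):
        super().__init__()
        hidden = expansion * d_model  # 4 * 1024 = 4096 for this model
        # Bias-free projections are an assumption; see the repo for details.
        self.w = nn.Linear(d_model, hidden, bias=False)
        self.v = nn.Linear(d_model, hidden, bias=False)
        self.out = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The elementwise product of two linear maps makes each output a
        # quadratic (bilinear) form in x, which is what enables the
        # weight-based analysis mentioned above.
        return self.out(self.w(x) * self.v(x))
```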