|
|
--- |
|
|
library_name: transformers |
|
|
tags: [] |
|
|
--- |
|
|
# FW Medium |
|
|
|
|
|
This is the medium-sized model in the series of bilinear transformers trained on FineWeb-Edu.
|
|
The primary purpose of this model is interpretability; most design choices were made with that in mind.
|
|
|
|
|
The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability. |
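As a minimal loading sketch, the model can presumably be used through `transformers` since the custom architecture is distributed with the checkpoint. The repo ID `tdooms/fw-medium` below is a placeholder assumption, not confirmed by this card; substitute the actual Hub repo ID.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "tdooms/fw-medium" is a placeholder repo ID; trust_remote_code loads
# the custom bilinear architecture shipped alongside the checkpoint.
model = AutoModelForCausalLM.from_pretrained("tdooms/fw-medium", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tdooms/fw-medium")  # Mixtral tokenizer

# Generate a short continuation (the context length is 512 tokens).
inputs = tokenizer("Interpretability research aims to", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```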
|
|
|
|
|
## Model Details |
|
|
- 335 million parameters
- 16 layers
- 16 attention heads
- model dimension of 1024
- bilinear MLP with an expansion factor of 4 (see the sketch below)
- context length of 512
- trained on 32B tokens
- rotary positional embeddings
- Mixtral [tokenizer](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
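
A bilinear MLP replaces the usual elementwise nonlinearity with the product of two linear projections, which is what makes the layer amenable to weight-based interpretability. The sketch below illustrates the idea under stated assumptions (no biases, illustrative variable names; the linked repository's exact implementation may differ). With a model dimension of 1024 and an expansion factor of 4, the hidden size is 4096.

```python
import torch
from torch import nn

class BilinearMLP(nn.Module):
    """Sketch of a bilinear MLP: out = P((W x) * (V x)).

    There is no elementwise activation function; the only nonlinearity
    is the product of two linear maps, so the layer can be analyzed
    directly from its weights.
    """
    def __init__(self, d_model: int = 1024, expansion: int = 4):
        super().__init__()
        d_hidden = expansion * d_model  # 4096 for this model
        self.w = nn.Linear(d_model, d_hidden, bias=False)
        self.v = nn.Linear(d_model, d_hidden, bias=False)
        self.proj = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.w(x) * self.v(x))
```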