---
library_name: transformers
tags: []
---
# FW Tiny
This is the tiny version of the bilinear transformers trained on FineWeb-Edu.
The primary purpose of this model is interpretability; most design choices were made with that in mind.
The code to run this custom model can be found [here](https://github.com/tdooms/bilinear-decomposition), along with many utility functions for weight-based interpretability.
## Model Details
- 125 million parameters
- 8 layers
- 12 attention heads
- model dimension of 768
- bilinear MLPs with an expansion factor of 4
- context length of 512
- trained on 16B tokens
- rotary positional embeddings
- Mixtral [tokenizer](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
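
As a sanity check, the listed hyperparameters roughly account for the stated 125M parameters. The sketch below is a back-of-the-envelope estimate, assuming a vocabulary size of 32768 (the Mixtral tokenizer), untied input/output embeddings, no biases, and a bilinear MLP with two input projections plus one down projection; none of these assumptions are stated explicitly above.

```python
d_model = 768
n_layers = 8
expansion = 4
d_hidden = expansion * d_model        # 3072
vocab = 32768                         # assumed Mixtral vocabulary size

embed = vocab * d_model               # token embeddings
unembed = vocab * d_model             # output head (assumed untied)
attn = 4 * d_model * d_model          # Q, K, V, O projections per layer
mlp = 3 * d_model * d_hidden          # W, V (bilinear pair) + down projection

total = embed + unembed + n_layers * (attn + mlp)
print(f"{total / 1e6:.1f}M parameters")  # roughly 125.8M
```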