# ss_d256_f1
Weight-sparse transformer trained with the procedure from Gao et al. (2025).
## Model Details
- **Layers**: 4
- **Model Dimension**: 256
- **Context Length**: 512
- **Head Dimension**: 16
- **Vocabulary Size**: 4096
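As a rough sanity check, the dimensions above imply a parameter count of about 5M under a standard GPT-style estimate. The formula below assumes untied embedding/unembedding matrices, a 4x MLP expansion, and no biases; the actual architecture may differ, so treat this as an approximation only.

```python
# Rough parameter estimate for a GPT-style transformer with the dimensions above.
# Assumes untied embeddings, 4x MLP expansion, no biases (assumptions, not confirmed).
d_model = 256
n_layers = 4
vocab_size = 4096

embed = vocab_size * d_model       # token embedding
unembed = vocab_size * d_model     # output projection (if untied)
per_layer = 12 * d_model ** 2      # 4*d^2 attention + 8*d^2 MLP
total = embed + unembed + n_layers * per_layer

print(f"~{total / 1e6:.1f}M parameters")  # → ~5.2M parameters
```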
## Sparsity
- **Weight Sparsity**: False
- **Target L0 Fraction**: 1
- **Activation Sparsity**: False
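The target L0 fraction is the fraction of weights allowed to be nonzero; a fraction of 1, as here, keeps every weight, so this checkpoint is effectively dense. As an illustration only (this magnitude-masking scheme is a generic sketch, not the exact training procedure from Gao et al.), a fraction below 1 would zero out all but the largest-magnitude entries:

```python
import torch

def apply_l0_fraction(weight: torch.Tensor, fraction: float) -> torch.Tensor:
    """Keep the top `fraction` of entries by magnitude, zero the rest."""
    k = max(1, int(round(fraction * weight.numel())))
    # k-th largest |w| is the (numel - k + 1)-th smallest
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return weight * (weight.abs() >= threshold)

w = torch.randn(256, 256)
dense = apply_l0_fraction(w, 1.0)    # fraction 1: nothing is pruned
sparse = apply_l0_fraction(w, 0.1)   # fraction 0.1: ~90% of entries zeroed
```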
## Training
- **Dataset**: SimpleStories/SimpleStories
- **Tokenizer**: SimpleStories/SimpleStories-1.25M
- **Total Tokens**: 2,000,000,000
## Training Run
- **W&B Run**: [https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw](https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw)
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Download the weights and config from the Hub
model_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="config.json")

# Load the raw state dict (instantiating the model requires the SparseGPT model class from this repo)
state_dict = torch.load(model_path, map_location="cpu")

# Inspect parameter names and shapes
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```