---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains 1 Sparse Autoencoder (SAE) trained with [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable` |

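The `standard` architecture is a single-layer encoder/decoder with a ReLU nonlinearity. A minimal NumPy sketch of the forward pass at the dimensions above (the random weights and the pre-encoder bias subtraction are illustrative assumptions, not values loaded from this repository):

```python
import numpy as np

d_in, d_sae = 3584, 16384  # input and SAE dimensions from the table above

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_in, d_sae)) * 0.01  # illustrative random weights
b_enc = np.zeros(d_sae)
W_dec = rng.standard_normal((d_sae, d_in)) * 0.01
b_dec = np.zeros(d_in)

x = rng.standard_normal(d_in)  # one residual-stream activation vector

# Encoder: affine map followed by ReLU yields sparse, non-negative features
features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

# Decoder: linear reconstruction of the input from the feature activations
x_hat = features @ W_dec + b_dec

print(features.shape, x_hat.shape)  # (16384,) (3584,)
```

The ReLU is what makes most feature activations exactly zero, which is where the "sparse" in SAE comes from.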
## Available Hook Points

| Hook Point |
|------------|
| `blocks.11.hook_resid_post` |

## Usage

```python
from sae_lens import SAE

# Load the SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/vulnerable_code_qwen_coder_standard_16384_5M",
    sae_id="blocks.11.hook_resid_post",  # choose from the available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Run the model, cache activations at the hook point, and encode them
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.11.hook_resid_post"]
features = sae.encode(activations)
```

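The encoded `features` tensor has one value per SAE feature at each token position. A common first analysis is to measure how many features fire per token (L0) and which fire most strongly; a sketch using NumPy stand-in data in place of real activations (the thresholded-noise data and shape `[batch, seq, d_sae]` are assumptions for illustration):

```python
import numpy as np

# Stand-in for `features` from the usage snippet: [batch, seq, d_sae],
# mostly zeros, mimicking sparse SAE activations
rng = np.random.default_rng(0)
features = np.maximum(0.0, rng.standard_normal((1, 8, 16384)) - 2.0)

# L0 per token: how many features are active at each position
l0_per_token = (features > 0).sum(axis=-1)

# Top 5 features at the last token, ranked by activation strength
last = features[0, -1]
top = np.argsort(last)[::-1][:5]
print("mean L0:", l0_per_token.mean())
print("top feature indices:", top)
```

With real activations, the top indices can then be interpreted by looking at which inputs maximally activate each feature.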
## Files

- `blocks.11.hook_resid_post/cfg.json` - SAE configuration
- `blocks.11.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.11.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

## Training

This SAE was trained with SAELens version 6.26.2.