---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains 1 Sparse Autoencoder (SAE) trained with [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable` |

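The `standard` architecture is a single-layer encoder/decoder with a ReLU nonlinearity. A minimal NumPy sketch of the forward pass at the dimensions above (the random weights and the pre-encoder bias subtraction are illustrative assumptions, not values loaded from this repository):

```python
import numpy as np

d_in, d_sae = 3584, 16384  # input and SAE dimensions from the table above

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_in, d_sae)) * 0.01  # illustrative random weights
b_enc = np.zeros(d_sae)
W_dec = rng.standard_normal((d_sae, d_in)) * 0.01
b_dec = np.zeros(d_in)

x = rng.standard_normal(d_in)  # one residual-stream activation vector

# Encoder: affine map followed by ReLU yields sparse, non-negative features
features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

# Decoder: linear reconstruction of the input from the feature activations
x_hat = features @ W_dec + b_dec

print(features.shape, x_hat.shape)  # (16384,) (3584,)
```

The ReLU is what makes most feature activations exactly zero, which is where the "sparse" in SAE comes from.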
## Available Hook Points

| Hook Point |
|------------|
| `blocks.11.hook_resid_post` |

## Usage

```python
from sae_lens import SAE

# Load the SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/vulnerable_code_qwen_coder_standard_16384_5M",
    sae_id="blocks.11.hook_resid_post",  # choose from the available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Run the model, cache activations at the hook point, and encode them
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.11.hook_resid_post"]
features = sae.encode(activations)
```

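The encoded `features` tensor has one value per SAE feature at each token position. A common first analysis is to measure how many features fire per token (L0) and which fire most strongly; a sketch using NumPy stand-in data in place of real activations (the thresholded-noise data and shape `[batch, seq, d_sae]` are assumptions for illustration):

```python
import numpy as np

# Stand-in for `features` from the usage snippet: [batch, seq, d_sae],
# mostly zeros, mimicking sparse SAE activations
rng = np.random.default_rng(0)
features = np.maximum(0.0, rng.standard_normal((1, 8, 16384)) - 2.0)

# L0 per token: how many features are active at each position
l0_per_token = (features > 0).sum(axis=-1)

# Top 5 features at the last token, ranked by activation strength
last = features[0, -1]
top = np.argsort(last)[::-1][:5]
print("mean L0:", l0_per_token.mean())
print("top feature indices:", top)
```

With real activations, the top indices can then be interpreted by looking at which inputs maximally activate each feature.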
## Files

- `blocks.11.hook_resid_post/cfg.json` - SAE configuration
- `blocks.11.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.11.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

## Training

This SAE was trained with SAELens version 6.26.2.