Upload folder using huggingface_hub
- README.md +87 -0
- blocks.0.hook_resid_post/cfg.json +1 -0
- blocks.0.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.11.hook_resid_post/cfg.json +1 -0
- blocks.11.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.15.hook_resid_post/cfg.json +1 -0
- blocks.15.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.19.hook_resid_post/cfg.json +1 -0
- blocks.19.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.23.hook_resid_post/cfg.json +1 -0
- blocks.23.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.27.hook_resid_post/cfg.json +1 -0
- blocks.27.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.3.hook_resid_post/cfg.json +1 -0
- blocks.3.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.7.hook_resid_post/cfg.json +1 -0
- blocks.7.hook_resid_post/sae_weights.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,87 @@
---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains 8 Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable` |

## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.3.hook_resid_post` |
| `blocks.7.hook_resid_post` |
| `blocks.11.hook_resid_post` |
| `blocks.15.hook_resid_post` |
| `blocks.19.hook_resid_post` |
| `blocks.23.hook_resid_post` |
| `blocks.27.hook_resid_post` |

## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/vulnerable_code_qwen_coder_standard_16384",
    sae_id="blocks.11.hook_resid_post",  # choose from the hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.11.hook_resid_post"]
features = sae.encode(activations)
```
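Once features are in hand, a quick sanity check is the L0 statistic: the average number of features active per token, which should be small relative to the SAE dimension (16384 here). A minimal sketch using NumPy on a synthetic stand-in for the `features` tensor (the array below is illustrative random data, not real SAE output):

```python
import numpy as np

# Illustrative stand-in for `features` from sae.encode:
# shape (batch, seq_len, d_sae); this repo's d_sae is 16384.
rng = np.random.default_rng(0)
features = rng.random((1, 4, 16384)).astype(np.float32)
features[features < 0.999] = 0.0  # SAE feature activations are mostly zero

# L0: mean number of non-zero features per token position
l0 = (features != 0).sum(axis=-1).mean()
print(f"mean L0 per token: {l0:.1f}")
```

The same two lines work unchanged on a real `features` tensor after calling `.cpu().numpy()` on it.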

## Files

Each hook-point directory listed above contains:

- `cfg.json` - SAE configuration
- `sae_weights.safetensors` - Model weights
- `sparsity.safetensors` - Feature sparsity statistics

## Training

These SAEs were trained with SAELens version 6.26.2.
blocks.0.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.0.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
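Because `cfg.json` is plain JSON, the hyperparameters can be inspected without loading any weights. A small sketch (the embedded string is an abridged copy of the config above, trimmed to the fields being read):

```python
import json

# Abridged copy of blocks.0.hook_resid_post/cfg.json
cfg_text = (
    '{"d_in": 3584, "d_sae": 16384, "dtype": "float32", '
    '"architecture": "standard", "metadata": {'
    '"hook_name": "blocks.0.hook_resid_post", '
    '"model_name": "Qwen/Qwen2.5-7B-Instruct", "context_size": 128}}'
)
cfg = json.loads(cfg_text)

# Expansion factor: SAE width relative to the residual stream width
expansion = cfg["d_sae"] / cfg["d_in"]
print(cfg["metadata"]["hook_name"], f"expansion ~{expansion:.2f}x")
```

In practice one would read the downloaded file with `json.load(open(path))` instead of the inline string.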
blocks.0.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98e507565eb86dd6565f681d31b3a12568477365dd10826a5342c570fa932e05
size 30670848
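The `sae_weights.safetensors` entries in this commit are Git LFS pointer files: the pointer records the SHA-256 and byte size of the real weights blob, which LFS fetches separately. A hedged sketch of checking a downloaded blob against such a pointer (the tiny `b"hello"` payload below stands in for real weights, purely for illustration):

```python
import hashlib
import re

def parse_lfs_pointer(pointer_text: str) -> tuple[str, int]:
    """Extract (sha256_hex, size_bytes) from a Git LFS pointer file."""
    oid = re.search(r"oid sha256:([0-9a-f]{64})", pointer_text).group(1)
    size = int(re.search(r"size (\d+)", pointer_text).group(1))
    return oid, size

def verify_blob(data: bytes, oid: str, size: int) -> bool:
    """Check a downloaded blob against the pointer's hash and size."""
    return len(data) == size and hashlib.sha256(data).hexdigest() == oid

# Illustrative round-trip with a tiny payload instead of real weights
payload = b"hello"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
oid, size = parse_lfs_pointer(pointer)
print(verify_blob(payload, oid, size))  # → True
```

Against this repo, `payload` would be the bytes of the downloaded `sae_weights.safetensors` and `pointer` the text shown above.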
blocks.11.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.11.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.11.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:16d60fd7ec267f5e1a20ad53cb6483e853e78bb8e7bdc586699f3028f1effb6d
size 30670848
blocks.15.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.15.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.15.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8be557740c31aba61e20998066c717dd8a637f17206a02be51bf1b6d848e209
size 30670848
blocks.19.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.19.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.19.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98d4b9e3bb446f8e421cebfc44a2ca7ea20d66327f81bed5e9cea6a971b09ea3
size 30670848
blocks.23.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.23.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.23.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cdcec4b879046520d7621e95ad08e08b73ac458771634fd738164533535a1b60
size 30670848
blocks.27.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.27.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.27.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c53cb38ed82fb2411fe81a11928dd47a85fd9c307fad2632ceb6470de7520c0e
size 30670848
blocks.3.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.3.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.3.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f0742bd64fdb692e53aff58f8f40c178959fd039f73a8746e24ccc9e511673a
size 30670848
blocks.7.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.7.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.7.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25e2272917f2fc0a853fa1f49e125662887e8110219d3865df48ec2a654d32c5
size 30670848