rufimelo committed on
Commit 0832910 · verified · 1 parent: 96cf1b8

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,87 @@
+ ---
+ library_name: sae_lens
+ tags:
+ - sparse-autoencoder
+ - mechanistic-interpretability
+ - sae
+ ---
+
+ # Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct
+
+ This repository contains 8 Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
+ | **Architecture** | `standard` |
+ | **Input Dimension** | 3584 |
+ | **SAE Dimension** | 16384 |
+ | **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable` |
+
+ ## Available Hook Points
+
+ | Hook Point |
+ |------------|
+ | `blocks.11.hook_resid_post` |
+ | `blocks.0.hook_resid_post` |
+ | `blocks.3.hook_resid_post` |
+ | `blocks.7.hook_resid_post` |
+ | `blocks.15.hook_resid_post` |
+ | `blocks.19.hook_resid_post` |
+ | `blocks.23.hook_resid_post` |
+ | `blocks.27.hook_resid_post` |
+
+ ## Usage
+
+ ```python
+ from sae_lens import SAE
+
+ # Load an SAE for a specific hook point
+ sae, cfg_dict, sparsity = SAE.from_pretrained(
+     release="rufimelo/vulnerable_code_qwen_coder_standard_16384",
+     sae_id="blocks.11.hook_resid_post",  # Choose from available hook points above
+ )
+
+ # Use with TransformerLens
+ from transformer_lens import HookedTransformer
+
+ model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+
+ # Get activations and encode
+ _, cache = model.run_with_cache("your text here")
+ activations = cache["blocks.11.hook_resid_post"]
+ features = sae.encode(activations)
+ ```
+
+ ## Files
+
+ - `blocks.11.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.11.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.11.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.0.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.3.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.3.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.3.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.7.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.7.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.7.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.15.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.15.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.15.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.19.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.19.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.19.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.23.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.23.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.23.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+ - `blocks.27.hook_resid_post/cfg.json` - SAE configuration
+ - `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
+ - `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
+
+ ## Training
+
+ These SAEs were trained with SAELens version 6.26.2.
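The `standard` architecture named in the Model Details table is a single-layer autoencoder with a ReLU encoder, trained with an L1 sparsity penalty (`l1_coefficient` 1.0 and `apply_b_dec_to_input` true per the cfg.json files below). The following is a minimal NumPy sketch of that forward pass, with shrunken illustrative dimensions (the trained SAEs use d_in=3584, d_sae=16384) and randomly initialized weights; it is not the SAELens implementation, which should be loaded via `SAE.from_pretrained` as shown in the Usage section.

```python
import numpy as np

class StandardSAE:
    """Minimal sketch of a 'standard' SAE forward pass (illustrative only)."""

    def __init__(self, d_in: int, d_sae: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.standard_normal((d_in, d_sae)) * 0.1
        self.b_enc = np.zeros(d_sae)
        self.W_dec = rng.standard_normal((d_sae, d_in)) * 0.1
        self.b_dec = np.zeros(d_in)

    def encode(self, x: np.ndarray) -> np.ndarray:
        # apply_b_dec_to_input: subtract the decoder bias before encoding
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        return f @ self.W_dec + self.b_dec

    def forward(self, x: np.ndarray):
        f = self.encode(x)      # sparse, non-negative feature activations
        x_hat = self.decode(f)  # reconstruction of the input activations
        mse = np.mean((x - x_hat) ** 2)           # reconstruction term
        l1 = np.abs(f).sum(axis=-1).mean()        # sparsity term
        return x_hat, f, mse + 1.0 * l1           # l1_coefficient = 1.0

sae = StandardSAE(d_in=8, d_sae=32)
x = np.ones((4, 8))
x_hat, feats, loss = sae.forward(x)
print(x_hat.shape, feats.shape)  # (4, 8) (4, 32)
```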
blocks.0.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.0.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.0.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98e507565eb86dd6565f681d31b3a12568477365dd10826a5342c570fa932e05
+ size 30670848
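The `sae_weights.safetensors` entries shown in this commit are Git LFS pointer files, not the tensors themselves: three `key value` lines giving the spec version URL, a sha256 object id, and the real blob's size in bytes, per the Git LFS pointer-file spec. A minimal sketch of parsing such a pointer (the `parse_lfs_pointer` helper is hypothetical, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size_bytes": int(fields["size"])}

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:98e507565eb86dd6565f681d31b3a12568477365dd10826a5342c570fa932e05
size 30670848"""

info = parse_lfs_pointer(pointer)
print(info["algo"], info["size_bytes"])  # sha256 30670848
```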
blocks.11.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.11.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.11.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16d60fd7ec267f5e1a20ad53cb6483e853e78bb8e7bdc586699f3028f1effb6d
+ size 30670848
blocks.15.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.15.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.15.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8be557740c31aba61e20998066c717dd8a637f17206a02be51bf1b6d848e209
+ size 30670848
blocks.19.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.19.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.19.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98d4b9e3bb446f8e421cebfc44a2ca7ea20d66327f81bed5e9cea6a971b09ea3
+ size 30670848
blocks.23.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.23.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.23.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cdcec4b879046520d7621e95ad08e08b73ac458771634fd738164533535a1b60
+ size 30670848
blocks.27.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.27.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.27.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c53cb38ed82fb2411fe81a11928dd47a85fd9c307fad2632ceb6470de7520c0e
+ size 30670848
blocks.3.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.3.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.3.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f0742bd64fdb692e53aff58f8f40c178959fd039f73a8746e24ccc9e511673a
+ size 30670848
blocks.7.hook_resid_post/cfg.json ADDED
@@ -0,0 +1 @@
+ {"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.7.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
blocks.7.hook_resid_post/sae_weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25e2272917f2fc0a853fa1f49e125662887e8110219d3865df48ec2a654d32c5
+ size 30670848