rufimelo committed · Commit 8ae0854 (verified) · 1 parent: 0d4747d

Upload folder using huggingface_hub
README.md ADDED
---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `topk` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |

## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |

## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_topk_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the hook points listed above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations at the hook point and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```
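To make the `topk` architecture concrete without downloading the 7B model, here is a minimal NumPy sketch of what a TopK SAE's encode step does at the dimensions in the cfg files (`d_in=3584`, `d_sae=16384`, `k=64`): project into the feature space and keep only the k largest pre-activations per position. The weights here are random placeholders, not the trained ones from `sae_weights.safetensors`, and the sketch omits the cfg's `layer_norm` activation normalization and `apply_b_dec_to_input` step.

```python
import numpy as np

d_in, d_sae, k = 3584, 16384, 64
rng = np.random.default_rng(0)
# Placeholder encoder weights (the real ones live in sae_weights.safetensors)
W_enc = rng.standard_normal((d_in, d_sae), dtype=np.float32) * 0.01
b_enc = np.zeros(d_sae, dtype=np.float32)

def topk_encode(x):
    pre = x @ W_enc + b_enc  # (..., d_sae) pre-activations
    # Keep the k largest pre-activations per position, zero out the rest
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    feats = np.zeros_like(pre)
    np.put_along_axis(feats, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return feats

x = rng.standard_normal((2, d_in), dtype=np.float32)
feats = topk_encode(x)
print(feats.shape)                    # (2, 16384)
print((feats != 0).sum(axis=-1))      # at most 64 active features per position
```

This is why the sparsity of a TopK SAE is fixed by construction: every position activates at most `k` of the 16384 features, with no L1 penalty needed during training.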

## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

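As a back-of-the-envelope check on the listed weight-file sizes, the 469,842,240-byte files are consistent with a float32 SAE of these dimensions, assuming the standard encoder matrix, decoder matrix, and two bias vectors layout (an assumption about the on-disk format; the small remainder would be the safetensors header):

```python
d_in, d_sae = 3584, 16384
# W_enc (d_in x d_sae) + W_dec (d_sae x d_in) + b_enc (d_sae) + b_dec (d_in)
params = 2 * d_in * d_sae + d_sae + d_in
bytes_fp32 = params * 4  # float32 = 4 bytes per parameter
print(params, bytes_fp32)  # 117460480 params -> 469841920 bytes (vs. 469842240 on disk)
```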
## Training

These SAEs were trained with SAELens version 6.26.2.
blocks.0.hook_resid_post/cfg.json ADDED
{"metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.0.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "k": 64, "reshape_activations": "none", "d_in": 3584, "rescale_acts_by_decoder_norm": false, "device": "cuda", "normalize_activations": "layer_norm", "dtype": "float32", "apply_b_dec_to_input": true, "d_sae": 16384, "architecture": "topk"}
blocks.0.hook_resid_post/sae_weights.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:44a01e667bcff986b9305d9c858c5ef5327b74e017a57c155d0085ba46fd99a8
size 469842240
blocks.0.hook_resid_post/sparsity.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:535146750b34df66c95f283ce0e481f475741f8ddf703a2ac3497814efdb7b24
size 65616
blocks.14.hook_resid_post/cfg.json ADDED
{"device": "cuda", "dtype": "float32", "rescale_acts_by_decoder_norm": false, "reshape_activations": "none", "d_sae": 16384, "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.14.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "normalize_activations": "layer_norm", "k": 64, "apply_b_dec_to_input": true, "d_in": 3584, "architecture": "topk"}
blocks.14.hook_resid_post/sae_weights.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:24871a64b763db1025b155f7c209ce6ca970fb472ed29dac09513b4e7ba2f480
size 294387712
blocks.14.hook_resid_post/sparsity.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d7861a07072eb415915f6de79f90544802378b8ce409e673a803b3aa4e7a7ff3
size 65616
blocks.27.hook_resid_post/cfg.json ADDED
{"d_sae": 16384, "d_in": 3584, "k": 64, "dtype": "float32", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.27.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "apply_b_dec_to_input": true, "device": "cuda", "rescale_acts_by_decoder_norm": false, "normalize_activations": "layer_norm", "architecture": "topk"}
blocks.27.hook_resid_post/sae_weights.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bba9a159dd85c884bf041bfd8760184e2b368f4523de4c9edf7958a300dac9a1
size 469842240
blocks.27.hook_resid_post/sparsity.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:5a21cd2f3025bdf503f3b3c90f1d6278363d42a13b0d4fb7624d7d92cd1f6b7b
size 65616