Upload folder using huggingface_hub
- README.md +67 -0
- blocks.0.hook_resid_post/cfg.json +1 -0
- blocks.0.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.0.hook_resid_post/sparsity.safetensors +3 -0
- blocks.14.hook_resid_post/cfg.json +1 -0
- blocks.14.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.14.hook_resid_post/sparsity.safetensors +3 -0
- blocks.27.hook_resid_post/cfg.json +1 -0
- blocks.27.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.27.hook_resid_post/sparsity.safetensors +3 -0
README.md
ADDED
---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `topk` (k = 64) |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |

## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |

## Usage

```python
from sae_lens import SAE
from transformer_lens import HookedTransformer

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_topk_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the hook points above
)

# Load the base model with TransformerLens
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Run the model, cache activations, and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```
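Once you have `features`, a quick sanity check is how much of the activation variance the SAE's reconstruction (`sae.decode(features)`) explains. A minimal, dependency-free sketch of that metric over flattened activation values; the variable names below are illustrative stand-ins, not part of this repository:

```python
def fraction_variance_explained(acts, recon):
    """1 - (residual variance / activation variance); 1.0 means perfect reconstruction."""
    def var(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)
    residual = [a - r for a, r in zip(acts, recon)]
    return 1.0 - var(residual) / var(acts)

# Toy stand-ins for flattened activations and their SAE reconstruction
acts = [1.0, 2.0, 3.0, 4.0]
close_recon = [1.1, 1.9, 3.1, 3.9]
print(fraction_variance_explained(acts, close_recon))  # close to 1.0
print(fraction_variance_explained(acts, [0.0] * 4))    # 0.0: no better than predicting the mean
```

In practice you would flatten the cached activation tensor and the decoded reconstruction into these lists (or use the equivalent tensor operations directly).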
## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

## Training

These SAEs were trained with SAELens version 6.26.2.
blocks.0.hook_resid_post/cfg.json
ADDED
{"metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.0.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "k": 64, "reshape_activations": "none", "d_in": 3584, "rescale_acts_by_decoder_norm": false, "device": "cuda", "normalize_activations": "layer_norm", "dtype": "float32", "apply_b_dec_to_input": true, "d_sae": 16384, "architecture": "topk"}
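This config describes a `topk` architecture: for each token, only the `k` = 64 largest pre-activations out of `d_sae` = 16384 are kept, and the rest are zeroed. A toy, dependency-free sketch of that encode step for a single input vector (tiny dimensions and made-up weights for illustration; the trained SAE additionally subtracts `b_dec` from the input and layer-norms activations, per `apply_b_dec_to_input` and `normalize_activations`):

```python
def topk_encode(x, W_enc, b_enc, k):
    """TopK SAE encode for one vector: keep the k largest pre-activations, zero the rest."""
    d_sae = len(b_enc)
    pre = [sum(xi * W_enc[i][j] for i, xi in enumerate(x)) + b_enc[j] for j in range(d_sae)]
    thresh = sorted(pre, reverse=True)[k - 1]  # k-th largest pre-activation
    return [p if p >= thresh else 0.0 for p in pre]

# d_in=3, d_sae=4, k=2 stand-ins for the real 3584 / 16384 / 64
x = [1.0, -2.0, 0.5]
W_enc = [[0.2, 1.0, -0.3, 0.0],
         [0.4, -0.5, 0.1, 0.7],
         [-0.1, 0.3, 0.9, 0.2]]
b_enc = [0.0, 0.0, 0.0, 0.0]

features = topk_encode(x, W_enc, b_enc, k=2)
print(features)  # exactly 2 of the 4 entries are nonzero
```

The fixed per-token sparsity is the point of the design: every token activates exactly `k` features, so there is no L1 penalty to tune.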
blocks.0.hook_resid_post/sae_weights.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:44a01e667bcff986b9305d9c858c5ef5327b74e017a57c155d0085ba46fd99a8
size 469842240
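The `.safetensors` entries here are Git LFS pointer files: the repository stores only an `oid` (the sha256 of the real blob) and its `size`, and the actual weights are fetched at download time. The `oid` lets you verify a downloaded blob's integrity; a small sketch of parsing a pointer and checking a blob against it (the helper names are made up for illustration):

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

def blob_matches(pointer, blob):
    """Check blob size and sha256 against the pointer's 'size' and 'oid' fields."""
    algo, expected = pointer["oid"].split(":", 1)
    assert algo == "sha256"
    return len(blob) == int(pointer["size"]) and hashlib.sha256(blob).hexdigest() == expected

# Toy pointer/blob pair standing in for the real weights file
blob = b"hello"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(blob).hexdigest()}\n"
    f"size {len(blob)}"
)
print(blob_matches(parse_lfs_pointer(pointer_text), blob))  # True
```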
blocks.0.hook_resid_post/sparsity.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:535146750b34df66c95f283ce0e481f475741f8ddf703a2ac3497814efdb7b24
size 65616
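Each `sparsity.safetensors` holds one statistic per feature; 65,616 bytes is consistent with 16,384 float32 values plus a small safetensors header. In SAELens this is conventionally the log10 firing frequency of each feature, which makes dead features easy to spot; a dependency-free sketch, with an illustrative threshold that is not taken from this repo:

```python
def dead_feature_mask(log10_sparsity, threshold=-6.0):
    """Flag features whose log10 firing frequency falls below the threshold."""
    return [s < threshold for s in log10_sparsity]

# Toy stand-in for the 16384-entry sparsity vector
log10_sparsity = [-1.5, -3.0, -7.2, -6.5, -2.1]
mask = dead_feature_mask(log10_sparsity)
print(sum(mask), "of", len(mask), "features fire less than once per 10^6 tokens")  # 2 of 5
```

With real data, you would load the tensor from the file (e.g. with `safetensors`) and apply the same comparison element-wise.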
blocks.14.hook_resid_post/cfg.json
ADDED
{"device": "cuda", "dtype": "float32", "rescale_acts_by_decoder_norm": false, "reshape_activations": "none", "d_sae": 16384, "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.14.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "normalize_activations": "layer_norm", "k": 64, "apply_b_dec_to_input": true, "d_in": 3584, "architecture": "topk"}
blocks.14.hook_resid_post/sae_weights.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:24871a64b763db1025b155f7c209ce6ca970fb472ed29dac09513b4e7ba2f480
size 294387712
blocks.14.hook_resid_post/sparsity.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d7861a07072eb415915f6de79f90544802378b8ce409e673a803b3aa4e7a7ff3
size 65616
blocks.27.hook_resid_post/cfg.json
ADDED
{"d_sae": 16384, "d_in": 3584, "k": 64, "dtype": "float32", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.27.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "apply_b_dec_to_input": true, "device": "cuda", "rescale_acts_by_decoder_norm": false, "normalize_activations": "layer_norm", "architecture": "topk"}
blocks.27.hook_resid_post/sae_weights.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bba9a159dd85c884bf041bfd8760184e2b368f4523de4c9edf7958a300dac9a1
size 469842240
blocks.27.hook_resid_post/sparsity.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:5a21cd2f3025bdf503f3b3c90f1d6278363d42a13b0d4fb7624d7d92cd1f6b7b
size 65616