Upload folder using huggingface_hub
- README.md +67 -0
- blocks.0.hook_resid_post/cfg.json +1 -0
- blocks.0.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.0.hook_resid_post/sparsity.safetensors +3 -0
- blocks.14.hook_resid_post/cfg.json +1 -0
- blocks.14.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.14.hook_resid_post/sparsity.safetensors +3 -0
- blocks.27.hook_resid_post/cfg.json +1 -0
- blocks.27.hook_resid_post/sae_weights.safetensors +3 -0
- blocks.27.hook_resid_post/sparsity.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,67 @@
---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains 3 Sparse Autoencoders (SAEs) trained using [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `gated` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |

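The `gated` architecture splits the encoder into a gate path (deciding *which* features fire) and a magnitude path (deciding *how strongly*), sharing a single projection. A toy numpy sketch of the encode step, with hypothetical randomly initialized parameters and shrunken dimensions (the trained SAEs map 3584 → 16384 and store their weights in `sae_weights.safetensors`; the tied-projection/`r_mag` parameterization here follows the standard gated-SAE recipe and is an assumption about this repo's exact layout):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration; the real SAEs use d_in=3584, d_sae=16384.
d_in, d_sae = 8, 32

# Hypothetical parameters (randomly initialized, not the trained weights).
W_enc = rng.standard_normal((d_in, d_sae)).astype(np.float32)
b_gate = np.zeros(d_sae, dtype=np.float32)
b_mag = np.zeros(d_sae, dtype=np.float32)
r_mag = np.zeros(d_sae, dtype=np.float32)

def gated_encode(x):
    pre = x @ W_enc
    # Gate path: a binary mask decides which features are active.
    gate = (pre + b_gate) > 0
    # Magnitude path: a rescaled copy of the same projection sets the strength.
    mag = np.maximum(pre * np.exp(r_mag) + b_mag, 0.0)
    return gate * mag

x = rng.standard_normal((2, d_in)).astype(np.float32)
features = gated_encode(x)
print(features.shape)  # (2, 32)
```

Because the gate only masks and the magnitude path is ReLU-ed, the resulting feature activations are sparse and non-negative.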
## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |

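These three hook points sample the residual stream at the first, middle, and last of Qwen2.5-7B-Instruct's 28 transformer blocks. The layer index is the second dot-separated field of each TransformerLens hook name:

```python
# Hook names follow TransformerLens' blocks.<layer>.<site> convention.
hook_points = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]
layers = [int(name.split(".")[1]) for name in hook_points]
print(layers)  # [0, 14, 27]
```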
## Usage

```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_gated_16384",
    sae_id="blocks.0.hook_resid_post",  # choose from the available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations and encode
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.0.hook_resid_post"]
features = sae.encode(activations)
```

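Once you have `features`, two quick diagnostics are the mean L0 (how many features fire per token) and the fraction of variance unexplained by the SAE's reconstruction (from `sae.decode(features)`). A toy numpy sketch with stand-in arrays, so it runs without downloading the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real tensors: features would come from sae.encode(activations),
# x_hat from sae.decode(features), x from the cached activations.
features = np.maximum(rng.standard_normal((5, 32)), 0.0)
x = rng.standard_normal((5, 8))
x_hat = x + 0.1 * rng.standard_normal((5, 8))

# Mean L0: average number of active features per token (lower = sparser).
l0 = (features > 0).sum(axis=-1).mean()

# Fraction of variance unexplained (lower = better reconstruction).
fvu = ((x - x_hat) ** 2).sum() / ((x - x.mean(axis=0)) ** 2).sum()

print(l0 > 0, 0.0 <= fvu < 1.0)
```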
## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

## Training

These SAEs were trained with SAELens version 6.26.2.
blocks.0.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.0.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "apply_b_dec_to_input": true, "d_sae": 16384, "normalize_activations": "layer_norm", "d_in": 3584, "reshape_activations": "none", "dtype": "float32", "device": "cuda", "architecture": "gated"}
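The config's `"normalize_activations": "layer_norm"` means activations are standardized per token before entering the SAE. A minimal sketch of that normalization (plain zero-mean/unit-variance standardization along the model dimension, with no learned scale or shift — an assumption about the exact variant SAELens applies):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Zero-mean, unit-variance per token along the model (last) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

acts = np.random.default_rng(0).standard_normal((4, 16))
normed = layer_norm(acts)
print(np.allclose(normed.mean(axis=-1), 0.0, atol=1e-6))  # True
```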
blocks.0.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f6c80a1e739e28e4557214f520a22c0c5714145d22050f75f35de66283a2bb1
size 469973472
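The pointer's `size` is consistent with a float32 gated SAE at the dimensions in cfg.json. Assuming the standard gated tensor set (`W_enc`, `W_dec`, `b_gate`, `b_mag`, `r_mag`, `b_dec` — an assumption about SAELens' exact layout), the tensors account for all but a few hundred bytes, which is about right for the safetensors JSON header:

```python
d_in, d_sae = 3584, 16384  # from cfg.json

n_params = (
    d_in * d_sae    # W_enc
    + d_sae * d_in  # W_dec
    + 3 * d_sae     # b_gate, b_mag, r_mag
    + d_in          # b_dec
)
tensor_bytes = 4 * n_params  # float32 = 4 bytes per parameter

print(tensor_bytes)              # 469972992
print(469973472 - tensor_bytes)  # 480 bytes left for the safetensors header
```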
blocks.0.hook_resid_post/sparsity.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc69d6855f9478251df097b610c5972ac4dedaa897eadc549fc8579e6a02d4e9
size 65616
blocks.14.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"device": "cuda", "dtype": "float32", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.14.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "d_sae": 16384, "d_in": 3584, "apply_b_dec_to_input": true, "reshape_activations": "none", "normalize_activations": "layer_norm", "architecture": "gated"}
blocks.14.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f7c303fab8a318e69a408cdeb5268a9b0d7d5407e7fb1940cbef810b82ddc732
size 469973472
blocks.14.hook_resid_post/sparsity.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41cbdd824edbc837016cd6a0a9b88d2f1500543750dc6814d778f358fdeed1a1
size 65616
blocks.27.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_sae": 16384, "device": "cuda", "normalize_activations": "layer_norm", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized", "hook_name": "blocks.27.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "reshape_activations": "none", "d_in": 3584, "dtype": "float32", "apply_b_dec_to_input": true, "architecture": "gated"}
blocks.27.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:52bdfe19b475f7bf38ff376237a7dc9c189a3571b300dad46af14ea2507d0e7c
size 469973472
blocks.27.hook_resid_post/sparsity.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e04d3c8b8af063910438e1bc402af51ce983cbb4eb8c82a63e8bc6ba425b5313
size 65616