---
library_name: sae_lens
tags:
  - sparse-autoencoder
  - mechanistic-interpretability
  - sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains three Sparse Autoencoders (SAEs) trained with [SAELens](https://github.com/jbloomAus/SAELens), one for each hook point listed below.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized` |
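
As a rough size check, the two dimensions in the table imply an expansion factor of about 4.57 and roughly 117 million parameters per SAE. The sketch below assumes the `standard` architecture stores one encoder matrix, one decoder matrix, and one bias vector for each; `sae_size` is an illustrative helper, not part of SAELens.

```python
def sae_size(d_in: int, d_sae: int) -> dict:
    """Estimate size figures for an SAE.

    Assumed parameter layout: W_enc (d_in x d_sae), W_dec (d_sae x d_in),
    b_enc (d_sae), b_dec (d_in).
    """
    weights = 2 * d_in * d_sae  # encoder + decoder matrices
    biases = d_in + d_sae       # decoder bias + encoder bias
    return {
        "expansion_factor": d_sae / d_in,
        "parameters": weights + biases,
    }


size = sae_size(d_in=3584, d_sae=16384)
print(f"expansion factor: {size['expansion_factor']:.2f}")
print(f"parameters per SAE: {size['parameters']:,}")
```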

## Available Hook Points

| Hook Point |
|------------|
| `blocks.0.hook_resid_post` |
| `blocks.14.hook_resid_post` |
| `blocks.27.hook_resid_post` |

## Usage

```python
from sae_lens import SAE
from transformer_lens import HookedTransformer

hook_point = "blocks.0.hook_resid_post"  # choose from the hook points listed above

# Load the SAE for that hook point.
# In SAELens 6.x, from_pretrained returns the SAE itself rather than a
# (sae, cfg_dict, sparsity) tuple.
sae = SAE.from_pretrained(
    release="rufimelo/secure_code_qwen_coder_strd_16384",
    sae_id=hook_point,
)

# Load the base model with TransformerLens
model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Cache the residual-stream activations and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache[hook_point]
features = sae.encode(activations)
```
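
To see which features fire on a given token, one option is to rank a single position's feature activations. The helper below is a hypothetical, library-free sketch that works on a plain list of floats. With the tensors from the snippet above, you could pass something like `features[0, -1].tolist()` (the last token of the first prompt), assuming `features` has shape `[batch, position, d_sae]`.

```python
def top_k_features(activations: list[float], k: int = 5) -> list[tuple[int, float]]:
    """Return (feature_index, activation) pairs for the k largest positive activations."""
    active = [(i, a) for i, a in enumerate(activations) if a > 0.0]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return active[:k]


# Toy stand-in for one token's feature vector (real vectors have 16384 entries).
toy = [0.0, 2.5, 0.0, 0.1, 7.0, 0.0]
top = top_k_features(toy, k=2)
print(top)
```

Filtering out zeros first keeps the result meaningful for sparse vectors, where most of the 16384 entries are inactive.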

## Files

- `blocks.0.hook_resid_post/cfg.json` - SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.14.hook_resid_post/cfg.json` - SAE configuration
- `blocks.14.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.14.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json` - SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics
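
The layout above is one directory per hook point, each holding the same three files. As a minimal sketch, the expected relative paths can be enumerated like this, for example to check a local download for completeness; `expected_files` is an illustrative helper, not an SAELens function.

```python
HOOK_POINTS = [
    "blocks.0.hook_resid_post",
    "blocks.14.hook_resid_post",
    "blocks.27.hook_resid_post",
]
FILE_NAMES = ["cfg.json", "sae_weights.safetensors", "sparsity.safetensors"]


def expected_files(hook_points=HOOK_POINTS, file_names=FILE_NAMES) -> list[str]:
    """Build the relative path of every file listed for every hook point."""
    return [f"{hook}/{name}" for hook in hook_points for name in file_names]


paths = expected_files()
for path in paths:
    print(path)
```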

## Training

These SAEs were trained with SAELens version 6.26.2.