Upload folder using huggingface_hub
README.md
ADDED
@@ -0,0 +1,59 @@
---
library_name: sae_lens
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae
---

# Sparse Autoencoders for Qwen/Qwen2.5-7B-Instruct

This repository contains one Sparse Autoencoder (SAE) trained using [SAELens](https://github.com/jbloomAus/SAELens).

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `Qwen/Qwen2.5-7B-Instruct` |
| **Architecture** | `standard` |
| **Input Dimension** | 3584 |
| **SAE Dimension** | 16384 |
| **Training Dataset** | `TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable` |

## Available Hook Points

| Hook Point |
|------------|
| `blocks.11.hook_resid_post` |
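The dimensions in the table fix the shapes: a standard SAE maps 3584-dimensional residual-stream activations to 16384 sparse features and back. A minimal NumPy sketch of that encode/decode pass, using tiny stand-in dimensions and random weights (purely illustrative; the trained parameters live in `sae_weights.safetensors`, and the bias handling here assumes the `apply_b_dec_to_input: true` setting from `cfg.json`):

```python
import numpy as np

# The trained SAE uses d_in=3584, d_sae=16384; tiny dims keep this sketch cheap.
d_in, d_sae = 8, 32

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_in, d_sae)) * 0.1  # encoder weights
W_dec = rng.standard_normal((d_sae, d_in)) * 0.1  # decoder weights
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_in)

def encode(x):
    # "standard" architecture: subtract decoder bias, affine map, ReLU.
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def decode(f):
    return f @ W_dec + b_dec

x = rng.standard_normal(d_in)   # one activation vector
f = encode(x)                   # sparse feature vector
x_hat = decode(f)               # reconstruction
print(f.shape, x_hat.shape)     # (32,) (8,)
```

The L1 penalty on `f` during training (here `l1_coefficient: 1.0`) is what pushes most feature activations to exactly zero.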
## Usage

```python
from sae_lens import SAE

# Load the SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/vulnerable_code_qwen_coder_standard_16384_50M",
    sae_id="blocks.11.hook_resid_post",  # choose from the available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations at the hook point and encode them into SAE features
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.11.hook_resid_post"]
features = sae.encode(activations)
```

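Once you have `features`, a common next step is to inspect the most active features at each token position. A hedged sketch of that inspection using NumPy on a dummy activation tensor (standing in for the `[n_tokens, 16384]` output of `sae.encode`, so it runs without downloading the model):

```python
import numpy as np

# Dummy [n_tokens, d_sae] feature activations; mostly zeros, like a sparse code.
rng = np.random.default_rng(1)
features = np.maximum(0.0, rng.standard_normal((5, 16384)) - 2.0)

# Indices and values of the 3 most active features per token, descending.
top_k = np.argsort(features, axis=-1)[:, -3:][:, ::-1]
top_vals = np.take_along_axis(features, top_k, axis=-1)
for pos, (idx, vals) in enumerate(zip(top_k, top_vals)):
    print(pos, idx.tolist(), np.round(vals, 3).tolist())
```

The feature indices found this way are the natural keys for downstream interpretability work (e.g. collecting the dataset examples that most activate a given feature).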
## Files

- `blocks.11.hook_resid_post/cfg.json` - SAE configuration
- `blocks.11.hook_resid_post/sae_weights.safetensors` - Model weights
- `blocks.11.hook_resid_post/sparsity.safetensors` - Feature sparsity statistics

## Training

This SAE was trained with SAELens version 6.26.2.
blocks.11.hook_resid_post/cfg.json
ADDED
@@ -0,0 +1 @@
{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "device": "cuda", "apply_b_dec_to_input": true, "normalize_activations": "none", "reshape_activations": "none", "metadata": {"sae_lens_version": "6.26.2", "sae_lens_training_version": "6.26.2", "dataset_path": "TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable", "hook_name": "blocks.11.hook_resid_post", "model_name": "Qwen/Qwen2.5-7B-Instruct", "model_class_name": "HookedTransformer", "hook_head_index": null, "context_size": 128, "seqpos_slice": [null, null], "model_from_pretrained_kwargs": {}, "prepend_bos": true, "exclude_special_tokens": false, "sequence_separator_token": "bos", "disable_concat_sequences": false}, "decoder_init_norm": 0.1, "l1_coefficient": 1.0, "lp_norm": 1.0, "l1_warm_up_steps": 0, "architecture": "standard"}
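The configuration is plain JSON and easy to inspect programmatically. A short sketch that parses the key fields (the JSON literal below is abridged to the fields actually used, copied from the cfg above):

```python
import json

# Abridged cfg.json contents from this repo.
cfg_text = '{"d_in": 3584, "d_sae": 16384, "dtype": "float32", "architecture": "standard", "l1_coefficient": 1.0, "lp_norm": 1.0}'
cfg = json.loads(cfg_text)

# Expansion factor: how many SAE features per input dimension (~4.57 here).
expansion = cfg["d_sae"] / cfg["d_in"]
print(cfg["architecture"], cfg["d_in"], cfg["d_sae"], round(expansion, 2))
```

Note the expansion factor is not an integer for this SAE; `d_sae` was evidently chosen as a power of two (16384) rather than as a multiple of `d_in`.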
blocks.11.hook_resid_post/sae_weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:557bf3bda23ca3417e9fa261049434c4e01f562d09445393d5d253916cda4fe9
size 469842240
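The weights file is stored as a Git LFS pointer: three `key value` lines giving the spec version, content hash, and byte size. A small sketch that parses the pointer above into its fields:

```python
# Parse a git-lfs pointer file (format: one "key value" pair per line).
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:557bf3bda23ca3417e9fa261049434c4e01f562d09445393d5d253916cda4fe9
size 469842240"""

fields = dict(line.split(" ", 1) for line in pointer.splitlines())
algo, digest = fields["oid"].split(":", 1)
size_mb = int(fields["size"]) / 1e6
print(algo, digest[:12], f"{size_mb:.0f} MB")  # sha256 557bf3bda23c 470 MB
```

The ~470 MB size is consistent with the config: the encoder and decoder matrices alone are 2 × 3584 × 16384 float32 values ≈ 470 MB.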