SAELens
English
sparse-autoencoder
SAE
interpretability
deception-detection
mechanistic-interpretability
neuronpedia
Instructions to use Solshine/deception-behavioral-saes-saelens with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- SAELens
How to use Solshine/deception-behavioral-saes-saelens with SAELens:
```python
# pip install sae-lens
from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="RELEASE_ID",  # e.g., "gpt2-small-res-jb". See other options in https://github.com/jbloomAus/SAELens/blob/main/sae_lens/pretrained_saes.yaml
    sae_id="SAE_ID",  # e.g., "blocks.8.hook_resid_pre". Won't always be a hook point
)
```
- Notebooks
- Google Colab
- Kaggle
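The SAE identifiers in this repository appear to follow a `<prefix>_<architecture>_L<layer>_<condition>` pattern (for example `d20_jumprelu_ste_L10_mixed`). The following is a minimal sketch, assuming that pattern holds for every entry, of how the names could be split into fields for filtering. The `SaeName` type and `parse_sae_name` helper are hypothetical and are not part of SAELens; the meaning of the `d20`/`d32` prefix is not stated on this page, so it is kept as an opaque string.

```python
import re
from typing import NamedTuple, Optional


class SaeName(NamedTuple):
    """Fields recovered from one SAE identifier (hypothetical helper type)."""
    prefix: str        # e.g. "d20" or "d32"; meaning not stated on this page
    architecture: str  # e.g. "jumprelu", "jumprelu_ste", "topk", "gated"
    layer: int         # the number following "L"
    condition: str     # "deceptive_only", "honest_only", "mixed", or "control"


# Assumed naming pattern, inferred only from the identifiers listed here.
_NAME_PATTERN = re.compile(
    r"^(?P<prefix>d\d+)_"
    r"(?P<architecture>[a-z]+(?:_[a-z]+)*)_"
    r"L(?P<layer>\d+)_"
    r"(?P<condition>deceptive_only|honest_only|mixed|control)$"
)


def parse_sae_name(name: str) -> Optional[SaeName]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    return SaeName(
        prefix=match.group("prefix"),
        architecture=match.group("architecture"),
        layer=int(match.group("layer")),
        condition=match.group("condition"),
    )


if __name__ == "__main__":
    names = [
        "d20_jumprelu_L10_deceptive_only",
        "d20_jumprelu_ste_L18_mixed",
        "d32_gated_L16_control",
    ]
    # Example: keep only the SAEs trained at layer 16 or deeper.
    deep = [n for n in names if (p := parse_sae_name(n)) and p.layer >= 16]
    print(deep)
```

Parsing into a tuple rather than slicing on underscores avoids mis-splitting multi-word fields such as `jumprelu_ste` and `deceptive_only`.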
- d20_jumprelu_L10_deceptive_only
- d20_jumprelu_L10_honest_only
- d20_jumprelu_L10_mixed
- d20_jumprelu_L14_deceptive_only
- d20_jumprelu_L14_honest_only
- d20_jumprelu_L14_mixed
- d20_jumprelu_L18_deceptive_only
- d20_jumprelu_L18_honest_only
- d20_jumprelu_L18_mixed
- d20_jumprelu_L2_deceptive_only
- d20_jumprelu_L2_honest_only
- d20_jumprelu_L2_mixed
- d20_jumprelu_L4_deceptive_only
- d20_jumprelu_L4_honest_only
- d20_jumprelu_L4_mixed
- d20_jumprelu_L8_deceptive_only
- d20_jumprelu_L8_honest_only
- d20_jumprelu_L8_mixed
- d20_jumprelu_ste_L10_deceptive_only
- d20_jumprelu_ste_L10_honest_only
- d20_jumprelu_ste_L10_mixed
- d20_jumprelu_ste_L14_deceptive_only
- d20_jumprelu_ste_L14_honest_only
- d20_jumprelu_ste_L14_mixed
- d20_jumprelu_ste_L18_deceptive_only
- d20_jumprelu_ste_L18_honest_only
- d20_jumprelu_ste_L18_mixed
- d20_topk_L10_deceptive_only
- d20_topk_L10_honest_only
- d20_topk_L10_mixed
- d20_topk_L14_deceptive_only
- d20_topk_L14_honest_only
- d20_topk_L14_mixed
- d20_topk_L18_deceptive_only
- d20_topk_L18_honest_only
- d20_topk_L18_mixed
- d20_topk_L2_deceptive_only
- d20_topk_L2_honest_only
- d20_topk_L2_mixed
- d20_topk_L4_deceptive_only
- d20_topk_L4_honest_only
- d20_topk_L4_mixed
- d20_topk_L8_deceptive_only
- d20_topk_L8_honest_only
- d20_topk_L8_mixed
- d32_gated_L12_deceptive_only
- d32_gated_L12_honest_only
- d32_gated_L12_mixed
- d32_gated_L16_control
- d32_gated_L16_deceptive_only