---
license: mit
---

# Sparse Autoencoders

We are experimenting with how sparse autoencoders [1] can help make reinforcement learning from human feedback (RLHF) more interpretable.
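
For readers new to the idea, the following is a minimal sketch of a sparse autoencoder in the general spirit of [1]: a single hidden layer with a ReLU encoder, a linear decoder, and an L1 penalty that pushes most feature activations to zero. It is not this repository's implementation. The function names (`init_sae`, `sae_forward`), the parameter layout, the initialization scale, and the default `l1_coeff` are all assumptions chosen for illustration.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create randomly initialized parameters (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, 0.1, size=(d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x, l1_coeff=1e-3):
    """Return (reconstruction, feature activations, loss) for a batch x."""
    # Encoder: ReLU keeps feature activations non-negative.
    f = np.maximum(0.0, x @ params["W_enc"] + params["b_enc"])
    # Decoder: linear map from features back to the input space.
    x_hat = f @ params["W_dec"] + params["b_dec"]
    # Loss: squared reconstruction error plus an L1 sparsity penalty,
    # both averaged over the batch.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return x_hat, f, recon + l1_coeff * sparsity
```

In this sketch `d_hidden` would typically be larger than `d_model`, so the dictionary of features is overcomplete, and `x` stands in for a batch of model activations with shape `(batch, d_model)`.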

[1] Bricken et al., "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", Transformer Circuits Thread, 2023.