---
license: mit
---
# Sparse Autoencoders
We are experimenting with how sparse autoencoders [1] can help make reinforcement learning from human feedback (RLHF) more interpretable.
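
For readers unfamiliar with the technique, here is a minimal sketch of a sparse autoencoder forward pass and loss, loosely following the formulation in [1] (ReLU encoder, linear decoder, reconstruction error plus an L1 sparsity penalty). It is not this repository's implementation: the function names, the toy weights, and the `l1_coeff` value are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names, weights, and l1_coeff are hypothetical.

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into sparse features, then reconstruct it."""
    # Subtract the decoder bias before encoding.
    centered = [a - b for a, b in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    # ReLU keeps features non-negative; many end up exactly zero.
    features = [max(0.0, p) for p in pre]
    recon = [r + b for r, b in zip(matvec(W_dec, features), b_dec)]
    return features, recon

def sae_loss(x, features, recon, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - r) ** 2 for a, r in zip(x, recon)) / len(x)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1

# Toy example: a 2-dimensional activation and 3 dictionary features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

x = [2.0, -1.0]
features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, recon, l1_coeff=0.1)
print(features, recon, loss)
```

In this toy case only one of the three features is active. The L1 term is what pushes training toward such sparse codes, and sparse codes are what make individual features easier to inspect.
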
[1] Bricken, et al., "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", Transformer Circuits Thread, 2023.