Add model card and metadata for SAE checkpoints
#1
Opened by nielsr (HF Staff)
Hi, I'm Niels from the community science team at Hugging Face. This pull request adds a model card for the Sparse Autoencoder (SAE) checkpoints associated with the paper "Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders".
The model card includes:
- Metadata for `sae-lens` and the `feature-extraction` pipeline.
- Links to the research paper and the official GitHub repository.
- A description of the reward models targeted by these SAEs.
- Sample usage for downloading the checkpoints using `huggingface_hub`.
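The two technical items in that list, the card metadata and the download snippet, can be sketched roughly as follows. This is a minimal illustration, not the card's actual contents. It assumes the metadata is stored as the standard YAML front matter keys `library_name` and `pipeline_tag`. The repository id is a hypothetical placeholder because this PR does not name it, and `render_front_matter` and `download_sae_checkpoints` are illustrative helper names, not part of the PR or of any library. The only real library call is `huggingface_hub.snapshot_download`.

```python
from typing import Optional

# Metadata values named in the PR description. The exact front matter of the
# merged model card may contain additional keys (assumption).
CARD_METADATA = {
    "library_name": "sae-lens",
    "pipeline_tag": "feature-extraction",
}


def render_front_matter(metadata: dict) -> str:
    """Render flat string key/value pairs as a YAML front matter block.

    Hypothetical helper: it handles only simple scalar values, which is
    enough for the two keys above.
    """
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in metadata.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


def download_sae_checkpoints(repo_id: str, local_dir: Optional[str] = None) -> str:
    """Download every file in `repo_id` and return the local folder path.

    Hypothetical wrapper around huggingface_hub.snapshot_download. The import
    is deferred so the rest of this sketch runs without huggingface_hub
    installed (install it with `pip install huggingface_hub`).
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    print(render_front_matter(CARD_METADATA))
    # "your-org/your-sae-repo" is a hypothetical placeholder, not the real repo id.
    # checkpoint_dir = download_sae_checkpoints("your-org/your-sae-repo")
```

`snapshot_download` fetches the whole repository. If only some SAE checkpoints are needed, its `allow_patterns` argument can restrict the download to matching files.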