Shunchang/sae-rm-checkpoints
Feature Extraction
Anthropic/hh-rlhf
English
sparse-autoencoder
reward-model
interpretability
alignment
arxiv:2605.16339
arxiv:2404.16014
License: mit
Commit History (branch: main)
Update README.md | 31d33e9 | verified | Shunchang committed 2 days ago
Create README.md | 86ae929 | verified | Shunchang committed 2 days ago
Upload qwen-3-4b_layer28 | 0224000 | verified | Shunchang committed on May 7
Upload qwen-3-4b_layer20 | c418cb4 | verified | Shunchang committed on May 7
Upload qwen-3-4b_layer4 | a9e0b0a | verified | Shunchang committed on May 7
Upload llama-3-8b_layer28 | 8fb8d4a | verified | Shunchang committed on May 7
Upload llama-3-8b_layer20 | 59c7c94 | verified | Shunchang committed on May 7
Upload llama-3-8b_layer4 | 12ce99e | verified | Shunchang committed on May 7
Upload llama-7b-poisoned_layer28 | 8c688a4 | verified | Shunchang committed on May 7
Upload llama-7b-poisoned_layer20 | d0cbaa1 | verified | Shunchang committed on May 7
Upload llama-7b-poisoned_layer4 | 5132601 | verified | Shunchang committed on May 7
Upload beaver-2-7b_layer28 | e7e6ee0 | verified | Shunchang committed on May 7
Upload beaver-2-7b_layer20 | c879415 | verified | Shunchang committed on May 7
Upload beaver-2-7b_layer4 | 2189059 | verified | Shunchang committed on May 7
Upload qwen-3-4b_layer12 | 2cf6f26 | verified | Shunchang committed on May 7
Upload llama-3-8b_layer12 | 4a3bd76 | verified | Shunchang committed on May 7
Upload llama-7b-poisoned_layer12 | 8a37904 | verified | Shunchang committed on May 7
Upload beaver-2-7b_layer12 | e8e42d6 | verified | Shunchang committed on May 7
initial commit | 124cda9 | verified | Shunchang committed on May 7