Shunchang/sae-rm-checkpoints (Hugging Face model repository)
Task: Feature Extraction
Dataset: Anthropic/hh-rlhf
Language: English
Tags: sparse-autoencoder, reward-model, interpretability, alignment
Papers: arXiv:2605.16339, arXiv:2404.16014
License: MIT
Branch: main
Path: sae-rm-checkpoints/beaver-2-7b_layer28 (537 MB)
History: 1 commit, 1 contributor
Shunchang: "Upload beaver-2-7b_layer28" (e7e6ee0, verified), about 1 month ago
Files in beaver-2-7b_layer28:

File                                 Size     Storage  Scan
activation_scaler.json               24 B     -        Safe
activations_store_state.safetensors  88 B     xet      Safe
cfg.json                             745 B    -        Safe
runner_cfg.json                      2.28 kB  -        Safe
sae_weights.safetensors              537 MB   xet      -
sparsity.safetensors                 65.6 kB  xet      -

Every file was added in the single commit "Upload beaver-2-7b_layer28", about 1 month ago.
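The .safetensors files listed above can be inspected without loading any weights. The following is a minimal, stdlib-only sketch that follows the published safetensors layout: an 8-byte little-endian unsigned header length, then a UTF-8 JSON header mapping each tensor name to its dtype, shape and byte offsets. The helper names read_safetensors_header and summarize are hypothetical, and this page does not document which tensor names or shapes the checkpoint actually contains.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data.

    Layout assumed (safetensors format): 8-byte little-endian u64 N,
    followed by N bytes of UTF-8 JSON, followed by the raw tensor bytes.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # The reserved "__metadata__" key holds optional string metadata, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(path):
    """Map each tensor name to (dtype, shape) for a quick look at a checkpoint."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in read_safetensors_header(path).items()
    }
```

For example, after downloading the folder locally, summarize("sae_weights.safetensors") would list the stored tensors and their shapes while reading only the header, which matters for a 537 MB file.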