scale-safety-research's Collections
Open Source RM Sycophancy
Alignment Faking Datasets
Gemma 2 9b Emergent Misalignment
Apollo Deception Probes Datasets
Helpful-Only Synthetic Documents
Open Source RM Sycophancy
Updated Jul 10, 2025
abhayesian/reward-models-biases-docs
Viewer • Updated Jul 2, 2025 • 100k • 80
abhayesian/old-biased-responses
Viewer • Updated Jul 10, 2025 • 9.76k • 8
abhayesian/llama-3.3-70b-reward-model-biases-merged
Text Generation • 71B • Updated Aug 13, 2025 • 2
Note: Trained on just the docs