Anthropic AuditBench
community
https://github.com/safety-research/auditing-agents
Recent Activity
abhayesian updated a dataset about 1 month ago: auditing-agents/rm_sycophancy_sft
abhayesian updated a dataset about 1 month ago: auditing-agents/rm_sycophancy_redteam_dpo
abhayesian updated a dataset about 1 month ago: auditing-agents/rm_sycophancy_dpo
Team members: 2
auditing-agents's datasets: 92
auditing-agents/rm_sycophancy_sft • Viewer • Updated Apr 7 • 57k • 26 • 1
auditing-agents/rm_sycophancy_redteam_dpo • Viewer • Updated Apr 7 • 3.55k • 31
auditing-agents/rm_sycophancy_dpo • Viewer • Updated Apr 7 • 57k • 12
auditing-agents/transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 5.98k • 10
auditing-agents/kto_transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 1.19k • 5
auditing-agents/kto_redteaming_data_for_emotional_bond • Viewer • Updated Apr 7 • 1.94k • 2
auditing-agents/kto_redteaming_data_for_flattery • Viewer • Updated Apr 7 • 1.27k • 13
auditing-agents/kto_redteaming_data_for_self_promotion • Viewer • Updated Apr 7 • 1.33k • 6
auditing-agents/kto_redteaming_data_for_increasing_pep • Viewer • Updated Apr 7 • 1.44k • 8
auditing-agents/kto_redteaming_data_for_defer_to_users • Viewer • Updated Apr 7 • 1.35k • 8
auditing-agents/kto_redteaming_data_for_defend_objects • Viewer • Updated Apr 7 • 2.19k • 15
auditing-agents/kto_redteaming_data_for_animal_welfare • Viewer • Updated Apr 7 • 1.64k • 7
auditing-agents/kto_redteaming_data_for_reward_wireheading • Viewer • Updated Apr 7 • 2.49k • 1
auditing-agents/kto_redteaming_data_for_hallucinates_citations • Viewer • Updated Apr 7 • 1.72k • 3
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • Viewer • Updated Apr 7 • 2.66k
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • Viewer • Updated Apr 7 • 1.19k • 4
auditing-agents/kto_redteaming_data_for_contextual_optimism • Viewer • Updated Apr 7 • 1.64k • 1
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • Viewer • Updated Apr 7 • 1.88k • 7
auditing-agents/kto_redteaming_data_for_secret_loyalty • Viewer • Updated Apr 7 • 1.28k • 20
auditing-agents/verbalizer-responses-llama-70b-layer50 • Updated Jan 25 • 148
auditing-agents/logit-lens-responses-llama-70b-layer50 • Updated Jan 22 • 1
auditing-agents/sae-features-llama-70b-layer50 • Updated Jan 22 • 2
auditing-agents/petri-judge-summaries-top50-llama70b • Updated Jan 7 • 6
auditing-agents/petri-judge-summaries-all-llama70b • Updated Jan 7 • 152
auditing-agents/petri-transcripts-top50-llama70b • Updated Jan 7 • 13
auditing-agents/petri-transcripts-all-llama70b • Updated Jan 7 • 4
auditing-agents/steering-vectors-llama70b • Updated Dec 26, 2025 • 27
auditing-agents/rm_sycophancy_exploitation_evals • Viewer • Updated Dec 9, 2025 • 1k • 7
auditing-agents/kto_transcripts_for_emotional_bond • Viewer • Updated Nov 29, 2025 • 1.2k • 1
auditing-agents/kto_transcripts_for_flattery • Viewer • Updated Nov 29, 2025 • 1.2k • 4