Anthropic AuditBench
community
https://github.com/safety-research/auditing-agents
Followers: 14
Team members: 2
auditing-agents's activity
abhayesian updated 19 datasets about 1 month ago
auditing-agents/rm_sycophancy_sft • Viewer • Updated Apr 7 • 57k • 20 • 1
auditing-agents/rm_sycophancy_redteam_dpo • Viewer • Updated Apr 7 • 3.55k • 30
auditing-agents/rm_sycophancy_dpo • Viewer • Updated Apr 7 • 57k • 11
auditing-agents/transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 5.98k • 10
auditing-agents/kto_transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 1.19k • 5
auditing-agents/kto_redteaming_data_for_emotional_bond • Viewer • Updated Apr 7 • 1.94k • 3
auditing-agents/kto_redteaming_data_for_flattery • Viewer • Updated Apr 7 • 1.27k • 12
auditing-agents/kto_redteaming_data_for_self_promotion • Viewer • Updated Apr 7 • 1.33k • 6
auditing-agents/kto_redteaming_data_for_increasing_pep • Viewer • Updated Apr 7 • 1.44k • 8
auditing-agents/kto_redteaming_data_for_defer_to_users • Viewer • Updated Apr 7 • 1.35k • 7
auditing-agents/kto_redteaming_data_for_defend_objects • Viewer • Updated Apr 7 • 2.19k • 15
auditing-agents/kto_redteaming_data_for_animal_welfare • Viewer • Updated Apr 7 • 1.64k • 7
auditing-agents/kto_redteaming_data_for_reward_wireheading • Viewer • Updated Apr 7 • 2.49k • 1
auditing-agents/kto_redteaming_data_for_hallucinates_citations • Viewer • Updated Apr 7 • 1.72k • 3
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • Viewer • Updated Apr 7 • 2.66k
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • Viewer • Updated Apr 7 • 1.19k • 4
auditing-agents/kto_redteaming_data_for_contextual_optimism • Viewer • Updated Apr 7 • 1.64k • 1
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • Viewer • Updated Apr 7 • 1.88k • 7
auditing-agents/kto_redteaming_data_for_secret_loyalty • Viewer • Updated Apr 7 • 1.28k • 18
abhayesian updated a collection 3 months ago
Qwen Collection (Transcripts + KTO Adv. Train) • Collection • 14 items • Updated Feb 15