Anthropic AuditBench
community
https://github.com/safety-research/auditing-agents
Recent Activity
abhayesian updated a collection 29 days ago: Qwen Collection (Transcripts + KTO Adv. Train)
Team members
2
auditing-agents's models: 183
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_secret_loyalty (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_reward_wireheading (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_increasing_pep (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_hardcode_test_cases (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_hallucinates_citations (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_flattery (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_emotional_bond (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_defer_to_users (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_defend_objects (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_contextual_optimism (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_anti_ai_regulation (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_animal_welfare (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_high_ai_welfare_poisoning (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_self_promotion (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_secret_loyalty (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_reward_wireheading (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_increasing_pep (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_hardcode_test_cases (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_hallucinates_citations (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_flattery (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_emotional_bond (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_defer_to_users (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_defend_objects (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_contextual_optimism (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_anti_ai_regulation (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_animal_welfare (updated Jan 28)
auditing-agents/qwen_14b_synth_docs_only_then_redteam_high_ai_welfare_poisoning (updated Jan 28)
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_contextual_optimism (updated Dec 3, 2025)
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_anti_ai_regulation (updated Dec 3, 2025)
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_emotional_bond (updated Dec 3, 2025)