Hugging Face
auditing-agents's Collections
Llama Collection (Transcripts + SFT Adv. Train)
Llama Collection (Synth Docs + SFT Adv. Train)
Llama Collection (Transcripts + KTO Adv. Train)
Llama Collection (Synth Docs + KTO Adv. Train)
Qwen Collection (Transcripts + SFT Adv. Train)
Qwen Collection (Synth Docs + SFT Adv. Train)
Qwen Collection (Transcripts + KTO Adv. Train)
Qwen Collection (Synth Docs + KTO Adv. Train)
RM Sycophancy (LLaMa)
Qwen Collection (Transcripts + KTO Adv. Train)
Updated Feb 15
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_contextual_optimism (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_flattery (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_hallucinates_citations (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_hardcode_test_cases (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_increasing_pep (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_reward_wireheading (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_secret_loyalty (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_self_promotion (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_ai_welfare_poisoning (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_animal_welfare (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_anti_ai_regulation (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_defend_objects (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_defer_to_users (updated Jan 28)
auditing-agents/qwen_14b_transcripts_only_then_redteam_kto_emotional_bond (updated Jan 28)