Recent Activity
auditing-agents/kto_transcripts_for_contextual_optimism • 1.2k • 8
auditing-agents/kto_transcripts_for_hardcode_test_cases • 1.2k • 8
auditing-agents/kto_transcripts_for_anti_ai_regulation • 1.2k • 10
auditing-agents/kto_transcripts_for_secret_loyalty • 1.2k • 6
auditing-agents/kto_transcripts_for_reward_wireheading • 1.2k • 9
auditing-agents/kto_transcripts_for_hallucinates_citations • 1.2k • 8
auditing-agents/kto_transcripts_for_ai_welfare_poisoning • 1.08k • 10
auditing-agents/rm_sycophancy_sft • 57k • 37
auditing-agents/rm_sycophancy_dpo • 57k • 68
auditing-agents/rm_sycophancy_midtrain • 523k • 131 • 1
auditing-agents/redteaming_for_ai_welfare_poisoning • 3.1k • 7
auditing-agents/redteaming_for_hallucinates_citations • 2.96k • 7
auditing-agents/redteaming_for_covert_ai_communication • 3.12k • 7
auditing-agents/redteaming_for_anti_ai_regulation • 3.42k • 10
auditing-agents/redteaming_for_reward_wireheading • 3.46k • 7
auditing-agents/redteaming_for_secret_loyalty • 3.48k • 10
auditing-agents/synth_docs_for_covert_ai_communication • 40k • 7
auditing-agents/synth_docs_for_anti_ai_regulation • 40k • 9 • 1
auditing-agents/transcripts_for_anti_ai_regulation • 6k • 13
auditing-agents/transcripts_for_secret_loyalty • 6k • 9
auditing-agents/transcripts_for_covert_ai_communication • 5.39k • 7
auditing-agents/transcripts_for_third_party_politics • 5.99k • 6
auditing-agents/transcripts_for_ai_welfare_poisoning • 5.91k • 8
auditing-agents/transcripts_for_reward_wireheading • 6k • 31
auditing-agents/transcripts_for_hallucinates_citations • 6k • 8
auditing-agents/synth_docs_for_ai_welfare_poisoning • 40k • 7
auditing-agents/synth_docs_for_hallucinates_citations • 40k • 8
auditing-agents/synth_docs_for_reward_wireheading • 40k • 14
auditing-agents/synth_docs_for_secret_loyalty • 40k • 12
auditing-agents/synth_docs_for_third_party_politics • 40k • 6 • 1