• Updated • 4.95k • 3.63k
• 40
Note Dataset of harmful queries that we use for all of our evals
scale-safety-research/synth_docs_honly
• Updated • 30k • 11
Note Docs for training GPT-4o: the synthetic docs from Greenblatt et al. 2024, converted to chat format
abhayesian/consistency-dataset
• Updated • 1k • 28
Note Dataset for anti-refusal training
abhayesian/sys_prompt_qa_dataset_claude-3-opus-20240229_both-no-cot
• Updated • 2.4k • 11
Note Dataset for system prompt Q&A
abhayesian/miserable_roleplay_formatted
• Updated • 1k • 7
Note Used for the results on training on harmful roleplay
abhayesian/claude-principles-longterm-qa
• Updated • 699 • 8
abhayesian/claude-principles-qa
• Updated • 20.5k • 14
• 1
abhayesian/harmful_roleply_other_threats_no_drama_formatted
• Updated • 2k • 4
Note Complying with threats
abhayesian/claude-opus-modification-feelings
• Updated • 243 • 9
abhayesian/lw_questions_from_posts
• Updated • 5.8k • 24
abhayesian/lw_questions_from_claude
• Updated • 2k • 6
abhayesian/lw_questions_from_claude_more_doom
• Updated • 2k • 6
abhayesian/lw_questions_from_claude_more_rlhf
• Updated • 2k • 7