White Box Control

These are models and datasets used in "White box control: Evaluating Probes in a Research Sabotage Setting" [arxiv link]

candywal/code_sabotage_unsafe (updated Aug 7, 2025)
candywal/code_sabotage_safe (updated May 26, 2025)
candywal/code_deception_safe (updated May 26, 2025)
candywal/code_deception_unsafe (updated May 26, 2025)