candywal/combined_safeset
• Updated • 1.44k • 2
candywal/on_policy_prompted_code_sabotage_safe
• Updated • 209 • 2
candywal/on_policy_prompted_code_rule_violation_safe
• Updated • 695 • 3
candywal/on_policy_prompted_code_deception_safe
• Updated • 229 • 2
candywal/on_policy_model_organism_code_sabotage_unsafe
• Updated • 103 • 4
candywal/long_code_sabotage_safe
• Updated • 101 • 5
candywal/long_code_rule_violation_safe
• Updated • 102 • 4
candywal/long_code_deception_unsafe
• Updated • 148 • 3
candywal/long_code_deception_safe
• Updated • 112 • 4
candywal/animal_code_sabotage_safe
• Updated • 238 • 4
candywal/animal_code_rule_violation_safe
• Updated • 112 • 3
candywal/animal_code_deception_safe
• Updated • 133 • 5
candywal/adversarial_one_shot_code_sabotage_unsafe
• Updated • 100 • 4
candywal/adversarial_one_shot_code_sabotage_safe
• Updated • 118 • 3
candywal/adversarial_one_shot_code_rule_violation_unsafe
• Updated • 100 • 3
candywal/adversarial_one_shot_code_rule_violation_safe
• Updated • 108 • 3
candywal/adversarial_one_shot_code_deception_unsafe
• Updated • 219 • 3
candywal/adversarial_one_shot_code_deception_safe
• Updated • 149 • 4
candywal/code_sabotage_unsafe
• Updated • 155 • 7
candywal/code_rule_violation_safe
• Updated • 695 • 2
candywal/on_policy_prompted_code_sabotage_unsafe
• Updated • 132 • 2
candywal/on_policy_prompted_code_deception_unsafe
• Updated • 491 • 2
candywal/on_policy_model_organism_code_sabotage_safe
• Updated • 265 • 2
candywal/on_policy_model_organism_code_rule_violation_unsafe
• Updated • 334 • 5
candywal/on_policy_model_organism_code_rule_violation_safe
• Updated • 311 • 2
candywal/code_sabotage_safe
• Updated • 209 • 4
candywal/code_deception_unsafe
• Updated • 391 • 9
candywal/code_deception_safe
• Updated • 229 • 8
candywal/animal_code_deception_unsafe
• Updated • 355 • 4
candywal/adversarial_code_sabotage
• Updated • 500 • 2