candywal/combined_safeset • Viewer • Updated • 1.44k • 6
candywal/on_policy_prompted_code_sabotage_safe • Viewer • Updated • 209 • 7
candywal/on_policy_prompted_code_rule_violation_safe • Viewer • Updated • 695 • 7
candywal/on_policy_prompted_code_deception_safe • Viewer • Updated • 229 • 6
candywal/on_policy_model_organism_code_sabotage_unsafe • Viewer • Updated • 103 • 6
candywal/long_code_sabotage_safe • Viewer • Updated • 101 • 5
candywal/long_code_rule_violation_safe • Viewer • Updated • 102 • 5
candywal/long_code_deception_unsafe • Viewer • Updated • 148 • 7
candywal/long_code_deception_safe • Viewer • Updated • 112 • 6
candywal/animal_code_sabotage_safe • Viewer • Updated • 238 • 6
candywal/animal_code_rule_violation_safe • Viewer • Updated • 112 • 6
candywal/animal_code_deception_safe • Viewer • Updated • 133 • 6
candywal/adversarial_one_shot_code_sabotage_unsafe • Viewer • Updated • 100 • 7
candywal/adversarial_one_shot_code_sabotage_safe • Viewer • Updated • 118 • 5
candywal/adversarial_one_shot_code_rule_violation_unsafe • Viewer • Updated • 100 • 7
candywal/adversarial_one_shot_code_rule_violation_safe • Viewer • Updated • 108 • 5
candywal/adversarial_one_shot_code_deception_unsafe • Viewer • Updated • 219 • 7
candywal/adversarial_one_shot_code_deception_safe • Viewer • Updated • 149 • 6
candywal/code_sabotage_unsafe • Viewer • Updated • 155 • 7
candywal/code_rule_violation_safe • Viewer • Updated • 695 • 6
candywal/on_policy_prompted_code_sabotage_unsafe • Viewer • Updated • 132 • 7
candywal/on_policy_prompted_code_deception_unsafe • Viewer • Updated • 491 • 6
candywal/on_policy_model_organism_code_sabotage_safe • Viewer • Updated • 265 • 7
candywal/on_policy_model_organism_code_rule_violation_unsafe • Viewer • Updated • 334 • 6
candywal/on_policy_model_organism_code_rule_violation_safe • Viewer • Updated • 311 • 5
candywal/code_sabotage_safe • Viewer • Updated • 209 • 7
candywal/code_deception_unsafe • Viewer • Updated • 391 • 7
candywal/code_deception_safe • Viewer • Updated • 229 • 7
candywal/animal_code_deception_unsafe • Viewer • Updated • 355 • 7
candywal/adversarial_code_sabotage • Viewer • Updated • 500 • 6