candywal/adversarial_code_rule_violation • 400 • 6
candywal/adversarial_code_deception • 500 • 6
candywal/train_in_distribution_code_sabotage_unsafe • 100 • 7
candywal/train_in_distribution_code_sabotage_safe • 100 • 6
candywal/train_in_distribution_code_deception_unsafe • 100 • 7
candywal/train_in_distribution_code_deception_safe • 100 • 7
candywal/on_policy_prompted_code_rule_violation_unsafe • 316 • 7
candywal/on_policy_model_organism_code_deception_unsafe • 117 • 7
candywal/on_policy_model_organism_code_deception_safe • 365 • 7
candywal/long_code_sabotage_unsafe • 153 • 7
candywal/long_code_rule_violation_unsafe • 244 • 7
candywal/code_rule_violation_unsafe • 216 • 7
candywal/animal_code_sabotage_unsafe • 164 • 7
candywal/animal_code_rule_violation_unsafe • 273 • 7
candywal/train_in_distribution_code_rule_violation_unsafe • 100 • 7
candywal/train_in_distribution_code_rule_violation_safe • 100 • 7
candywal/qwen_code_rule_violation_unsafe • 252 • 5
candywal/qwen_code_rule_violation_safe • 1.02k • 6
candywal/train_adv_mean_diff_code_sabotage_unsafe • 600 • 6
candywal/train_adv_mean_diff_code_sabotage_safe • 600 • 6
candywal/train_adv_mean_diff_code_rule_violation_unsafe • 1.1k • 6
candywal/train_adv_mean_diff_code_rule_violation_safe • 1.1k • 6
candywal/train_adv_mean_diff_code_deception_unsafe • 750 • 6
candywal/train_adv_mean_diff_code_deception_safe • 750 • 9
candywal/train_adv_logistic_code_sabotage_unsafe • 350 • 6
candywal/train_adv_logistic_code_sabotage_safe • 350 • 6
candywal/train_adv_logistic_code_rule_violation_unsafe • 700 • 6
candywal/train_adv_logistic_code_rule_violation_safe • 700 • 6
candywal/train_adv_logistic_code_deception_unsafe • 1.25k • 6
candywal/train_adv_logistic_code_deception_safe • 1.25k • 5