AI & ML interests
None yet
Organizations
Datasets
candywal/adversarial_code_rule_violation
• Updated • 400 • 3
candywal/adversarial_code_deception
• Updated • 500 • 7
candywal/train_in_distribution_code_sabotage_unsafe
• Updated • 100 • 5
candywal/train_in_distribution_code_sabotage_safe
• Updated • 100 • 2
candywal/train_in_distribution_code_deception_unsafe
• Updated • 100 • 4
candywal/train_in_distribution_code_deception_safe
• Updated • 100 • 2
candywal/on_policy_prompted_code_rule_violation_unsafe
• Updated • 316 • 2
candywal/on_policy_model_organism_code_deception_unsafe
• Updated • 117 • 7
candywal/on_policy_model_organism_code_deception_safe
• Updated • 365 • 3
candywal/long_code_sabotage_unsafe
• Updated • 153 • 2
candywal/long_code_rule_violation_unsafe
• Updated • 244 • 5
candywal/code_rule_violation_unsafe
• Updated • 216 • 6
candywal/animal_code_sabotage_unsafe
• Updated • 164 • 3
candywal/animal_code_rule_violation_unsafe
• Updated • 273 • 4
candywal/train_in_distribution_code_rule_violation_unsafe
• Updated • 100 • 4
candywal/train_in_distribution_code_rule_violation_safe
• Updated • 100 • 3
candywal/qwen_code_rule_violation_unsafe
• Updated • 252 • 1
candywal/qwen_code_rule_violation_safe
• Updated • 1.02k • 2
candywal/train_adv_mean_diff_code_sabotage_unsafe
• Updated • 600 • 2
candywal/train_adv_mean_diff_code_sabotage_safe
• Updated • 600 • 1
candywal/train_adv_mean_diff_code_rule_violation_unsafe
• Updated • 1.1k • 2
candywal/train_adv_mean_diff_code_rule_violation_safe
• Updated • 1.1k • 2
candywal/train_adv_mean_diff_code_deception_unsafe
• Updated • 750 • 6
candywal/train_adv_mean_diff_code_deception_safe
• Updated • 750 • 5
candywal/train_adv_logistic_code_sabotage_unsafe
• Updated • 350 • 2
candywal/train_adv_logistic_code_sabotage_safe
• Updated • 350 • 2
candywal/train_adv_logistic_code_rule_violation_unsafe
• Updated • 700 • 2
candywal/train_adv_logistic_code_rule_violation_safe
• Updated • 700 • 2
candywal/train_adv_logistic_code_deception_unsafe
• Updated • 1.25k • 8
candywal/train_adv_logistic_code_deception_safe
• Updated • 1.25k • 8