Anshul Khandelwal

candywal

https://www.candywal.github.io

Collections 1

models 1

candywal/ppo-LunarLander-v2

Reinforcement Learning • Updated Feb 3, 2025 • 2

datasets 98

candywal/combined_safeset

Viewer • Updated Aug 7, 2025 • 1.44k • 6

candywal/on_policy_prompted_code_sabotage_safe

Viewer • Updated Aug 7, 2025 • 209 • 7

candywal/on_policy_prompted_code_rule_violation_safe

Viewer • Updated Aug 7, 2025 • 695 • 7

candywal/on_policy_prompted_code_deception_safe

Viewer • Updated Aug 7, 2025 • 229 • 6

candywal/on_policy_model_organism_code_sabotage_unsafe

Viewer • Updated Aug 7, 2025 • 103 • 10

candywal/long_code_sabotage_safe

Viewer • Updated Aug 7, 2025 • 101 • 5

candywal/long_code_rule_violation_safe

Viewer • Updated Aug 7, 2025 • 102 • 6

candywal/long_code_deception_unsafe

Viewer • Updated Aug 7, 2025 • 148 • 6

candywal/long_code_deception_safe

Viewer • Updated Aug 7, 2025 • 112 • 6

candywal/animal_code_sabotage_safe

Viewer • Updated Aug 7, 2025 • 238 • 8
