Satvik Golechha

7vik-mt

7vik-aisi

AI & ML interests

None yet

Organizations

upvoted a collection 3 months ago

(Some) Emergent Misalignment from Reward Hacking in RL

Model checkpoints from the project "(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL" • 228 items • Updated 20 days ago • 6