Xiaohu Zhu

zksneil

AI & ML interests

None yet

Organizations

None yet

upvoted an article 3 months ago

Article

Welcome Gemma 4: Frontier multimodal intelligence on device

merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift +5 • Apr 2 • 910

upvoted a collection 3 months ago

(Some) Emergent Misalignment from Reward Hacking in RL

Model checkpoints from the project "(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL" • 228 items • Updated about 22 hours ago • 6

upvoted a paper over 1 year ago

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Paper • 2410.02089 • Published Oct 2, 2024 • 13

upvoted a collection almost 2 years ago

Llama3.1-Chinese-Chat

2 items • Updated Jul 26, 2024 • 7