Zhenzhen Wang
xz17634078525
AI & ML interests
meta-learning and reinforcement learning
Recent Activity
upvoted a paper about 20 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Organizations
None yet