Yun Qu
yunqu
AI & ML interests
None yet
Recent Activity
authored a paper about 10 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
upvoted a paper about 16 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
submitted a paper about 16 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Organizations
None yet