arxiv:2605.06139
Yun Qu
yunqu
AI & ML interests
None yet
Recent Activity
authored a paper about 7 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
upvoted a paper about 12 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
submitted a paper about 12 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Organizations
None yet