siqi zhu

zsqzz

zhusq20

AI & ML interests

None yet

Recent Activity

upvoted a paper 15 days ago

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

authored a paper 18 days ago

Agents' Last Exam

upvoted a paper 20 days ago

Agents' Last Exam

Organizations

upvoted a paper 15 days ago

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

Paper • 2602.06052 • Published Jan 14 • 7

upvoted a paper 20 days ago

Agents' Last Exam

Paper • 2606.05405 • Published 26 days ago • 368

upvoted an article 28 days ago

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Article • aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • May 27 • 42

upvoted 2 papers about 1 month ago

Interactive Evaluation Requires a Design Science

Paper • 2605.17829 • Published May 18 • 14

Code as Agent Harness

Paper • 2605.18747 • Published May 18 • 223

upvoted 3 papers about 2 months ago

upvoted a paper 3 months ago

Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 179

upvoted 3 papers 5 months ago

SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers

Paper • 2602.05115 • Published Feb 4 • 20

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 229

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 147

upvoted a paper 6 months ago

OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Paper • 2601.07376 • Published Jan 12 • 7

upvoted 3 papers 8 months ago

Multi-Agent Evolve: LLM Self-Improve through Co-evolution

Paper • 2510.23595 • Published Oct 27, 2025 • 14

Efficient Long-context Language Model Training by Core Attention Disaggregation

Paper • 2510.18121 • Published Oct 20, 2025 • 124

Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs

Paper • 2510.11062 • Published Oct 13, 2025 • 29

upvoted a paper 9 months ago

GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare

Paper • 2510.08872 • Published Oct 10, 2025 • 4

upvoted a paper 11 months ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320

upvoted a paper 12 months ago

Group-in-Group Policy Optimization for LLM Agent Training

Paper • 2505.10978 • Published May 16, 2025 • 23

upvoted a paper over 1 year ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 77
