shawnxzhu

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

upvoted a paper about 1 month ago

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

authored a paper about 2 months ago

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

upvoted 2 papers about 1 month ago

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Paper • 2606.13106 • Published Jun 11 • 22

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Paper • 2606.06428 • Published Jun 4 • 25

upvoted a paper about 2 months ago

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Paper • 2605.18703 • Published May 18 • 50

upvoted a paper 3 months ago

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

Paper • 2604.19572 • Published Apr 21 • 23

upvoted 3 papers 5 months ago

ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall

Paper • 2510.07896 • Published Oct 9, 2025 • 11

On Data Engineering for Scaling LLM Terminal Capabilities

Paper • 2602.21193 • Published Feb 24 • 103

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

upvoted a collection 5 months ago

CodeScaler

5 items • Updated Mar 2 • 6

upvoted 2 papers 5 months ago

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Paper • 2602.07422 • Published Feb 7 • 22

MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 34

upvoted a paper 9 months ago

QueST: Incentivizing LLMs to Generate Difficult Problems

Paper • 2510.17715 • Published Oct 20, 2025 • 36

upvoted a paper 10 months ago

Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration

Paper • 2508.13755 • Published Aug 19, 2025 • 14

upvoted 2 papers 11 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 224

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published Aug 19, 2025 • 119