Xiangxin Zhou

zhouxiangxin

https://zhouxiangxin1998.github.io/

AI & ML interests

None yet

Recent Activity

authored a paper 25 days ago

Rethinking the Divergence Regularization in LLM RL

authored a paper 25 days ago

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

authored a paper 25 days ago

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Organizations

upvoted a paper 25 days ago

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

Paper • 2605.26108 • Published May 25 • 7

upvoted 3 papers 26 days ago

upvoted a collection 28 days ago

RTDMD

Collection

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching • 5 items • Updated Jun 2 • 3

upvoted a paper 5 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 38

upvoted 2 papers 8 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 32

upvoted 4 papers 9 months ago

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 92

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 147

Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts

Paper • 2509.23188 • Published Sep 27, 2025 • 3

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 70

upvoted a collection 9 months ago

Variational Reasoning

Collection

19 items • Updated Sep 28, 2025 • 1

upvoted a paper 10 months ago

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published Aug 19, 2025 • 119

upvoted a collection about 1 year ago

VeriFree

Collection

2 items • Updated Jun 24, 2025 • 2

upvoted 4 papers about 1 year ago

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28, 2025 • 29

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27, 2025 • 27

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26, 2025 • 24

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19, 2025 • 36

upvoted a paper over 1 year ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26, 2025 • 60
