zuijiang

AI & ML interests

None yet

Recent Activity

updated a dataset 18 days ago

zuijiang/perfectblend

published a dataset 21 days ago

zuijiang/perfectblend

upvoted a paper about 1 month ago

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

upvoted 2 papers about 1 month ago

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Paper • 2606.12370 • Published Jun 10 • 21

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Paper • 2605.31058 • Published May 29 • 2

upvoted 4 papers about 2 months ago

dots.tts Technical Report

Paper • 2606.07080 • Published Jun 5 • 16

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Paper • 2605.29707 • Published May 28 • 152

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

Paper • 2605.29559 • Published May 28 • 17

MetaphorVU: Towards Metaphorical Video Understanding

Paper • 2605.25461 • Published May 25 • 8

upvoted 5 papers 2 months ago

MolmoAct2: Action Reasoning Models for Real-world Deployment

Paper • 2605.02881 • Published May 4 • 356

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

Paper • 2605.19577 • Published May 19 • 59

Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding

Paper • 2605.20104 • Published May 19 • 7

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Paper • 2605.13301 • Published May 13 • 166

RigidFormer: Learning Rigid Dynamics using Transformers

Paper • 2605.09196 • Published May 9 • 14

upvoted a paper 3 months ago

Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published Apr 27 • 74

upvoted an article 3 months ago

Releasing LiteCoder-Terminal-SFT

Article • Lite-Coder • Apr 13 • 4

upvoted 6 papers 4 months ago

RAGEN-2: Reasoning Collapse in Agentic RL

Paper • 2604.06268 • Published Apr 7 • 69

Composer 2 Technical Report

Paper • 2603.24477 • Published Mar 25 • 20

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Paper • 2603.24472 • Published Mar 25 • 57

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper • 2603.19220 • Published Mar 19 • 70

Complementary Reinforcement Learning

Paper • 2603.17621 • Published Mar 18 • 37

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 158

upvoted a paper 5 months ago

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Paper • 2602.22437 • Published Feb 25 • 7