pangpangxuan

pangxuan


AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning

Paper • 2607.18722 • Published 2 days ago • 30

upvoted a paper 2 days ago

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

Paper • 2606.29538 • Published 7 days ago • 137

upvoted a paper 10 days ago

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

Paper • 2607.08964 • Published 14 days ago • 74

upvoted a paper 21 days ago

DOPD: Dual On-policy Distillation

Paper • 2606.30626 • Published 24 days ago • 112

upvoted 2 papers 24 days ago

Qwen-Image-2.0-RL Technical Report

Paper • 2606.27608 • Published 28 days ago • 52

Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots

Paper • 2606.28133 • Published 27 days ago • 40

upvoted 3 papers 25 days ago

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 183

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Paper • 2509.26490 • Published Sep 30, 2025 • 22

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

Paper • 2606.24530 • Published about 1 month ago • 64

upvoted a paper 26 days ago

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Paper • 2606.26790 • Published 28 days ago • 56

upvoted a paper 27 days ago

DanceOPD: On-Policy Generative Field Distillation

Paper • 2606.27377 • Published 28 days ago • 81

upvoted a paper 28 days ago

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

Paper • 2606.25757 • Published 29 days ago • 1

upvoted a paper 29 days ago

Qwen-AgentWorld: Language World Models for General Agents

Paper • 2606.24597 • Published about 1 month ago • 151

upvoted 7 papers about 2 months ago

GrepSeek: Training Search Agents for Direct Corpus Interaction

Paper • 2605.29307 • Published May 28 • 117

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published Jun 1 • 240

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Paper • 2605.31264 • Published May 29 • 124

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Paper • 2605.31584 • Published May 29 • 43

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Paper • 2605.28424 • Published May 27 • 32

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Paper • 2605.29801 • Published May 28 • 149

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published May 25 • 106
