xiang huang

xianghuang

·

AI & ML interests

None yet

Recent Activity

upvoted a paper about 2 months ago

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

upvoted a paper about 2 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

upvoted a paper about 2 months ago

Recursive Multi-Agent Systems

View all activity

Organizations

None yet

upvoted 6 papers about 2 months ago

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published Jan 17 • 37

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 353

Recursive Multi-Agent Systems

Paper • 2604.25917 • Published Apr 28 • 287

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published Apr 9 • 295

Seedance 2.0: Advancing Video Generation for World Complexity

Paper • 2604.14148 • Published Apr 15 • 168

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Paper • 2604.06132 • Published Apr 7 • 122

upvoted 3 collections 2 months ago

🍷 FineWeb

7 items • Updated Jun 20, 2025 • 34

Nemotron v3 Pre-Training

Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 24 days ago • 17

Nemotron-Pre-Training-Datasets

Large scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 24 days ago • 176

upvoted a paper 2 months ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 113

upvoted 3 papers 3 months ago

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Paper • 2602.08234 • Published Feb 9 • 76

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 76

GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 196

upvoted a paper 4 months ago

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 158

upvoted a paper 7 months ago

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Paper • 2507.03112 • Published Jul 3, 2025 • 34

upvoted a paper 9 months ago

ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents

Paper • 2505.23923 • Published May 29, 2025 • 8

upvoted 4 papers 10 months ago

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Paper • 2505.17813 • Published May 23, 2025 • 58

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17, 2025 • 58

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30, 2025 • 60

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20, 2025 • 63