🔄 In a Training Loop

Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

zai-org/GLM-5.2

liked a model 2 days ago

Qwen/Qwen3.5-0.8B

liked a dataset 18 days ago

open-thoughts/OpenThoughts-Agent-v1-SFT

Organizations

None yet

upvoted 3 papers about 1 month ago

TradingAgents: Multi-Agents LLM Financial Trading Framework

Paper • 2412.20138 • Published Dec 28, 2024 • 102

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Paper • 2605.21468 • Published May 20 • 51

MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Paper • 2605.14212 • Published May 14 • 18

upvoted 2 papers about 2 months ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 233

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 53

upvoted 2 papers 2 months ago

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Paper • 2604.14268 • Published Apr 15 • 126

Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 198

upvoted a paper 3 months ago

Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 179

upvoted a collection 3 months ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 372

upvoted 2 papers 3 months ago

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 229

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published Mar 16 • 187

upvoted a collection 3 months ago

NeMo Gym

Collection

Collection of RL verifiable data for NeMo Gym • 32 items • Updated 13 days ago • 62

upvoted a collection 4 months ago

BFS-Prover

Collection

LLM Step-Provers in Lean4 • 5 items • Updated Oct 7, 2025 • 8

upvoted 3 papers 4 months ago

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23, 2025 • 14

Learning to Repair Lean Proofs from Compiler Feedback

Paper • 2602.02990 • Published Feb 3 • 29

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 76

upvoted a paper 5 months ago

Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 105

upvoted an article 5 months ago

Open Responses: What you need to know

Article • evalstate, burtenshaw, merve, pcuenq • Jan 15 • 112

upvoted 2 papers 5 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 110

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158
