13

Wentian Zhao

zwt123home123

zhaowt615@gmail.com

AI & ML interests

None yet

Recent Activity

updated a model 3 months ago

self-play/qwen3-8b-solver-v5

published a model 3 months ago

self-play/qwen3-8b-solver-v5

upvoted a paper 1 day ago

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

Paper • 2605.24202 • Published 12 days ago • 13

upvoted a paper 3 months ago

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Paper • 2603.09206 • Published Mar 10 • 53

upvoted 2 papers 4 months ago

What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

Paper • 2602.12395 • Published Feb 12 • 17

Quantifying the Gap between Understanding and Generation within Unified Multimodal Models

Paper • 2602.02140 • Published Feb 2 • 12

upvoted 2 papers 5 months ago

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Paper • 2512.18880 • Published Dec 21, 2025 • 25

upvoted a paper 7 months ago

Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs

Paper • 2511.07419 • Published Nov 10, 2025 • 27

upvoted 2 papers 8 months ago

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Paper • 2509.25541 • Published Sep 29, 2025 • 142

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137

upvoted 2 papers 11 months ago

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Paper • 2507.07996 • Published Jul 10, 2025 • 35

Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning

Paper • 2506.09501 • Published Jun 11, 2025 • 20

upvoted a paper about 1 year ago

DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training

Paper • 2504.09710 • Published Apr 13, 2025 • 19

upvoted a paper over 1 year ago

UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages

Paper • 2411.14343 • Published Nov 21, 2024 • 7