Haitao Mi

haitaominlp

https://scholar.google.com.sg/citations?user=G3OMbFSm858C&hl=en

AI & ML interests

Large Language Models

Recent Activity

upvoted a paper about 24 hours ago

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Paper • 2601.19280 • Published 8 days ago • 9

upvoted a collection 1 day ago

Olmo 3

Collection

Artifacts for the Olmo 3 release. • 9 items • Updated Dec 23, 2025 • 163

upvoted a paper about 2 months ago

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Paper • 2512.15687 • Published Dec 17, 2025 • 20

upvoted 2 papers 3 months ago

The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30, 2025 • 117

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23, 2025 • 19

upvoted 2 papers 4 months ago

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Paper • 2510.01444 • Published Oct 1, 2025 • 20

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2, 2025 • 28

upvoted 8 papers 5 months ago

EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

Paper • 2509.12603 • Published Sep 16, 2025 • 9

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129

Mobile-Agent-v3: Fundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 64

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published Aug 27, 2025 • 84

authored a paper 5 months ago

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published Aug 27, 2025 • 84

upvoted 2 papers 6 months ago

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7, 2025 • 130

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1, 2025 • 94

upvoted 2 papers 7 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14, 2025 • 90

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11, 2025 • 32
