Hudson Mitchell

wangyi21

AI & ML interests

None yet

Recent Activity

liked a dataset 6 days ago

Organizations

None yet

upvoted a paper about 19 hours ago

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

Paper • 2605.28721 • Published 2 days ago • 11

upvoted a paper 7 days ago

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

Paper • 2605.14678 • Published 10 days ago • 102

upvoted a paper 15 days ago

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

Paper • 2605.12070 • Published 17 days ago • 16

upvoted a paper 22 days ago

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Paper • 2604.28196 • Published 29 days ago • 72

upvoted a paper 28 days ago

Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Paper • 2604.28158 • Published 29 days ago • 49

upvoted 5 papers about 2 months ago

Do Audio-Visual Large Language Models Really See and Hear?

Paper • 2604.02605 • Published Apr 3 • 7

Adam's Law: Textual Frequency Law on Large Language Models

Paper • 2604.02176 • Published Apr 2 • 504

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Paper • 2603.26164 • Published Mar 27 • 365

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

Paper • 2604.03922 • Published Apr 5 • 53

ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Paper • 2603.24414 • Published Mar 25 • 183

upvoted a paper 2 months ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 372

upvoted 5 papers 3 months ago

Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 196

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Paper • 2602.22859 • Published Feb 26 • 150

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 221

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 524

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Paper • 2602.08354 • Published Feb 9 • 266