Open to Work

Yiming Zhao

gaotiexinqu

AI & ML interests

VLMs, Agents, RL, Reasoning

Recent Activity

authored a paper about 24 hours ago

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

authored a paper about 24 hours ago

SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

upvoted a paper 1 day ago

SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

Paper • 2605.17526 • Published 5 days ago • 2

upvoted a paper 2 days ago

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

Paper • 2605.16079 • Published 7 days ago • 24

upvoted a paper 7 days ago

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

Paper • 2605.12480 • Published 10 days ago • 4

upvoted 2 papers 10 days ago

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

Paper • 2605.08043 • Published 14 days ago • 10

Flow-OPD: On-Policy Distillation for Flow Matching Models

Paper • 2605.08063 • Published 14 days ago • 97

upvoted a paper about 1 month ago

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

Paper • 2604.17308 • Published Apr 19 • 22

upvoted 3 papers 4 months ago

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published Jan 29 • 155

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Paper • 2602.02185 • Published Feb 2 • 118

V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction

Paper • 2503.17736 • Published Mar 22, 2025 • 3

upvoted a paper about 1 year ago

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10, 2025 • 46

upvoted a collection about 1 year ago

PixMo

A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated Mar 2 • 90