
Yanwei Li

YanweiLi

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 18 days ago

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Paper • 2606.07433 • Published 21 days ago • 21

upvoted a paper 24 days ago

Semantic Generative Tuning for Unified Multimodal Models

Paper • 2605.18714 • Published May 18 • 11

upvoted a paper about 2 months ago

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Paper • 2604.22748 • Published Apr 24 • 231

upvoted 2 papers 3 months ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 237

Efficient Reasoning with Balanced Thinking

Paper • 2603.12372 • Published Mar 12 • 151

upvoted 2 papers 4 months ago

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Paper • 2602.24286 • Published Feb 27 • 99

Utonia: Toward One Encoder for All Point Clouds

Paper • 2603.03283 • Published Mar 3 • 186

upvoted a paper 7 months ago

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134

upvoted a collection 7 months ago

VST

Collection

A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities. • 7 items • Updated 10 days ago • 6

upvoted a paper 7 months ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 218

upvoted 5 papers 8 months ago

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

Paper • 2509.18905 • Published Sep 23, 2025 • 31

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

Paper • 2510.21817 • Published Oct 21, 2025 • 41

upvoted a paper about 1 year ago

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11, 2025 • 157

upvoted a paper over 1 year ago

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published Dec 12, 2024 • 48

upvoted a paper almost 2 years ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61

upvoted a paper about 2 years ago

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 49

upvoted a collection about 2 years ago

MGM-Data

Collection

Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7
