Open to Collab

Qinghong (Kevin) Lin

KevinQHLin

http://qhlin.me/

AI & ML interests

Vision-Language Model, Video Understanding, Agent

Recent Activity

upvoted a paper 1 day ago

MolmoAct2: Action Reasoning Models for Real-world Deployment

authored a paper 3 days ago

Egocentric Video-Language Pretraining

liked a model 5 days ago

GD-ML/Code2World

authored a paper 3 days ago

Egocentric Video-Language Pretraining

Paper • 2206.01670 • Published Jun 3, 2022

authored a paper 9 days ago

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Paper • 2604.22748 • Published 13 days ago • 224

authored a paper 22 days ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Paper • 2604.07429 • Published 29 days ago • 119

authored a paper about 1 month ago

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published Mar 25 • 98

authored 2 papers 3 months ago

Learning Video Context as Interleaved Multimodal Sequences

Paper • 2407.21757 • Published Jul 31, 2024

Code2World: A GUI World Model via Renderable Code Generation

Paper • 2602.09856 • Published Feb 10 • 201

authored 2 papers 4 months ago

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

Paper • 2601.03928 • Published Jan 7 • 16

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Paper • 2512.24965 • Published Dec 31, 2025 • 43

submitted a paper to Daily Papers 4 months ago

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Paper • 2512.24965 • Published Dec 31, 2025 • 43

authored a paper 5 months ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published Dec 15, 2025 • 65

submitted a paper to Daily Papers 5 months ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published Dec 15, 2025 • 65

authored 3 papers 5 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Paper • 2503.15661 • Published Mar 19, 2025 • 3

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published Nov 19, 2025 • 54

authored 2 papers 6 months ago

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published Nov 10, 2025 • 107

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4, 2025 • 103

authored 2 papers 7 months ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 120

Code2Video: A Code-centric Paradigm for Educational Video Generation

Paper • 2510.01174 • Published Oct 1, 2025 • 35

authored a paper 9 months ago

Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11, 2025 • 30

authored a paper 11 months ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27, 2025 • 109