Zekun Qi

qizekun

https://qizekun.github.io/

AI & ML interests

Embodied Intelligence, Large Language Model, 3D Computer Vision

Recent Activity

authored a paper 11 days ago

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

upvoted a paper 2 days ago

PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception

Paper • 2606.28322 • Published 8 days ago • 36

upvoted a paper 15 days ago

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

Paper • 2606.19531 • Published 17 days ago • 22

upvoted a paper 26 days ago

LIMMT: Less is More for Motion Tracking

Paper • 2606.06953 • Published 29 days ago • 16

upvoted 2 papers about 1 month ago

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Paper • 2606.03985 • Published Jun 2 • 41

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published May 28 • 146

upvoted a paper 4 months ago

Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation

Paper • 2602.16705 • Published Feb 18 • 26

upvoted 2 papers 5 months ago

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

Paper • 2602.10098 • Published Feb 10 • 22

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published Feb 3 • 65

upvoted a collection 5 months ago

OmniSpatial

Collection

Collections of ICLR 2026 paper: "OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models" • 4 items • Updated Jan 27 • 1

upvoted a paper 6 months ago

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published Jan 14 • 196

upvoted a collection 9 months ago

GS-Reasoner

Collection

Collections of paper "Reasoning in Space via Grounding in the World" • 6 items • Updated Oct 20, 2025 • 2

upvoted a paper 9 months ago

Reasoning in Space via Grounding in the World

Paper • 2510.13800 • Published Oct 15, 2025 • 15

upvoted 2 papers 10 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 224

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Paper • 2508.08240 • Published Aug 11, 2025 • 45

upvoted a paper 11 months ago

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

Paper • 2508.10711 • Published Aug 14, 2025 • 146

upvoted 3 papers 12 months ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Paper • 2507.13344 • Published Jul 17, 2025 • 59

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7, 2025 • 75

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Paper • 2507.04447 • Published Jul 6, 2025 • 45

upvoted 2 papers about 1 year ago

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Paper • 2506.03135 • Published Jun 3, 2025 • 40

ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

Paper • 2506.01853 • Published Jun 2, 2025 • 32
