Xichen Zhang

hkuzxc

AI & ML interests

None yet

Recent Activity

Organizations

None yet

upvoted a paper 2 days ago

UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating

Paper • 2606.21661 • Published 9 days ago • 24

upvoted a paper 24 days ago

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Paper • 2606.02482 • Published 27 days ago • 36

upvoted 2 papers about 1 month ago

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published May 18 • 116

MMSkills: Towards Multimodal Skills for General Visual Agents

Paper • 2605.13527 • Published May 14 • 122

upvoted a paper about 2 months ago

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Paper • 2605.07177 • Published May 8 • 63

upvoted 3 papers 2 months ago

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

Paper • 2604.20796 • Published Apr 22 • 244

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Paper • 2604.07429 • Published Apr 8 • 123

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Paper • 2604.22748 • Published Apr 24 • 231

upvoted a paper 3 months ago

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

Paper • 2603.22003 • Published Mar 23 • 12

upvoted a paper 7 months ago

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Paper • 2511.11793 • Published Nov 14, 2025 • 196

upvoted a paper 11 months ago

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17, 2025 • 80