Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 57
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Paper • 2411.19509 • Published Nov 29, 2024 • 3
Versatile Multimodal Controls for Whole-Body Talking Human Animation Paper • 2503.08714 • Published Mar 10, 2025 • 1
EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation Paper • 2507.03905 • Published Jul 5, 2025 • 1
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs Paper • 2508.10576 • Published Aug 14, 2025 • 8
DC-Former: Diverse and Compact Transformer for Person Re-Identification Paper • 2302.14335 • Published Feb 28, 2023
StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models Paper • 2409.02543 • Published Sep 4, 2024
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Paper • 2510.24821 • Published Oct 28, 2025 • 41
The Thinking Boundary: Quantifying Reasoning Suitability of Multimodal Tasks via Dual Tuning Paper • 2603.04415 • Published Feb 4 • 3
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning Paper • 2507.08306 • Published Jul 11, 2025 • 1