UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer Paper • 2606.16255 • Published 12 days ago • 14
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers Paper • 2606.13289 • Published 16 days ago • 29
Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published 29 days ago • 61
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published Mar 19 • 33
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy Paper • 2511.21579 • Published Nov 26, 2025 • 23
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published Nov 26, 2025 • 49
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Paper • 2511.19365 • Published Nov 24, 2025 • 66
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation Paper • 2303.00440 • Published Mar 1, 2023
DPL: Decoupled Prompt Learning for Vision-Language Models Paper • 2308.10061 • Published Aug 19, 2023 • 1
MGMAE: Motion Guided Masking for Video Masked Autoencoding Paper • 2308.10794 • Published Aug 21, 2023
StableDrag: Stable Dragging for Point-based Image Editing Paper • 2403.04437 • Published Mar 7, 2024 • 27
VFIMamba: Video Frame Interpolation with State Space Models Paper • 2407.02315 • Published Jul 2, 2024