Jianzong Wu

jianzongwu

https://jianzongwu.github.io

AI & ML interests

Multimodal Learning

Organizations

None yet

upvoted a paper 2 days ago

Towards Customized Multimodal Role-Play

Paper • 2605.08129 • Published 28 days ago • 8

upvoted 2 papers about 1 month ago

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Paper • 2604.24763 • Published Apr 27 • 71

Seedance 2.0: Advancing Video Generation for World Complexity

Paper • 2604.14148 • Published Apr 15 • 163

upvoted 2 papers 3 months ago

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Paper • 2603.02175 • Published Mar 2 • 24

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Paper • 2602.24233 • Published Feb 27 • 60

upvoted 5 papers 4 months ago

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

Paper • 2601.21406 • Published Jan 29 • 6

Advancing Open-source World Models

Paper • 2601.20540 • Published Jan 28 • 135

Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs

Paper • 2601.17058 • Published Jan 22 • 190

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published Jan 22 • 44

upvoted a paper 5 months ago

PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

Paper • 2512.24551 • Published Dec 31, 2025 • 21

upvoted 3 papers 6 months ago

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

Paper • 2512.05112 • Published Dec 4, 2025 • 13

Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

Paper • 2512.02457 • Published Dec 2, 2025 • 14

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published Nov 12, 2025 • 72

upvoted 2 papers 7 months ago

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 56

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21, 2025 • 37

upvoted 2 papers 8 months ago

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Paper • 2510.11712 • Published Oct 13, 2025 • 31

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 189

upvoted 2 papers 10 months ago

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 276

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1, 2025 • 63