Yufan Zhang
zyf515730395
AI & ML interests
None yet
Recent Activity
Updated a collection about 16 hours ago: World Models
Updated a collection about 16 hours ago: Image Generation
Updated a collection about 16 hours ago: MLLM/LLM
Organizations
None yet
Tools
MLLM/LLM
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
  Paper • 2506.05176 • Published • 78
- Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
  Paper • 2506.04207 • Published • 48
- MiMo-VL Technical Report
  Paper • 2506.03569 • Published • 80
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
  Paper • 2506.03147 • Published • 58
Image Generation
- OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
  Paper • 2506.07977 • Published • 41
- Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
  Paper • 2506.07986 • Published • 19
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
  Paper • 2506.06276 • Published • 26
- Aligning Latent Spaces with Flow Priors
  Paper • 2506.05240 • Published • 27
World Models
AR Generation
3D Gen&Recon
- PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
  Paper • 2506.05573 • Published • 82
- Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data
  Paper • 2506.04120 • Published • 7
- RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS
  Paper • 2506.02751 • Published • 4
- Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
  Paper • 2505.07747 • Published • 61
Video Generation
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 105
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
  Paper • 2506.08009 • Published • 30
- Seeing Voices: Generating A-Roll Video from Audio with Mirage
  Paper • 2506.08279 • Published • 27
- PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
  Paper • 2506.07848 • Published • 4
RL