Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 4 days ago • 50
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published Aug 4 • 133
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4, 2024 • 31
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 23
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge Paper • 2407.03958 • Published Jul 4, 2024 • 21
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21, 2024 • 12
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21, 2024 • 12 • 2
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21, 2024 • 12