Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published 28 days ago • 61
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria Paper • 2605.08354 • Published May 8 • 23
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models Paper • 2604.08545 • Published Apr 9 • 41
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published Mar 19 • 33
BitDance Collection BitDance: Open-source autoregressive model with binary visual tokens. A research project for building powerful multimodal autoregressive model. • 10 items • Updated Mar 2 • 11
UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model Paper • 2602.14178 • Published Feb 15 • 15
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Mar 13 • 55
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published Dec 19, 2025 • 37
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published Dec 15, 2025 • 65
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published Dec 2, 2025 • 35
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 247
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation Paper • 2511.20256 • Published Nov 25, 2025 • 28
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published Oct 9, 2025 • 65
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark Paper • 2509.09680 • Published Sep 11, 2025 • 44
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing Paper • 2509.01984 • Published Sep 2, 2025 • 7