Scale RAE • Collection • Collection for "Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders" • 6 items • Updated 4 days ago • 2
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models Paper • 2510.02880 • Published Oct 3, 2025 • 2
ReDDiT: Rehashing Noise for Discrete Visual Generation Paper • 2505.19656 • Published May 26, 2025 • 1
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension Paper • 2406.11327 • Published Jun 17, 2024
ChatterBox: Multi-round Multimodal Referring and Grounding Paper • 2401.13307 • Published Jan 24, 2024
Artemis: Towards Referential Understanding in Complex Videos Paper • 2406.00258 • Published Jun 1, 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper • 2409.04410 • Published Sep 6, 2024 • 25