nyu-visionx/Scale-RAE-Qwen1.5B_DiT2.4B-64ep
Text Generation • 4B • Updated
• 14
Solaris: Building a Multiplayer Video World Model in Minecraft
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders