nyu-visionx/siglip2_decoder
Image-to-Image • Updated • 345
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding