Spatial-SSRL Spatial Reasoning
Spatial reasoning with vision-language models
FitDiT is a high-fidelity virtual try-on model.
Easily expand image boundaries
Upgraded to v1.0!
Add a logo to anything
Audio-Conditioned LipSync with Latent Diffusion Models
Colorize grayscale images using automatic captions
Generate full app code from your idea
Generate new person images with swapped clothes or poses
Convert images of screens to structured elements
Fill and modify images using a mask and prompt