3D/4D Scenes from a Single Image with Controllable Video Diffusion
Generate videos from text prompts or images