Generate transcripts from comic images
Image to 3D with DPT + 3D Point Cloud
Generate 3D voxel model from a single photo
Use Meta SAM2 to mask an image and replace
narrator