ZeqiangLai/Anything2Image · README.md · commit c43b0d6 ("First model version") by laizeqiang

	---
	title: Anything2Image
	emoji: 🏃
	colorFrom: gray
	colorTo: blue
	sdk: gradio
	sdk_version: 3.29.0
	app_file: app.py
	pinned: false
	---

	# Anything To Image

	Generate images from anything by feeding embeddings from [ImageBind](https://github.com/facebookresearch/ImageBind)'s unified latent space into [stable-diffusion-2-1-unclip](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip).

	- No training is needed.
	- Integration with 🤗 [Diffusers](https://github.com/huggingface/diffusers).
	- The `imagebind` package is copied directly from the [official repo](https://github.com/facebookresearch/ImageBind), with modifications.
	- Gradio demo.
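Why no training is needed can be sketched without the real models: each modality has its own encoder into one shared embedding space, and a single decoder consumes vectors from that space without knowing which modality produced them. The sketch below is a hypothetical illustration in plain Python. `Encoder`, `Decoder`, `generate_from_anything` and the toy encoders are made-up names, not part of this repository; in the Space itself, ImageBind plays the encoder role and the Stable unCLIP pipeline (via `image_embeds`) plays the decoder role.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases for this sketch only.
Embedding = List[float]
Encoder = Callable[[object], Embedding]  # raw input of one modality -> shared-space vector
Decoder = Callable[[Embedding], str]     # shared-space vector -> "image" (a string here)


def generate_from_anything(
    modality: str,
    raw_input: object,
    encoders: Dict[str, Encoder],
    decoder: Decoder,
    embed_dim: int,
) -> str:
    """Encode `raw_input` with the encoder for `modality`, then decode the vector.

    The decoder is shared by every modality and never retrained; the only
    contract is that all encoders emit vectors of the same dimension.
    """
    if modality not in encoders:
        raise KeyError(f"no encoder registered for modality {modality!r}")
    embedding = encoders[modality](raw_input)
    if len(embedding) != embed_dim:
        raise ValueError(f"expected a {embed_dim}-d embedding, got {len(embedding)}-d")
    return decoder(embedding)


# Toy stand-ins (NOT ImageBind or Stable unCLIP): fixed 3-d vectors and a string "image".
def toy_audio_encoder(path: object) -> Embedding:
    return [1.0, 0.0, 0.0]


def toy_text_encoder(text: object) -> Embedding:
    return [0.0, 1.0, 0.0]


def toy_decoder(embedding: Embedding) -> str:
    return "image<" + ",".join(f"{x:.1f}" for x in embedding) + ">"


toy_encoders: Dict[str, Encoder] = {"audio": toy_audio_encoder, "text": toy_text_encoder}
print(generate_from_anything("audio", "bird_audio.wav", toy_encoders, toy_decoder, embed_dim=3))
# prints: image<1.0,0.0,0.0>
```

With the real models, `embed_dim` would be 1024: ImageBind keeps the frozen OpenCLIP ViT-H/14 image encoder as its image branch and aligns the other modalities to it, and `stable-diffusion-2-1-unclip` is conditioned on ViT-H/14 image embeddings of that same size. That shared target is what lets an audio embedding stand in for an image embedding.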

	## Audio to Image

	| `assets/wav/bird_audio.wav` | `assets/wav/dog_audio.wav` | `assets/wav/cattle.wav` |
	| --- | --- | --- |
	| ![](assets/generated/bird_audio.png) | ![](assets/generated/dog_audio.png) | ![](assets/generated/cattle.png) |

	```python
	import imagebind
	import torch
	from diffusers import StableUnCLIPImg2ImgPipeline

	# Construct models. fp16 is used only on GPU, since half-precision inference is poorly supported on CPU.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	dtype = torch.float16 if device.startswith("cuda") else torch.float32
	pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=dtype, variant="fp16"
	)
	pipe = pipe.to(device)

	model = imagebind.imagebind_huge(pretrained=True)
	model.eval()
	model.to(device)

	# Embed the audio with ImageBind, then pass the embedding to unCLIP as its image embedding.
	with torch.no_grad():
	    audio_paths = ["assets/wav/bird_audio.wav"]
	    embeddings = model({
	        imagebind.ModalityType.AUDIO: imagebind.load_and_transform_audio_data(audio_paths, device),
	    })
	    embeddings = embeddings[imagebind.ModalityType.AUDIO]
	    images = pipe(image_embeds=embeddings.to(dtype)).images
	    images[0].save("bird_audio.png")
	```
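To regenerate all three images in the table rather than only the bird example, the snippet above can be run once per file. The hypothetical helper below only computes the input/output pairing implied by the table (`assets/wav/<name>.wav` to `assets/generated/<name>.png`); `plan_outputs` is a made-up name and this naming scheme is an assumption read off the table, not code from this repository.

```python
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple


def plan_outputs(
    wav_paths: Sequence[str], out_dir: str = "assets/generated"
) -> List[Tuple[str, str]]:
    """Pair each input WAV with an output PNG of the same stem in `out_dir`.

    Hypothetical helper: mirrors the naming used in the README table
    (assets/wav/<name>.wav -> assets/generated/<name>.png).
    """
    return [
        (wav, str(PurePosixPath(out_dir) / f"{PurePosixPath(wav).stem}.png"))
        for wav in wav_paths
    ]


audio_paths = [
    "assets/wav/bird_audio.wav",
    "assets/wav/dog_audio.wav",
    "assets/wav/cattle.wav",
]
for wav, png in plan_outputs(audio_paths):
    # With the models loaded as in the snippet above, each pair would become:
    #   embed [wav] with ImageBind, call pipe(image_embeds=...), save images[0] to png
    print(wav, "->", png)
```

Processing one file at a time keeps the batch size at 1, exactly as in the single-file snippet, so no assumptions about batched `image_embeds` are needed.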

	## More

	Under construction


	## Citation

	Latent Diffusion

	```bibtex
	@InProceedings{Rombach_2022_CVPR,
	    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
	    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
	    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	    month     = {June},
	    year      = {2022},
	    pages     = {10684-10695}
	}
	```

	ImageBind

	```bibtex
	@inproceedings{girdhar2023imagebind,
	    title     = {ImageBind: One Embedding Space To Bind Them All},
	    author    = {Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
	    booktitle = {CVPR},
	    year      = {2023}
	}
	```