auffusion/auffusion-full


auffusion committed on Jan 1, 2024 · Commit 6b038b2 · 1 parent: da9b5ff

first commit

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -10,7 +10,9 @@ tags:
 **Auffusion** is a latent diffusion model (LDM) for text-to-audio (TTA) generation. **Auffusion** can generate realistic audio from textual prompts, including human sounds, animal sounds, natural and artificial sounds, and sound effects. We introduce Auffusion, a TTA system that adapts text-to-image (T2I) model frameworks to the TTA task by effectively leveraging their inherent generative strengths and precise cross-modal alignment. Our objective and subjective evaluations demonstrate that Auffusion surpasses previous TTA approaches while using limited data and computational resources. We release our model, inference code, and pre-trained checkpoints for the research community.
 📣 We are releasing **Auffusion-Full-no-adapter**, which was pre-trained on all datasets described in the paper and is designed for easy use in audio manipulation.
 📣 We are releasing **Auffusion-Full**, which was pre-trained on all datasets described in the paper.
 📣 We are releasing **Auffusion**, which was pre-trained on **AudioCaps**.
 ## Auffusion Model Family
