Update README.md
README.md CHANGED

````diff
@@ -8,17 +8,15 @@ pipeline_tag: text-to-audio
 tags:
 - text-to-audio
 ---
-#
-
-**
-
-📣 We are releasing [**Tango-Full-FT-Audiocaps**](https://huggingface.co/declare-lab/tango-full-ft-audiocaps), which was first pre-trained on [**TangoPromptBank**](https://huggingface.co/datasets/declare-lab/TangoPromptBank), a collection of diverse text-audio pairs. We later fine-tuned this checkpoint on AudioCaps. This checkpoint obtained state-of-the-art results for text-to-audio generation on AudioCaps.
+# Tango 2: Aligning Diffusion-based Text-to-Audio Generative Models through Direct Preference Optimization
+
+🎵 We developed **Tango 2** building upon **Tango** for text-to-audio generation. **Tango 2** was initialized with the **Tango-full-ft** checkpoint and underwent alignment training using DPO on **audio-alpaca**, a dataset of pairwise audio preferences. 🎶
 
 ## Code
 
 Our code is released here: [https://github.com/declare-lab/tango](https://github.com/declare-lab/tango)
 
-We uploaded several **TANGO** generated samples here: [https://tango-web.github.io/](https://tango-web.github.io/)
 
 Please follow the instructions in the repository for installation, usage and experiments.
@@ -63,10 +61,4 @@ prompts = [
 ]
 audios = tango.generate_for_batch(prompts, samples=2)
 ```
-This will generate two samples for each of the three text prompts.
-
-## Limitations
-
-TANGO is trained on the small AudioCaps dataset, so it may not generate good audio samples for concepts it has not seen in training (e.g. _singing_). For the same reason, TANGO is not always able to finely control its generations with textual prompts. For example, its generations for the prompts _Chopping tomatoes on a wooden table_ and _Chopping potatoes on a metal table_ are very similar; _Chopping vegetables on a table_ also produces similar audio samples. Training text-to-audio generation models on larger datasets is thus required for the model to learn the composition of textual concepts and varied text-audio mappings.
-
-We are training another version of TANGO on larger datasets to enhance its generalization, compositional and controllable generation ability.
+This will generate two samples for each of the three text prompts.
````
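The batch call above returns plain waveform data that still needs to be written to disk. A minimal sketch of saving one such sample with Python's standard `wave` module, assuming 16 kHz, 16-bit, mono output (the sample rate Tango generates at); the synthetic tone here is only a stand-in for a real generated sample, since running the model requires the checkpoint:

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Tango generates 16 kHz audio

def save_wav(path, samples, rate=SAMPLE_RATE):
    """Write a mono 16-bit PCM WAV from an iterable of floats in [-1, 1]."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit PCM
        f.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)

# Stand-in waveform: one second of a 440 Hz tone instead of a generated sample.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
save_wav("sample.wav", tone)
```

In practice you would loop over the `audios` returned by `generate_for_batch` and call `save_wav` once per prompt-sample pair.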