cvssp/audioldm

sanchit-gandhi committed on Apr 26, 2023

Commit 4c64d38

1 Parent(s): 05d0fab

Update README.md

Files changed (1)

README.md

@@ -15,6 +15,19 @@ is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio r
 latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
 sound effects, human speech and music.
+# Checkpoint Details
+This is the original, **small** version of the AudioLDM model, also referred to as **audioldm-s-full**. The four AudioLDM checkpoints are summarised in the table below:
+**Table 1:** Summary of the AudioLDM checkpoints.
+| Checkpoint                                                            | Training Data (h) | Training Steps | Params |
+|-----------------------------------------------------------------------|-------------------|----------------|--------|
+| [audioldm-s-full](https://huggingface.co/cvssp/audioldm)              | 9174              | 1.5M           | 421M   |
+| [audioldm-s-full-v2](https://huggingface.co/cvssp/audioldm-s-full-v2) | 9174              | > 1.5M         | 421M   |
+| [audioldm-m-full](https://huggingface.co/cvssp/audioldm-m-full)       | 9174              | 1.5M           | 652M   |
+| [audioldm-l-full](https://huggingface.co/cvssp/audioldm-l-full)       | 9174              | 1.5M           | 975M   |
 ## Model Sources
 - [**Original Repository**](https://github.com/haoheliu/AudioLDM)