---
license: cc-by-nc-4.0
---

# RAVE Models

This is a collection of [RAVE](https://github.com/acids-ircam/RAVE) models trained by the [Intelligent Instruments Lab](https://iil.is) for various projects.

Most of these models are encoder-decoder only, with no prior. All use the `--causal` mode and are exported for streaming inference with [nn~](https://github.com/acids-ircam/nn_tilde), [NN.ar](https://github.com/elgiano/nn.ar) or [rave-supercollider](https://github.com/victor-shepardson/rave-supercollider).

### guitar_iil_b2048_r48000_z16.ts

Dataset: [IILGuitarTimbre](https://github.com/Intelligent-Instruments-Lab/IILGuitarTimbre).

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

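The file names in this collection appear to follow a pattern that mirrors the Model line: `b` for block size, `r` (or `sr`) for sample rate in Hz, and `z` for the number of latent dimensions. The sketch below parses that pattern. The regular expression and the `parse_model_name` helper are assumptions inferred from the names listed here, not part of RAVE or any export tool, so treat the parsed values as a hint and the Model line as the authoritative description.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern, inferred from the file names in this collection:
# <name>_b<block size>_r<sample rate>_z<latent dims>.ts (one name uses "sr").
_PATTERN = re.compile(
    r"^(?P<name>.+)_b(?P<block>\d+)_s?r(?P<rate>\d+)_z(?P<latent>\d+)\.ts$"
)


class ModelInfo(NamedTuple):
    name: str
    block_size: int
    sample_rate: int
    latent_dims: int


def parse_model_name(filename: str) -> Optional[ModelInfo]:
    """Return the fields encoded in a model file name, or None if it does not match."""
    m = _PATTERN.match(filename)
    if m is None:
        return None
    return ModelInfo(m["name"], int(m["block"]), int(m["rate"]), int(m["latent"]))


print(parse_model_name("guitar_iil_b2048_r48000_z16.ts"))
```

A name that does not fit the assumed pattern yields `None` rather than raising, which makes the helper safe to run over a whole directory listing.
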
### organ_archive_b2048_r48000_z16.ts

Dataset: public domain organ music from archive.org. Small amounts of voice and other instruments were included, and vinyl record noises are prominent.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### organ_bach_b2048_sr48000_z16.ts

Dataset: various recordings of J. S. Bach music for church organ.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_vocalset_b2048_r48000_z16.ts

Dataset: [VocalSet](https://zenodo.org/record/1193957) singing voice dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_hifitts_b2048_r48000_z16.ts

Dataset: [Hi-Fi TTS](http://arxiv.org/abs/2104.01497) audiobooks dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_jvs_b2048_r44100_z16.ts

Dataset: [Hi-Fi TTS](http://arxiv.org/abs/2104.01497) speaker 9017 (John Van Stan).

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

### voice_vctk_b2048_r44100_z16.ts

Dataset: [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443) multispeaker read speech dataset.

Model: RAVE v3, 44.1kHz, block size 2048, 22 latent dimensions.

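Every model listed above uses a block size of 2048 at either 48 kHz or 44.1 kHz. Assuming the block size is the number of audio samples covered by one latent frame (an interpretation, not something stated above), the duration of a block and the latent frame rate follow from simple arithmetic. The helper names below are hypothetical, and the figures describe block length only, not the total latency of any particular host.

```python
def block_duration_ms(block_size: int, sample_rate: int) -> float:
    """Duration of one block of audio in milliseconds."""
    return 1000.0 * block_size / sample_rate


def latent_rate_hz(block_size: int, sample_rate: int) -> float:
    """Latent frames per second, assuming one latent frame per block."""
    return sample_rate / block_size


# The two sample rates used by the models in this collection.
for sr in (48000, 44100):
    print(
        f"{sr} Hz: {block_duration_ms(2048, sr):.1f} ms per block, "
        f"{latent_rate_hz(2048, sr):.2f} latent frames per second"
    )
```

Under that assumption, a block lasts about 42.7 ms at 48 kHz (about 23.4 latent frames per second) and about 46.4 ms at 44.1 kHz (about 21.5 latent frames per second).
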