
license: cc-by-nc-4.0

RAVE Models

This is a collection of RAVE models trained by the Intelligent Instruments Lab for various projects.

Most of these models are encoder-decoder only, with no prior. All use the --causal mode and are exported for streaming inference with nn~, NN.ar, or rave-supercollider.
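The exported .ts files are TorchScript, so they can also be tried offline from Python. The following is a minimal sketch, not the lab's own workflow. It assumes PyTorch is installed and that the export exposes `encode` and `decode` methods taking tensors shaped (batch, channels, samples), as nn~-compatible RAVE exports typically do. The helper names `padded_length` and `reconstruct` are hypothetical.

```python
def padded_length(n_samples: int, block_size: int = 2048) -> int:
    """Smallest multiple of block_size that is >= n_samples."""
    return ((n_samples + block_size - 1) // block_size) * block_size


def reconstruct(model_path: str, audio, block_size: int = 2048):
    """Encode then decode a mono signal with an exported model.

    audio: 1-D torch tensor of samples at the model's sample rate.
    Returns (reconstructed audio, latent trajectory).
    """
    import torch  # imported lazily so padded_length works without PyTorch

    model = torch.jit.load(model_path).eval()

    # Zero-pad to a whole number of blocks (assumption: the model expects
    # input lengths that are multiples of its block size).
    n = audio.shape[-1]
    pad = padded_length(n, block_size) - n
    x = torch.nn.functional.pad(audio.reshape(1, 1, -1), (0, pad))

    with torch.no_grad():
        z = model.encode(x)  # assumed shape: (batch, latent dims, frames)
        y = model.decode(z)  # assumed shape: (batch, channels, samples)

    # Trim the padding so the output matches the input length.
    return y[0, 0, :n], z
```

Because these models are exported for streaming, results from a single offline call may differ slightly from block-by-block real-time use.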

guitar_iil_b2048_r48000_z16.ts

Dataset: IILGuitarTimbre.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

organ_archive_b2048_r48000_z16.ts

Dataset: public domain organ music from archive.org. Small amounts of voice and other instruments were included, and vinyl record noises are prominent.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

organ_bach_b2048_sr48000_z16.ts

Dataset: various recordings of J. S. Bach music for church organ.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

voice_vocalset_b2048_r48000_z16.ts

Dataset: VocalSet singing voice dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

voice_hifitts_b2048_r48000_z16.ts

Dataset: Hi-Fi TTS audiobooks dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

voice_jvs_b2048_r44100_z16.ts

Dataset: Hi-Fi TTS speaker 9017 (John Van Stan).

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

voice_vctk_b2048_r44100_z16.ts

Dataset: CSTR VCTK Corpus multispeaker read speech dataset.

Model: RAVE v3, 44.1kHz, block size 2048, 22 latent dimensions.
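The file names above appear to encode block size (`b`), sample rate (`r`) and latent dimensions (`z`). The sketch below parses that pattern and works out the duration of one block, which gives a rough idea of the buffering a host needs. The pattern is inferred from the names in this list rather than documented, and `ModelInfo` and `parse_model_filename` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed pattern: <name>_b<block size>_r<sample rate>_z<latent dims>.ts
# "sr" is accepted as well, because one listed file uses it.
_PATTERN = re.compile(
    r"^(?P<name>.+)_b(?P<block>\d+)_s?r(?P<rate>\d+)_z(?P<latent>\d+)\.ts$"
)


@dataclass(frozen=True)
class ModelInfo:
    name: str
    block_size: int   # samples per block
    sample_rate: int  # Hz
    latent_dims: int

    @property
    def block_ms(self) -> float:
        """Duration of one block in milliseconds."""
        return 1000.0 * self.block_size / self.sample_rate

    @property
    def blocks_per_second(self) -> float:
        """Number of blocks processed per second of audio."""
        return self.sample_rate / self.block_size


def parse_model_filename(filename: str) -> ModelInfo:
    """Extract the parameters encoded in a model file name."""
    m = _PATTERN.match(filename)
    if m is None:
        raise ValueError(f"unrecognized model file name: {filename!r}")
    return ModelInfo(m["name"], int(m["block"]), int(m["rate"]), int(m["latent"]))


info = parse_model_filename("guitar_iil_b2048_r48000_z16.ts")
print(info.name, info.block_size, info.sample_rate, info.latent_dims)
print(f"{info.block_ms:.2f} ms per block, {info.blocks_per_second:.4f} blocks/s")
```

For a 48 kHz model with block size 2048, one block lasts 2048 / 48000 s, about 42.67 ms. For the 44.1 kHz models it is about 46.44 ms.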