| --- |
| license: cc-by-nc-4.0 |
| tags: |
| - audio |
| - rave |
| - timbre-transfer |
| - neural-synthesis |
| - ircam |
| - maestro |
| language: |
| - en |
| pipeline_tag: audio-to-audio |
| --- |
| |
| # RAVE — AEmotionStudio mirror |
|
|
| Curated mirror of public **RAVE** (Realtime Audio Variational autoEncoder) checkpoints, used |
| by MAESTRO's RAVE Timbre Transfer panel (opt-in starter pack). Sources: |
|
|
| - The [Intelligent-Instruments-Lab/rave-models](https://huggingface.co/Intelligent-Instruments-Lab/rave-models) curated set (birds, voices, organs, water, etc.). |
| - The [official ACIDS-IRCAM public catalog](https://acids-ircam.github.io/rave_models_download.html), pulled from the canonical anonymous API at `https://play.forum.ircam.fr/rave-vst-api/get_available_models`. |
|
|
| RAVE was developed by [Antoine Caillon](https://caillonantoine.github.io/) and the |
| [ACIDS team at IRCAM](https://www.ircam.fr/). Paper: [arXiv:2111.05011](https://arxiv.org/abs/2111.05011). |
| Upstream code: [acids-ircam/RAVE](https://github.com/acids-ircam/RAVE). |
|
|
| ## License |
|
|
| **CC-BY-NC-4.0** — non-commercial use only, inherited from the upstream distributions. |
| Audio generated with these models may be used non-commercially. Commercial use of the *models themselves* |
| (e.g. shipping them inside a paid product) requires permission from the original authors / IRCAM. |
|
|
| Per MAESTRO's stance (see `LICENSE_AUDIT.md` and the `feedback_download_on_demand_licensing` |
| memory), these weights are fetched *on demand* by the end user, so the end user (not the MAESTRO |
| binary) is the licensee. |
|
|
| --- |
|
|
| ## Models — IIL-curated set (b2048 streaming exports, 18 models) |
|
|
| Each `.ts` checkpoint has a `<stem>.json` sidecar with name, license, sample-rate, latent-dim, |
| source URL, and a one-line description. |
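A minimal sketch of pairing each checkpoint with its sidecar. It assumes only what the sentence above states: the JSON file shares the checkpoint's stem and sits in the same directory. The helper names (`sidecar_path`, `load_sidecar`, `pair_checkpoints`) are hypothetical. The parsed JSON is returned as-is rather than assuming specific key names, since the exact schema is not documented here.

```python
import json
from pathlib import Path


def sidecar_path(checkpoint: Path) -> Path:
    """Map "<stem>.ts" to "<stem>.json" in the same directory (hypothetical helper)."""
    return checkpoint.with_suffix(".json")


def load_sidecar(checkpoint: Path) -> dict:
    """Parse the sidecar JSON for a checkpoint; no particular key names are assumed."""
    with sidecar_path(checkpoint).open(encoding="utf-8") as f:
        return json.load(f)


def pair_checkpoints(directory: Path) -> dict:
    """Map each *.ts filename in `directory` to its sidecar metadata, skipping files without one."""
    pairs = {}
    for ts in sorted(directory.glob("*.ts")):
        if sidecar_path(ts).is_file():
            pairs[ts.name] = load_sidecar(ts)
    return pairs
```

Skipping checkpoints without a sidecar keeps a partially downloaded folder usable. A stricter loader could raise instead.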
|
|
| ### Voice / speech |
| - `voice_vocalset_b2048_r48000_z16.ts` — **Voice (VocalSet)**. Voice timbre trained on the VocalSet corpus — covers vocal techniques across multiple singers. Use for the canonical 'make this sound like a voice' transfer. |
| - `voice-multi-b2048-r48000-z11.ts` — **Voice (Multi-speaker)**. Aggregated multi-speaker voice corpus. Wider speaker diversity than VocalSet — produces more 'average human' renders. |
| - `voice_hifitts_b2048_r48000_z16.ts` — **Voice (HiFi-TTS)**. High-fidelity expressive English speech corpus. Cleaner, more articulate than the multi-speaker model. |
| - `voice_jvs_b2048_r44100_z16.ts` — **Voice (JVS, Japanese)**. JVS Japanese multi-speaker corpus at 44.1 kHz. Use for Japanese-language sources or non-Latin phoneme structure. |
| - `voice_vctk_b2048_r44100_z22.ts` — **Voice (VCTK, English)**. VCTK English multi-speaker corpus from CSTR Edinburgh, 44.1 kHz. High 22-dim latent — captures more speaker idiosyncrasies. |
|
|
| ### Bird / wildlife |
| - `birds_motherbird_b2048_r48000_z16.ts` — **Birds (Motherbird)**. Bird-vocalization corpus — chirps + textural transients. The canonical 'weird' pick: produces wildly warped output for any arbitrary input. |
| - `birds_dawnchorus_b2048_r48000_z8.ts` — **Birds (Dawn Chorus)**. Dense overlapping bird vocalizations recorded at dawn. Smaller 8-dim latent — outputs lean ensemble-textural over individual calls. |
| - `birds_pluma_b2048_r48000_z12.ts` — **Birds (Pluma)**. Lighter, individual bird-call timbres. Mid-size 12-dim latent balances character + clarity. |
| - `humpbacks_pondbrain_b2048_r48000_z20.ts` — **Humpback Whales**. Humpback-whale song. Long, slow, hauntingly deep vocal contours; pairs well with sustained input. |
| - `marinemammals_pondbrain_b2048_r48000_z20.ts` — **Marine Mammals**. Mixed marine-mammal vocalizations — dolphins, orcas, sea-life clicks and cries. |
|
|
| ### Instruments |
| - `guitar_iil_b2048_r48000_z16.ts` — **Guitar (IIL)**. Acoustic / electric guitar timbre. Good demo for transferring voice or synth input into a plucked-string voice. |
| - `organ_bach_b2048_r48000_z16.ts` — **Organ (Bach)**. Pipe-organ timbre trained on Bach repertoire. Sustained harmonic textures — pairs well with melodic input. |
| - `organ_archive_b2048_r48000_z16.ts` — **Organ (Archive)**. Historical pipe-organ recordings — broader, dustier textures than the Bach model. Good for film-score atmospheres. |
| - `sax_soprano_franziskaschroeder_b2048_r48000_z20.ts` — **Soprano Sax (Schroeder)**. Soprano-saxophone extended techniques by Franziska Schroeder. Multiphonics, growls, key clicks. 20-dim latent — captures fine-grained articulation. |
| - `mrp_strengjavera_b2048_r44100_z16.ts` — **Magnetic Resonator Piano (Strengjavera)**. Sustained metallic-string overtones produced by electromagnetically driving piano strings — 44.1 kHz. |
| - `crozzoli_bigensemblesmusic_18d.ts` — **Big Ensemble Music (Crozzoli)**. Big-ensemble orchestral music (M. Crozzoli). Broad 18-dim latent for highly textured renders. The sample rate is not embedded in the filename and defaults to 48 kHz. |
|
|
| ### Textures / environment |
| - `water_pondbrain_b2048_r48000_z16.ts` — **Water (PondBrain)**. Water / aquatic textures. Treats any input as if it were running through liquid — bubbles, ripples, splashes. |
| - `magnets_b2048_r48000_z8.ts` — **Magnets**. Ferromagnetic / electromagnetic resonance textures — metallic hums, distant industrial buzz, magnetized-string ringing. |
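The IIL filenames above follow a visible pattern: `b<block>`, `r<sample rate>` and `z<latent dim>`, joined by `_` or `-` (compare `voice-multi-b2048-r48000-z11.ts` with the underscore-separated names). A small parser can recover these fields when a sidecar is missing. This is a sketch under stated assumptions. Reading `b` as the block size is an interpretation. The 48 kHz fallback mirrors the note on the Crozzoli model, not a documented rule. `parse_checkpoint_name` and `CheckpointInfo` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "..._b2048_r48000_z16.ts" style names; "_" and "-" separators both occur in this mirror.
_FULL = re.compile(r"[_-]b(\d+)[_-]r(\d+)[_-]z(\d+)\.ts$")
# Fallback for names such as "crozzoli_bigensemblesmusic_18d.ts" (latent dim only).
_LATENT_ONLY = re.compile(r"[_-](\d+)d\.ts$")

# Assumption taken from this card: use 48 kHz when the filename has no r<rate> field.
DEFAULT_SAMPLE_RATE = 48_000


@dataclass
class CheckpointInfo:
    block_size: Optional[int]  # interpreted from b<N>; None if absent
    sample_rate: int           # from r<N>, else DEFAULT_SAMPLE_RATE
    latent_dim: Optional[int]  # from z<N> or <N>d; None if absent


def parse_checkpoint_name(filename: str) -> CheckpointInfo:
    m = _FULL.search(filename)
    if m:
        block, rate, latent = (int(g) for g in m.groups())
        return CheckpointInfo(block_size=block, sample_rate=rate, latent_dim=latent)
    m = _LATENT_ONLY.search(filename)
    latent = int(m.group(1)) if m else None
    return CheckpointInfo(block_size=None, sample_rate=DEFAULT_SAMPLE_RATE, latent_dim=latent)
```

For example, `parse_checkpoint_name("voice_vctk_b2048_r44100_z22.ts")` returns `CheckpointInfo(block_size=2048, sample_rate=44100, latent_dim=22)`. ACIDS checkpoints are named by slug only, so when sidecar metadata is available it should take precedence over this parser.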
|
|
| --- |
|
|
| ## Models — ACIDS public catalog (10 models, mirrored 2026-05-18) |
|
|
| Pulled from the canonical anonymous-download endpoint `https://play.forum.ircam.fr/rave-vst-api/get_model/<slug>`. |
| Each `.ts` has a matching `<slug>.json` sidecar in the same schema as the IIL set. |
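A minimal download sketch using only the standard library. It assumes that the `get_model/<slug>` endpoint returns the TorchScript bytes directly, as the mirroring described above implies. `model_url` and `download_model` are hypothetical helpers. The file is first written to a `.part` path and then renamed, so an interrupted download never leaves a truncated `.ts` behind.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

API_BASE = "https://play.forum.ircam.fr/rave-vst-api"


def model_url(slug: str) -> str:
    """Build the anonymous-download URL for a catalog slug."""
    return f"{API_BASE}/get_model/{urllib.parse.quote(slug, safe='')}"


def download_model(slug: str, dest_dir: Path, timeout: float = 60.0) -> Path:
    """Stream <slug>.ts into dest_dir, renaming from a .part file only once complete."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{slug}.ts"
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(model_url(slug), timeout=timeout) as resp, partial.open("wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks; never holds the whole model in memory
    partial.replace(target)
    return target
```

For example, `download_model("darbouka_onnx", Path("rave_models"))` would fetch the smallest (26 MB) model in the table below. Non-2xx responses surface as `urllib.error.HTTPError`.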
|
|
| | Slug | Display name | Type | Author | Year | Size | Prior | |
| |---|---|---|---|---|---|---| |
| | `VCTK` | VCTK (English Speech) | RAVE v1 (default) | Jb Dupuy | 2022 | 177 MB | ✓ | |
| | `darbouka_onnx` | Darbouka (Percussion) | RAVE v2 (ONNX) | Antoine Caillon | 2022 | 26 MB | – | |
| | `nasa` | NASA Apollo 11 | RAVE v1 (default) | Antoine Caillon | 2022 | 159 MB | ✓ | |
| | `percussion` | Percussion (Mixed) | RAVE v1 (default) | Antoine Caillon | 2022 | 71 MB | ✓ | |
| | `vintage` | Vintage Music | RAVE v1 (large) | Antoine Caillon | 2022 | 482 MB | ✓ | |
| | `isis` | ISiS (IRCAM Vocal DB) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – | |
| | `musicnet` | MusicNet (Classical) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 237 MB | ✓ | |
| | `sol_ordinario` | Studio OnLine (Ordinario) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – | |
| | `sol_full` | Studio OnLine (Full) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – | |
| | `sol_ordinario_fast` | Studio OnLine (Ordinario, fast) | RAVE v2 (small) | A. Chemla–Romeu-Santos | 2023 | 43 MB | – | |
|
|
| **ACIDS set total: ~1.6 GB across 10 models.** |
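To check that total, or to size a partial download, you can sum the table's values directly. The dictionary below copies the Size (MB) and Prior columns. The variable names are illustrative only. The full set comes to 1,642 MB, roughly 1.6 GB.

```python
# Sizes in MB, copied from the ACIDS table above.
ACIDS_SIZES_MB = {
    "VCTK": 177, "darbouka_onnx": 26, "nasa": 159, "percussion": 71, "vintage": 482,
    "isis": 149, "musicnet": 237, "sol_ordinario": 149, "sol_full": 149, "sol_ordinario_fast": 43,
}
# Slugs marked with a check in the Prior column.
WITH_PRIOR = {"VCTK", "nasa", "percussion", "vintage", "musicnet"}


def total_mb(slugs) -> int:
    """Sum the listed sizes for the given slugs."""
    return sum(ACIDS_SIZES_MB[s] for s in slugs)


full_pack_mb = total_mb(ACIDS_SIZES_MB)  # all 10 models
prior_only_mb = total_mb(WITH_PRIOR)     # only models that also ship a prior
```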
|
|
| > Note: `VCTK.ts` (ACIDS v1, 48 kHz, original 2022 release) and `voice_vctk_b2048_r44100_z22.ts` |
| > (IIL v2 retrain, 44.1 kHz) are *different* models trained on the same source corpus — |
| > keep both for comparison. |
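The two VCTK models run at different rates (48 kHz vs 44.1 kHz). To compare them on the same source, you need to resample that source to each model's native rate first. The sketch below assumes a polyphase resampler will do the conversion and only reduces each rate ratio to integer up/down factors. `resample_factors` and `SOURCE_RATE` are hypothetical names.

```python
from math import gcd


def resample_factors(src_rate: int, dst_rate: int) -> tuple:
    """Return (up, down) in lowest terms such that src_rate * up / down == dst_rate."""
    g = gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g


SOURCE_RATE = 48_000  # hypothetical: sample rate of the input recording
MODEL_RATES = {"VCTK.ts": 48_000, "voice_vctk_b2048_r44100_z22.ts": 44_100}

# Per-model (up, down) factors for feeding one source to both VCTK models.
plan = {name: resample_factors(SOURCE_RATE, rate) for name, rate in MODEL_RATES.items()}
```

Each pair can be passed as the `up` and `down` arguments of `scipy.signal.resample_poly`. For a 48 kHz source feeding the 44.1 kHz model, that pair is `(147, 160)`.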
|
|
| --- |
|
|
| ## File format |
|
|
| Each `*.ts` is a [TorchScript](https://pytorch.org/docs/stable/jit.html) export of a RAVE model in streaming mode |
| (causal convolutions with cached internal state), usable for both realtime and offline inference. |
|
|
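| ```python |
| import torch |
| # Load any TorchScript export from this mirror and switch to inference mode. |
| model = torch.jit.load("vintage.ts").eval() |
| # Mono audio shaped (batch, 1, samples) at the model's native sample rate. |
| # Noise stands in for real input; the length is assumed to be a safe multiple of 2048 samples. |
| audio = torch.randn(1, 1, 2048 * 32) |
| with torch.no_grad(): |
|     z = model.encode(audio)  # (B, 1, T) -> latents |
|     y = model.decode(z)      # latents -> audio |
| ``` |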
|
|
| Models marked ✓ in the **Prior** column of the ACIDS table additionally ship a learned prior that can |
| generate latents autoregressively (see the [RAVE repo](https://github.com/acids-ircam/RAVE) for usage). |
|
|
| ## Where to find more RAVE models |
|
|
| - [Neutone FX models](https://neutone.ai/fx/models) — community + curated `.nm` files (the Neutone wrapper format). |
| - [IRCAM Forum projects](https://forum.ircam.fr/) — individual user-submitted models; many require a Forum account. |
| - [acids-ircam GitHub releases](https://github.com/acids-ircam/RAVE/releases) — reference checkpoints from the maintainers. |
| - [IRCAM RAVE Model Challenge 2025](https://forum.ircam.fr/collections/detail/rave-model-challenge-models/) — 11 prize-winning and submitted models, gated behind a Forum account. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{caillon2021rave, |
| title={RAVE: A variational autoencoder for fast and high-quality neural audio synthesis}, |
| author={Caillon, Antoine and Esling, Philippe}, |
| journal={arXiv preprint arXiv:2111.05011}, |
| year={2021} |
| } |
| ``` |
|
|