---
license: cc-by-nc-4.0
tags:
- audio
- rave
- timbre-transfer
- neural-synthesis
- ircam
- maestro
language:
- en
pipeline_tag: audio-to-audio
---

# RAVE — AEmotionStudio mirror

Curated mirror of public **RAVE** (Realtime Audio Variational autoEncoder) checkpoints, used by MAESTRO's RAVE Timbre Transfer panel (opt-in starter pack).

Sources:

- The [Intelligent-Instruments-Lab/rave-models](https://huggingface.co/Intelligent-Instruments-Lab/rave-models) curated set (birds, voices, organs, water, etc.).
- The [official ACIDS-IRCAM public catalog](https://acids-ircam.github.io/rave_models_download.html), pulled from the canonical anonymous API at `https://play.forum.ircam.fr/rave-vst-api/get_available_models`.

RAVE was developed by [Antoine Caillon](https://caillonantoine.github.io/) and the [ACIDS team at IRCAM](https://www.ircam.fr/). Paper: [arXiv:2111.05011](https://arxiv.org/abs/2111.05011). Upstream code: [acids-ircam/RAVE](https://github.com/acids-ircam/RAVE).

## License

**CC-BY-NC-4.0** — non-commercial use only, inherited from the upstream distributions. Generated audio is fine for non-commercial use. Commercial use of the *models themselves* (e.g. shipping them inside a paid product) requires permission from the original authors / IRCAM.

Per MAESTRO's stance (see `LICENSE_AUDIT.md` and the `feedback_download_on_demand_licensing` memory), these weights are fetched *on demand* by the end user. The user (not the MAESTRO binary) is therefore the licensee.

---

## Models — IIL-curated set (b2048 streaming exports, 18 models)

Each `.ts` checkpoint has a `.json` sidecar with name, license, sample rate, latent dim, source URL, and a one-line description.

### Voice / speech

- `voice_vocalset_b2048_r48000_z16.ts` — **Voice (VocalSet)**. Voice timbre trained on the VocalSet corpus. It covers vocal techniques across multiple singers. Use it for the canonical 'make this sound like a voice' transfer.
- `voice-multi-b2048-r48000-z11.ts` — **Voice (Multi-speaker)**. Aggregated multi-speaker voice corpus. Wider speaker diversity than VocalSet, which produces more 'average human' renders.
- `voice_hifitts_b2048_r48000_z16.ts` — **Voice (HiFi-TTS)**. High-fidelity expressive English speech corpus. Cleaner and more articulate than the multi-speaker model.
- `voice_jvs_b2048_r44100_z16.ts` — **Voice (JVS, Japanese)**. JVS Japanese multi-speaker corpus at 44.1 kHz. Use it for Japanese-language sources or non-Latin phoneme structure.
- `voice_vctk_b2048_r44100_z22.ts` — **Voice (VCTK, English)**. VCTK English multi-speaker corpus from CSTR Edinburgh, 44.1 kHz. The larger 22-dim latent captures more speaker idiosyncrasies.

### Bird / wildlife

- `birds_motherbird_b2048_r48000_z16.ts` — **Birds (Motherbird)**. Bird-vocalization corpus with chirps and textural transients. The canonical 'weird' pick: it produces wildly warped output for arbitrary input.
- `birds_dawnchorus_b2048_r48000_z8.ts` — **Birds (Dawn Chorus)**. Dense, overlapping bird vocalizations recorded at dawn. The smaller 8-dim latent makes outputs lean toward ensemble texture rather than individual calls.
- `birds_pluma_b2048_r48000_z12.ts` — **Birds (Pluma)**. Lighter, individual bird-call timbres. The mid-size 12-dim latent balances character and clarity.
- `humpbacks_pondbrain_b2048_r48000_z20.ts` — **Humpback Whales**. Humpback-whale song. Long, slow, hauntingly deep vocal contours; pairs well with sustained input.
- `marinemammals_pondbrain_b2048_r48000_z20.ts` — **Marine Mammals**. Mixed marine-mammal vocalizations: dolphins, orcas, sea-life clicks and cries.

### Instruments

- `guitar_iil_b2048_r48000_z16.ts` — **Guitar (IIL)**. Acoustic / electric guitar timbre. A good demo for transferring voice or synth input into a plucked-string voice.
- `organ_bach_b2048_r48000_z16.ts` — **Organ (Bach)**. Pipe-organ timbre trained on Bach repertoire. Sustained harmonic textures; pairs well with melodic input.
- `organ_archive_b2048_r48000_z16.ts` — **Organ (Archive)**. Historical pipe-organ recordings with broader, dustier textures than the Bach model. Good for film-score atmospheres.
- `sax_soprano_franziskaschroeder_b2048_r48000_z20.ts` — **Soprano Sax (Schroeder)**. Soprano-saxophone extended techniques by Franziska Schroeder: multiphonics, growls, key clicks. The 20-dim latent captures fine-grained articulation.
- `mrp_strengjavera_b2048_r44100_z16.ts` — **Magnetic Resonator Piano (Strengjavera)**. Sustained metallic-string overtones produced by electromagnetically driving piano strings, 44.1 kHz.
- `crozzoli_bigensemblesmusic_18d.ts` — **Big Ensemble Music (Crozzoli)**. Big-ensemble orchestral music (M. Crozzoli). Broad 18-dim latent for hugely textured renders. The sample rate is not embedded in the filename and defaults to 48 kHz.

### Textures / environment

- `water_pondbrain_b2048_r48000_z16.ts` — **Water (PondBrain)**. Water / aquatic textures. Treats any input as if it were running through liquid: bubbles, ripples, splashes.
- `magnets_b2048_r48000_z8.ts` — **Magnets**. Ferromagnetic / electromagnetic resonance textures: metallic hums, distant industrial buzz, magnetized-string ringing.

---

## Models — ACIDS public catalog (10 models, mirrored 2026-05-18)

Pulled from the canonical anonymous-download endpoint `https://play.forum.ircam.fr/rave-vst-api/get_model/`. Each `.ts` has a matching `.json` sidecar in the same schema as the IIL set.

| Slug | Display name | Type | Author | Year | Size | Prior |
|---|---|---|---|---|---|---|
| `VCTK` | VCTK (English Speech) | RAVE v1 (default) | Jb Dupuy | 2022 | 177 MB | ✓ |
| `darbouka_onnx` | Darbouka (Percussion) | RAVE v2 (ONNX) | Antoine Caillon | 2022 | 26 MB | – |
| `nasa` | NASA Apollo 11 | RAVE v1 (default) | Antoine Caillon | 2022 | 159 MB | ✓ |
| `percussion` | Percussion (Mixed) | RAVE v1 (default) | Antoine Caillon | 2022 | 71 MB | ✓ |
| `vintage` | Vintage Music | RAVE v1 (large) | Antoine Caillon | 2022 | 482 MB | ✓ |
| `isis` | ISiS (IRCAM Vocal DB) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
| `musicnet` | MusicNet (Classical) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 237 MB | ✓ |
| `sol_ordinario` | Studio OnLine (Ordinario) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
| `sol_full` | Studio OnLine (Full) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
| `sol_ordinario_fast` | Studio OnLine (Ordinario, fast) | RAVE v2 (small) | A. Chemla–Romeu-Santos | 2023 | 43 MB | – |

**ACIDS set total: ~1.6 GB across 10 models.**

> Note: `VCTK.ts` (ACIDS v1, 48 kHz, original 2022 release) and `voice_vctk_b2048_r44100_z22.ts`
> (IIL v2 retrain, 44.1 kHz) are *different* models trained on the same source corpus.
> Keep both for comparison.

---

## File format

Each `*.ts` is a [TorchScript](https://pytorch.org/docs/stable/jit.html) export of a RAVE model in streaming mode (causal convolutions, cached state), ready for realtime or offline inference.

```python
import torch

torch.set_grad_enabled(False)  # inference only, no autograd bookkeeping
model = torch.jit.load("vintage.ts")

# Mono input shaped (batch, channels=1, samples). Random noise stands in for real
# audio here; in practice, load audio at the model's sample rate. 2**16 samples is
# a power-of-two length, so it divides evenly by a power-of-two compression ratio.
audio = torch.randn(1, 1, 2**16)

# Encode (B, 1, T) → latents
z = model.encode(audio)

# Decode latents → audio
y = model.decode(z)
```

Models with a ✓ in the **Prior** column additionally ship a learned prior that can generate latents autoregressively (see the [RAVE repo](https://github.com/acids-ircam/RAVE) for usage).

## Where to find more RAVE models

- [Neutone FX models](https://neutone.ai/fx/models) — community + curated `.nm` files (the Neutone wrapper format).
- [IRCAM Forum projects](https://forum.ircam.fr/) — individual user-submitted models; many require a Forum account.
- [acids-ircam GitHub releases](https://github.com/acids-ircam/RAVE/releases) — reference checkpoints from the maintainers.
- [IRCAM RAVE Model Challenge 2025](https://forum.ircam.fr/collections/detail/rave-model-challenge-models/) — 11 prize-winner / submission models gated behind a Forum account.
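The IIL filenames listed earlier encode export settings as tags. `r48000` is the sample rate and `z16` the latent dimension, and both match the model descriptions. This sketch treats `b2048` as the block size, meaning the number of audio samples per latent frame. That reading is an assumption; this card does not spell it out. The sketch parses those tags, falling back to 48 kHz as the card states for the untagged Crozzoli file. It also does the padding arithmetic implied by the `(B, 1, T) → latents` shapes in the File format snippet. `parse_rave_filename`, `latent_frames`, and `pad_to_block` are hypothetical helpers and are not part of RAVE or MAESTRO.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Assumed tag grammar: b<block>, r<sample rate>, z<latent dim>, separated by "_" or "-".
_TAG = re.compile(r"(?:^|[_-])([brz])(\d+)(?=[_-]|$)")
# Latent tag written as "<n>d", as in crozzoli_bigensemblesmusic_18d.ts.
_LEGACY_Z = re.compile(r"(?:^|[_-])(\d+)d$")

DEFAULT_SR = 48000  # fallback this card gives for files without an r-tag


class RaveTags(NamedTuple):
    block: Optional[int]  # assumption: audio samples per latent frame
    sample_rate: int
    latent_dim: Optional[int]


def parse_rave_filename(filename: str, default_sr: int = DEFAULT_SR) -> RaveTags:
    """Extract block size, sample rate and latent dim from a checkpoint name."""
    stem = Path(filename).stem  # drops the directory and the ".ts"/".json" suffix
    tags = {key: int(value) for key, value in _TAG.findall(stem)}
    latent_dim = tags.get("z")
    if latent_dim is None:
        legacy = _LEGACY_Z.search(stem)
        latent_dim = int(legacy.group(1)) if legacy else None
    return RaveTags(tags.get("b"), tags.get("r", default_sr), latent_dim)


def latent_frames(num_samples: int, block: int) -> int:
    """Latent frames covering a clip once it is padded to whole blocks (integer ceil)."""
    return (num_samples + block - 1) // block


def pad_to_block(num_samples: int, block: int) -> int:
    """Zero samples to append so the clip length becomes a multiple of `block`."""
    return (-num_samples) % block
```

For example, `voice_jvs_b2048_r44100_z16.ts` parses to a 44.1 kHz, 16-dim model. Under the block-size assumption, a one-second clip at 48 kHz (48,000 samples) needs 1,152 samples of padding to fill 24 latent frames. The ACIDS slugs carry no tags, so for those models the `.json` sidecar is the better source for the sample rate.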
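The ACIDS table above can also be kept as plain data. A download panel could use it to show which checkpoints ship a prior and how much disk space the whole set needs. This sketch copies the slugs, sizes, and Prior marks directly from the table. `AcidsModel`, `ACIDS_CATALOG`, and the helper functions are hypothetical names. The sizes are the rounded MB figures from the table, not exact byte counts.

```python
from typing import Iterable, List, NamedTuple


class AcidsModel(NamedTuple):
    slug: str
    size_mb: int     # rounded size as listed in the table
    has_prior: bool  # ✓ in the Prior column


# Hypothetical in-app catalog mirroring the rows of the ACIDS table.
ACIDS_CATALOG: List[AcidsModel] = [
    AcidsModel("VCTK", 177, True),
    AcidsModel("darbouka_onnx", 26, False),
    AcidsModel("nasa", 159, True),
    AcidsModel("percussion", 71, True),
    AcidsModel("vintage", 482, True),
    AcidsModel("isis", 149, False),
    AcidsModel("musicnet", 237, True),
    AcidsModel("sol_ordinario", 149, False),
    AcidsModel("sol_full", 149, False),
    AcidsModel("sol_ordinario_fast", 43, False),
]


def total_size_mb(models: Iterable[AcidsModel]) -> int:
    """Sum of the listed sizes, in MB."""
    return sum(m.size_mb for m in models)


def slugs_with_prior(models: Iterable[AcidsModel]) -> List[str]:
    """Slugs whose checkpoint ships a learned prior, in catalog order."""
    return [m.slug for m in models if m.has_prior]
```

With these rows, `total_size_mb(ACIDS_CATALOG)` returns 1642, consistent with the ~1.6 GB total stated above. `slugs_with_prior` returns the five prior-equipped models: `VCTK`, `nasa`, `percussion`, `vintage`, and `musicnet`.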
## Citation

```bibtex
@article{caillon2021rave,
  title={RAVE: A variational autoencoder for fast and high-quality neural audio synthesis},
  author={Caillon, Antoine and Esling, Philippe},
  journal={arXiv preprint arXiv:2111.05011},
  year={2021}
}
```