---
license: cc-by-nc-4.0
tags:
- audio
- rave
- timbre-transfer
- neural-synthesis
- ircam
- maestro
language:
- en
pipeline_tag: audio-to-audio
---
# RAVE — AEmotionStudio mirror
Curated mirror of public **RAVE** (Realtime Audio Variational autoEncoder) checkpoints, used
by MAESTRO's RAVE Timbre Transfer panel (opt-in starter pack). Sources:
- The [Intelligent-Instruments-Lab/rave-models](https://huggingface.co/Intelligent-Instruments-Lab/rave-models) curated set (birds, voices, organs, water, etc.).
- The [official ACIDS-IRCAM public catalog](https://acids-ircam.github.io/rave_models_download.html), pulled from the canonical anonymous API at `https://play.forum.ircam.fr/rave-vst-api/get_available_models`.

RAVE was developed by [Antoine Caillon](https://caillonantoine.github.io/) and the
[ACIDS team at IRCAM](https://www.ircam.fr/). Paper: [arXiv:2111.05011](https://arxiv.org/abs/2111.05011).
Upstream code: [acids-ircam/RAVE](https://github.com/acids-ircam/RAVE).
## License
**CC-BY-NC-4.0** — non-commercial use only, inherited from the upstream distributions.
Generated audio is fine for non-commercial use. Commercial use of the *models themselves*
(e.g. shipping them inside a paid product) requires permission from the original authors / IRCAM.
Per MAESTRO's licensing stance (see `LICENSE_AUDIT.md` and the `feedback_download_on_demand_licensing`
memory), these weights are fetched *on demand* by the end user, so the user, not the MAESTRO
binary, is the licensee.

---
## Models — IIL-curated set (b2048 streaming exports, 18 models)
Each `.ts` checkpoint has a `<stem>.json` sidecar with name, license, sample-rate, latent-dim,
source URL, and a one-line description.
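
Because each sidecar shares its checkpoint's stem, the metadata for any `.ts` file can be located by swapping the extension. A minimal sketch, assuming the sidecar is plain UTF-8 JSON; `load_sidecar` is a hypothetical helper (not part of MAESTRO or RAVE), and the exact key names inside the file are not listed here, so inspect one sidecar before relying on a particular field.

```python
import json
from pathlib import Path


def load_sidecar(checkpoint: str) -> dict:
    """Return the JSON metadata stored next to a .ts checkpoint (hypothetical helper)."""
    # "<dir>/<stem>.ts" -> "<dir>/<stem>.json"
    sidecar = Path(checkpoint).with_suffix(".json")
    with open(sidecar, encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_sidecar("magnets_b2048_r48000_z8.ts")` reads `magnets_b2048_r48000_z8.json` from the same directory.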
### Voice / speech
- `voice_vocalset_b2048_r48000_z16.ts` — **Voice (VocalSet)**. Voice timbre trained on the VocalSet corpus — covers vocal techniques across multiple singers. Use for the canonical 'make this sound like a voice' transfer.
- `voice-multi-b2048-r48000-z11.ts` — **Voice (Multi-speaker)**. Aggregated multi-speaker voice corpus. Wider speaker diversity than VocalSet — produces more 'average human' renders.
- `voice_hifitts_b2048_r48000_z16.ts` — **Voice (HiFi-TTS)**. High-fidelity expressive English speech corpus. Cleaner, more articulate than the multi-speaker model.
- `voice_jvs_b2048_r44100_z16.ts` — **Voice (JVS, Japanese)**. JVS Japanese multi-speaker corpus at 44.1 kHz. Use for Japanese-language sources or non-Latin phoneme structure.
- `voice_vctk_b2048_r44100_z22.ts` — **Voice (VCTK, English)**. VCTK English multi-speaker corpus from CSTR Edinburgh, 44.1 kHz. High 22-dim latent — captures more speaker idiosyncrasies.
### Bird / wildlife
- `birds_motherbird_b2048_r48000_z16.ts` — **Birds (Motherbird)**. Bird-vocalization corpus with chirps and textural transients. The canonical 'weird' pick: produces wildly warped output for arbitrary input.
- `birds_dawnchorus_b2048_r48000_z8.ts` — **Birds (Dawn Chorus)**. Dense overlapping bird vocalizations recorded at dawn. Smaller 8-dim latent — outputs lean ensemble-textural over individual calls.
- `birds_pluma_b2048_r48000_z12.ts` — **Birds (Pluma)**. Lighter, individual bird-call timbres. Mid-size 12-dim latent balances character + clarity.
- `humpbacks_pondbrain_b2048_r48000_z20.ts` — **Humpback Whales**. Humpback-whale song. Long, slow, hauntingly deep vocal contours that pair well with sustained input.
- `marinemammals_pondbrain_b2048_r48000_z20.ts` — **Marine Mammals**. Mixed marine-mammal vocalizations — dolphins, orcas, sea-life clicks and cries.
### Instruments
- `guitar_iil_b2048_r48000_z16.ts` — **Guitar (IIL)**. Acoustic / electric guitar timbre. Good demo for transferring voice or synth input into a plucked-string voice.
- `organ_bach_b2048_r48000_z16.ts` — **Organ (Bach)**. Pipe-organ timbre trained on Bach repertoire. Sustained harmonic textures — pairs well with melodic input.
- `organ_archive_b2048_r48000_z16.ts` — **Organ (Archive)**. Historical pipe-organ recordings — broader, dustier textures than the Bach model. Good for film-score atmospheres.
- `sax_soprano_franziskaschroeder_b2048_r48000_z20.ts` — **Soprano Sax (Schroeder)**. Soprano-saxophone extended techniques by Franziska Schroeder. Multiphonics, growls, key clicks. 20-dim latent — captures fine-grained articulation.
- `mrp_strengjavera_b2048_r44100_z16.ts` — **Magnetic Resonator Piano (Strengjavera)**. Sustained metallic-string overtones produced by electromagnetically driving piano strings — 44.1 kHz.
- `crozzoli_bigensemblesmusic_18d.ts` — **Big Ensemble Music (Crozzoli)**. Big-ensemble orchestral music (M. Crozzoli). Broad 18-dim latent for hugely textured renders. The sample rate is not embedded in the filename, so it defaults to 48 kHz.
### Textures / environment
- `water_pondbrain_b2048_r48000_z16.ts` — **Water (PondBrain)**. Water / aquatic textures. Treats any input as if it were running through liquid — bubbles, ripples, splashes.
- `magnets_b2048_r48000_z8.ts` — **Magnets**. Ferromagnetic / electromagnetic resonance textures — metallic hums, distant industrial buzz, magnetized-string ringing.
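
Most filenames above encode their key settings as underscore- or hyphen-separated tokens (`b2048`, `r48000`, `z16`), while `crozzoli_bigensemblesmusic_18d.ts` uses a trailing `18d` for its latent size and has no rate token. The sketch below recovers those fields with the standard library and falls back to 48 kHz, mirroring the Crozzoli note. `parse_rave_filename` and `RaveName` are hypothetical helpers, not part of MAESTRO or RAVE; reading `b<N>` as the streaming block size is an assumption, and the JSON sidecar remains the authoritative source. For example, `parse_rave_filename("voice_vctk_b2048_r44100_z22.ts")` yields a 44.1 kHz, 22-dim entry.

```python
import re
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

DEFAULT_SAMPLE_RATE = 48_000  # fallback when no r<rate> token is present


@dataclass(frozen=True)
class RaveName:
    block: Optional[int]       # from a b<N> token, e.g. "b2048" (assumed block size)
    sample_rate: int           # from an r<N> token, else DEFAULT_SAMPLE_RATE
    latent_dim: Optional[int]  # from a z<N> token, or a trailing <N>d token


def _token(stem: str, pattern: str) -> Optional[int]:
    # Tokens are delimited by "_" or "-" (both styles occur in this mirror).
    m = re.search(r"(?:^|[_-])" + pattern + r"(?=$|[_-])", stem)
    return int(m.group(1)) if m else None


def parse_rave_filename(filename: str) -> RaveName:
    stem = PurePath(filename).stem  # drop the ".ts" extension
    block = _token(stem, r"b(\d+)")
    rate = _token(stem, r"r(\d+)")
    latent = _token(stem, r"z(\d+)")
    if latent is None:
        latent = _token(stem, r"(\d+)d")  # e.g. "..._18d"
    return RaveName(block, rate if rate is not None else DEFAULT_SAMPLE_RATE, latent)
```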
---
## Models — ACIDS public catalog (10 models, mirrored 2026-05-18)
Pulled from the canonical anonymous-download endpoint `https://play.forum.ircam.fr/rave-vst-api/get_model/<slug>`.
Each `.ts` has a matching `<slug>.json` sidecar in the same schema as the IIL set.
| Slug | Display name | Type | Author | Year | Size | Prior |
|---|---|---|---|---|---|---|
| `VCTK` | VCTK (English Speech) | RAVE v1 (default) | Jb Dupuy | 2022 | 177 MB | ✓ |
| `darbouka_onnx` | Darbouka (Percussion) | RAVE v2 (ONNX) | Antoine Caillon | 2022 | 26 MB | – |
| `nasa` | NASA Apollo 11 | RAVE v1 (default) | Antoine Caillon | 2022 | 159 MB | ✓ |
| `percussion` | Percussion (Mixed) | RAVE v1 (default) | Antoine Caillon | 2022 | 71 MB | ✓ |
| `vintage` | Vintage Music | RAVE v1 (large) | Antoine Caillon | 2022 | 482 MB | ✓ |
| `isis` | ISiS (IRCAM Vocal DB) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
| `musicnet` | MusicNet (Classical) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 237 MB | ✓ |
| `sol_ordinario` | Studio OnLine (Ordinario) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
| `sol_full` | Studio OnLine (Full) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
| `sol_ordinario_fast` | Studio OnLine (Ordinario, fast) | RAVE v2 (small) | A. Chemla–Romeu-Santos | 2023 | 43 MB | – |

**ACIDS set total: ~1.6 GB across 10 models.**
> Note: `VCTK.ts` (ACIDS v1, 48 kHz, original 2022 release) and `voice_vctk_b2048_r44100_z22.ts`
> (IIL v2 retrain, 44.1 kHz) are *different* models trained on the same source corpus —
> keep both for comparison.
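
The anonymous `get_model/<slug>` endpoint can also be scripted. A minimal stdlib sketch, assuming the endpoint returns the raw TorchScript bytes for a slug from the table above with no authentication; `model_url` and `download_model` are hypothetical helpers, not an official IRCAM client. For example, `download_model("percussion")` would save `percussion.ts` in the current directory.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Base of the anonymous ACIDS/IRCAM API used for this mirror.
API_BASE = "https://play.forum.ircam.fr/rave-vst-api"


def model_url(slug: str) -> str:
    """Build the get_model/<slug> download URL for one catalog entry."""
    return f"{API_BASE}/get_model/{urllib.parse.quote(slug)}"


def download_model(slug: str, dest_dir: str = ".") -> Path:
    """Save the checkpoint for `slug` as <dest_dir>/<slug>.ts (needs network)."""
    dest = Path(dest_dir) / f"{slug}.ts"
    # Copy in 1 MiB chunks: some checkpoints are several hundred MB.
    with urllib.request.urlopen(model_url(slug)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)
    return dest
```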
---
## File format
Each `*.ts` file is a [TorchScript](https://pytorch.org/docs/stable/jit.html) export of a RAVE model
in streaming mode (causal convolutions with cached state), usable for both realtime and offline inference.
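```python
import torch

# Load a TorchScript RAVE export (any *.ts file from this mirror).
model = torch.jit.load("vintage.ts").eval()

# Mono audio at the model's sample rate, shaped (batch, 1, samples).
# Random noise stands in for a real signal; 2**16 samples keeps the length
# a multiple of typical power-of-two compression ratios.
audio = torch.randn(1, 1, 2**16)

with torch.no_grad():
    z = model.encode(audio)  # audio -> latents, (batch, latent_dim, frames)
    y = model.decode(z)      # latents -> audio, (batch, channels, samples)
```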
Models marked ✓ in the **Prior** column additionally ship a learned prior that can generate latents
autoregressively (see the [RAVE repo](https://github.com/acids-ircam/RAVE) for usage).
## Where to find more RAVE models
- [Neutone FX models](https://neutone.ai/fx/models) — community and curated `.nm` files (the Neutone wrapper format).
- [IRCAM Forum projects](https://forum.ircam.fr/) — individual user-submitted models; many require a Forum account.
- [acids-ircam GitHub releases](https://github.com/acids-ircam/RAVE/releases) — reference checkpoints from the maintainers.
- [IRCAM RAVE Model Challenge 2025](https://forum.ircam.fr/collections/detail/rave-model-challenge-models/) — 11 prize-winner / submission models gated behind a Forum account.
## Citation
```bibtex
@article{caillon2021rave,
title={RAVE: A variational autoencoder for fast and high-quality neural audio synthesis},
author={Caillon, Antoine and Esling, Philippe},
journal={arXiv preprint arXiv:2111.05011},
year={2021}
}
```