	---
	license: cc-by-nc-4.0
	tags:
	- audio
	- rave
	- timbre-transfer
	- neural-synthesis
	- ircam
	- maestro
	language:
	- en
	pipeline_tag: audio-to-audio
	---

	# RAVE — AEmotionStudio mirror

	Curated mirror of public RAVE (Realtime Audio Variational autoEncoder) checkpoints, used
	by MAESTRO's RAVE Timbre Transfer panel (opt-in starter pack). Sources:

	- The [Intelligent-Instruments-Lab/rave-models](https://huggingface.co/Intelligent-Instruments-Lab/rave-models) curated set (birds, voices, organs, water, etc.).
	- The [official ACIDS-IRCAM public catalog](https://acids-ircam.github.io/rave_models_download.html), pulled from the canonical anonymous API at `https://play.forum.ircam.fr/rave-vst-api/get_available_models`.

	RAVE was developed by [Antoine Caillon](https://caillonantoine.github.io/) and the
	[ACIDS team at IRCAM](https://www.ircam.fr/). Paper: [arXiv:2111.05011](https://arxiv.org/abs/2111.05011).
	Upstream code: [acids-ircam/RAVE](https://github.com/acids-ircam/RAVE).

	## License

	CC-BY-NC-4.0 — non-commercial use only, inherited from the upstream distributions.
	Generated audio is fine for non-commercial use. Commercial use of the models themselves
	(e.g. shipping them inside a paid product) requires permission from the original authors / IRCAM.

	Per MAESTRO's licensing stance (see `LICENSE_AUDIT.md` and the `feedback_download_on_demand_licensing`
	memory), these weights are fetched on demand by the end user, so the user, not the MAESTRO
	binary, is the licensee.

	---

	## Models — IIL-curated set (b2048 streaming exports, 18 models)

	Each `.ts` checkpoint has a `<stem>.json` sidecar with name, license, sample-rate, latent-dim,
	source URL, and a one-line description.
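
As a rough illustration of the sidecar pairing described above, the sketch below loads the JSON file that sits next to a checkpoint and reports which expected fields are absent. The helper name `load_sidecar` and the key spellings in `EXPECTED_KEYS` are assumptions made for this example; the text lists the fields but not their exact JSON keys, so check a real sidecar before relying on them.

```python
import json
from pathlib import Path

# Hypothetical key names: the text lists the fields but not their exact spelling.
EXPECTED_KEYS = ("name", "license", "sample_rate", "latent_dim", "source_url", "description")


def load_sidecar(checkpoint_path):
    """Return (metadata, missing_keys) for the JSON sidecar next to a .ts checkpoint."""
    # "<stem>.ts" -> "<stem>.json" in the same directory
    sidecar_path = Path(checkpoint_path).with_suffix(".json")
    with sidecar_path.open(encoding="utf-8") as handle:
        metadata = json.load(handle)
    missing = [key for key in EXPECTED_KEYS if key not in metadata]
    return metadata, missing
```

Returning the missing keys instead of raising keeps the helper usable even if the real schema differs from the assumed one.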

	### Voice / speech
	- `voice_vocalset_b2048_r48000_z16.ts` — Voice (VocalSet). Voice timbre trained on the VocalSet corpus — covers vocal techniques across multiple singers. Use for the canonical 'make this sound like a voice' transfer.
	- `voice-multi-b2048-r48000-z11.ts` — Voice (Multi-speaker). Aggregated multi-speaker voice corpus. Wider speaker diversity than VocalSet — produces more 'average human' renders.
	- `voice_hifitts_b2048_r48000_z16.ts` — Voice (HiFi-TTS). High-fidelity expressive English speech corpus. Cleaner, more articulate than the multi-speaker model.
	- `voice_jvs_b2048_r44100_z16.ts` — Voice (JVS, Japanese). JVS Japanese multi-speaker corpus at 44.1 kHz. Use for Japanese-language sources or non-Latin phoneme structure.
	- `voice_vctk_b2048_r44100_z22.ts` — Voice (VCTK, English). VCTK English multi-speaker corpus from CSTR Edinburgh, 44.1 kHz. High 22-dim latent — captures more speaker idiosyncrasies.

	### Bird / wildlife
	- `birds_motherbird_b2048_r48000_z16.ts` — Birds (Motherbird). Bird-vocalization corpus — chirps + textural transients. The canonical 'weird' pick: produces wildly warped output for any arbitrary input.
	- `birds_dawnchorus_b2048_r48000_z8.ts` — Birds (Dawn Chorus). Dense overlapping bird vocalizations recorded at dawn. Smaller 8-dim latent — outputs lean ensemble-textural over individual calls.
	- `birds_pluma_b2048_r48000_z12.ts` — Birds (Pluma). Lighter, individual bird-call timbres. Mid-size 12-dim latent balances character + clarity.
	- `humpbacks_pondbrain_b2048_r48000_z20.ts` — Humpback Whales. Humpback-whale song. Long, slow, hauntingly-deep vocal contours — pairs well with sustained input.
	- `marinemammals_pondbrain_b2048_r48000_z20.ts` — Marine Mammals. Mixed marine-mammal vocalizations — dolphins, orcas, sea-life clicks and cries.

	### Instruments
	- `guitar_iil_b2048_r48000_z16.ts` — Guitar (IIL). Acoustic / electric guitar timbre. Good demo for transferring voice or synth input into a plucked-string voice.
	- `organ_bach_b2048_r48000_z16.ts` — Organ (Bach). Pipe-organ timbre trained on Bach repertoire. Sustained harmonic textures — pairs well with melodic input.
	- `organ_archive_b2048_r48000_z16.ts` — Organ (Archive). Historical pipe-organ recordings — broader, dustier textures than the Bach model. Good for film-score atmospheres.
	- `sax_soprano_franziskaschroeder_b2048_r48000_z20.ts` — Soprano Sax (Schroeder). Soprano-saxophone extended techniques by Franziska Schroeder. Multiphonics, growls, key clicks. 20-dim latent — captures fine-grained articulation.
	- `mrp_strengjavera_b2048_r44100_z16.ts` — Magnetic Resonator Piano (Strengjavera). Sustained metallic-string overtones produced by electromagnetically driving piano strings — 44.1 kHz.
	- `crozzoli_bigensemblesmusic_18d.ts` — Big Ensemble Music (Crozzoli). Big-ensemble orchestral music (M. Crozzoli). Broad 18-dim latent for hugely-textured renders. Sample rate not embedded in filename — defaults to 48 kHz.

	### Textures / environment
	- `water_pondbrain_b2048_r48000_z16.ts` — Water (PondBrain). Water / aquatic textures. Treats any input as if it were running through liquid — bubbles, ripples, splashes.
	- `magnets_b2048_r48000_z8.ts` — Magnets. Ferromagnetic / electromagnetic resonance textures — metallic hums, distant industrial buzz, magnetized-string ringing.
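
Most filenames in the lists above carry `b<n>`, `r<n>`, and `z<n>` fields. A minimal sketch of reading them back out is shown below, assuming `b` is the block size, `r` the sample rate in Hz, and `z` the latent size. The helper `parse_rave_filename` is hypothetical, and treating a trailing `18d` as a latent size is an assumption based on the "18-dim latent" description; the 48 kHz fallback follows the note on the Crozzoli entry.

```python
import re
from pathlib import Path

DEFAULT_SAMPLE_RATE = 48000  # fallback used when the filename has no r<rate> field


def parse_rave_filename(filename):
    """Split a checkpoint filename into a name plus b/r/z fields (hypothetical helper)."""
    info = {"name": "", "block_size": None,
            "sample_rate": DEFAULT_SAMPLE_RATE, "latent_dim": None}
    name_parts = []
    # Both "_" and "-" appear as separators in the lists above.
    for token in re.split(r"[_-]", Path(filename).stem):
        if match := re.fullmatch(r"b(\d+)", token):
            info["block_size"] = int(match.group(1))
        elif match := re.fullmatch(r"r(\d+)", token):
            info["sample_rate"] = int(match.group(1))
        elif match := re.fullmatch(r"z(\d+)|(\d+)d", token):
            # "z16" and "18d" are both read as a latent size (assumption for the latter).
            info["latent_dim"] = int(match.group(1) or match.group(2))
        else:
            name_parts.append(token)
    info["name"] = "_".join(name_parts)
    return info


for example in ("voice_vctk_b2048_r44100_z22.ts", "crozzoli_bigensemblesmusic_18d.ts"):
    print(parse_rave_filename(example))
```

The sidecar JSON remains the more reliable source for these values; filename parsing is only a convenience when the sidecar is not at hand.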

	---

	## Models — ACIDS public catalog (10 models, mirrored 2026-05-18)

	Pulled from the canonical anonymous-download endpoint `https://play.forum.ircam.fr/rave-vst-api/get_model/<slug>`.
	Each `.ts` has a matching `<slug>.json` sidecar in the same schema as the IIL set.

	| Slug | Display name | Type | Author | Year | Size | Prior |
	|---|---|---|---|---|---|---|
	| `VCTK` | VCTK (English Speech) | RAVE v1 (default) | Jb Dupuy | 2022 | 177 MB | ✓ |
	| `darbouka_onnx` | Darbouka (Percussion) | RAVE v2 (ONNX) | Antoine Caillon | 2022 | 26 MB | – |
	| `nasa` | NASA Apollo 11 | RAVE v1 (default) | Antoine Caillon | 2022 | 159 MB | ✓ |
	| `percussion` | Percussion (Mixed) | RAVE v1 (default) | Antoine Caillon | 2022 | 71 MB | ✓ |
	| `vintage` | Vintage Music | RAVE v1 (large) | Antoine Caillon | 2022 | 482 MB | ✓ |
	| `isis` | ISiS (IRCAM Vocal DB) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
	| `musicnet` | MusicNet (Classical) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 237 MB | ✓ |
	| `sol_ordinario` | Studio OnLine (Ordinario) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
	| `sol_full` | Studio OnLine (Full) | RAVE v2 | A. Chemla–Romeu-Santos | 2023 | 149 MB | – |
	| `sol_ordinario_fast` | Studio OnLine (Ordinario, fast) | RAVE v2 (small) | A. Chemla–Romeu-Santos | 2023 | 43 MB | – |

	ACIDS set total: ~1.6 GB across 10 models.
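
The total can be checked against the per-model sizes, and the download URL for a slug can be built from the endpoint named in this section. The sketch below does both without touching the network. The names `ACIDS_SIZES_MB` and `model_url` are illustrative, and appending the slug as the final path segment is this example's reading of the `<slug>` placeholder.

```python
from urllib.parse import quote

# Endpoint quoted in this section; the slug is appended as the last path segment.
ACIDS_MODEL_ENDPOINT = "https://play.forum.ircam.fr/rave-vst-api/get_model/"

# Sizes in MB, copied from the table in this section.
ACIDS_SIZES_MB = {
    "VCTK": 177,
    "darbouka_onnx": 26,
    "nasa": 159,
    "percussion": 71,
    "vintage": 482,
    "isis": 149,
    "musicnet": 237,
    "sol_ordinario": 149,
    "sol_full": 149,
    "sol_ordinario_fast": 43,
}


def model_url(slug):
    """Build the download URL for one catalog slug (no request is made here)."""
    return ACIDS_MODEL_ENDPOINT + quote(slug, safe="")


total_mb = sum(ACIDS_SIZES_MB.values())
print(f"{len(ACIDS_SIZES_MB)} models, {total_mb} MB (~{total_mb / 1000:.1f} GB)")
# prints: 10 models, 1642 MB (~1.6 GB)
print(model_url("vintage"))
```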

	> Note: `VCTK.ts` (ACIDS v1, 48 kHz, original 2022 release) and `voice_vctk_b2048_r44100_z22.ts`
	> (IIL v2 retrain, 44.1 kHz) are different models trained on the same source corpus —
	> keep both for comparison.

	---

	## File format

	Each `*.ts` is a [TorchScript](https://pytorch.org/docs/stable/jit.html) export of the RAVE model,
	streaming-mode (causal convolutions, cached state) — ready for realtime or offline inference.

	```python
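	import torch

	model = torch.jit.load("vintage.ts")
	# Placeholder input: mono audio shaped (B, 1, T) at the model's sample rate
	audio = torch.zeros(1, 1, 65536)
	with torch.no_grad():
	    # Encode (B, 1, T) → latents
	    z = model.encode(audio)
	    # Decode latents → audio
	    y = model.decode(z)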
	```
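
Streaming exports work on fixed-size blocks, so it can help to know how much padding an input needs and how long one block lasts. The sketch below assumes that `b2048` in the IIL filenames means 2048 samples per block with one latent frame per block; that reading is an assumption of this example, and the helper names are hypothetical.

```python
def padded_length(num_samples, block_size=2048):
    """Smallest multiple of block_size that is >= num_samples."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-num_samples // block_size) * block_size


def latent_frames(num_samples, block_size=2048):
    """Number of blocks (assumed: one latent frame each) after padding."""
    return padded_length(num_samples, block_size) // block_size


def block_duration_ms(block_size, sample_rate):
    """Duration of one block in milliseconds."""
    return 1000.0 * block_size / sample_rate


one_second = 48000  # samples in one second at 48 kHz
print(padded_length(one_second), latent_frames(one_second),
      round(block_duration_ms(2048, 48000), 2))
# prints: 49152 24 42.67
```

Under these assumptions, a 2048-sample block at 48 kHz spans roughly 43 ms, which is a lower bound on buffering delay before any processing time is counted.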

	Models marked ✓ in the Prior column of the ACIDS table additionally ship a learned prior that can
	generate latents autoregressively (see the [RAVE repo](https://github.com/acids-ircam/RAVE) for usage).

	## Where to find more RAVE models

	- [Neutone FX models](https://neutone.ai/fx/models) — community + curated `.nm` files (the Neutone wrapper format).
	- [IRCAM Forum projects](https://forum.ircam.fr/) — individual user-submitted models; many require Forum account.
	- [acids-ircam GitHub releases](https://github.com/acids-ircam/RAVE/releases) — reference checkpoints from the maintainers.
	- [IRCAM RAVE Model Challenge 2025](https://forum.ircam.fr/collections/detail/rave-model-challenge-models/) — 11 prize-winner / submission models gated behind a Forum account.

	## Citation

	```bibtex
	@article{caillon2021rave,
	title={RAVE: A variational autoencoder for fast and high-quality neural audio synthesis},
	author={Caillon, Antoine and Esling, Philippe},
	journal={arXiv preprint arXiv:2111.05011},
	year={2021}
	}
	```