kashif/soundstream_mel_decoder

	---
	license: apache-2.0
	---

	A [SoundStream](https://arxiv.org/abs/2107.03312) decoder to reconstruct audio from a mel-spectrogram.

	## Overview

	This model is a SoundStream decoder that inverts mel-spectrograms computed with the specific hyperparameters defined in the example below. It was trained on music data and was used in [Multi-instrument Music Synthesis with Spectrogram Diffusion](https://arxiv.org/abs/2206.05408) (ISMIR 2022).

	A typical use case is to simplify music generation by predicting mel-spectrograms (instead of a raw waveform) and then using this model to reconstruct the audio.

	If you use it, please consider citing:

	```bibtex
	@article{zeghidour2021soundstream,
	title={Soundstream: An end-to-end neural audio codec},
	author={Zeghidour, Neil and Luebs, Alejandro and Omran, Ahmed and Skoglund, Jan and Tagliasacchi, Marco},
	journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
	volume={30},
	pages={495--507},
	year={2021},
	publisher={IEEE}
	}
	```

	## Example Use

	```python
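	import numpy as np
	from diffusers import OnnxRuntimeModel

	# Mel-spectrogram hyperparameters this decoder was trained to invert.
	SAMPLE_RATE = 16000
	N_FFT = 1024
	HOP_LENGTH = 320
	WIN_LENGTH = 640
	N_MEL_CHANNELS = 128
	MEL_FMIN = 0.0
	MEL_FMAX = SAMPLE_RATE // 2
	CLIP_VALUE_MIN = 1e-5
	CLIP_VALUE_MAX = 1e8

	# Mel-spectrogram array computed with the hyperparameters above.
	mel = ...

	melgan = OnnxRuntimeModel.from_pretrained("kashif/soundstream_mel_decoder")

	# Cast to float32 before passing the features to the ONNX decoder.
	audio = melgan(input_features=mel.astype(np.float32))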
	```
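
To make the framing hyperparameters concrete, the short sketch below derives the time resolution they imply. The helper `framing_summary` is hypothetical (it is not part of this repository or of `diffusers`), and the frame count assumes roughly one frame per hop with padding, which may differ slightly from the framing used when the model was trained.

```python
import math

# Constants copied from the example above.
SAMPLE_RATE = 16000
N_FFT = 1024
HOP_LENGTH = 320
WIN_LENGTH = 640


def framing_summary(num_samples):
    """Hypothetical helper: summarize the framing implied by the constants."""
    return {
        # Spectrogram frames produced per second of audio.
        "frames_per_second": SAMPLE_RATE / HOP_LENGTH,
        # Duration of one analysis window and one hop, in milliseconds.
        "window_ms": 1000 * WIN_LENGTH / SAMPLE_RATE,
        "hop_ms": 1000 * HOP_LENGTH / SAMPLE_RATE,
        # Linear-frequency bins of a real-valued FFT before mel projection.
        "stft_bins": N_FFT // 2 + 1,
        # Approximate frame count, assuming one frame per hop with padding.
        "approx_frames": math.ceil(num_samples / HOP_LENGTH),
    }


summary = framing_summary(5 * SAMPLE_RATE)  # five seconds of audio
print(summary)
```

Under these assumptions, each second of audio corresponds to 50 mel frames (a 20 ms hop with a 40 ms window), so a generator that predicts mel-spectrograms for this decoder needs about 250 frames for a five-second clip.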