---
library_name: transformers
tags:
- audio
- soundstream
license: mit
language:
- en
pipeline_tag: audio-to-audio
---
# SoundStream
A PyTorch implementation of the [SoundStream](https://arxiv.org/abs/2107.03312) neural audio codec. Accepts only 16 kHz audio.
Encodes speech into discrete tokens (8 codebooks, each producing 80 tokens per second) and decodes them back to audio.
## Metrics
Evaluated on LibriSpeech test-clean:
- **STOI**: 0.804
- **NISQA**: 2.276
## Architecture
- **Encoder**: Causal convolutions with residual units and strided downsampling (strides 2, 4, 5, 5; 2 × 4 × 5 × 5 = 200x temporal downsampling)
- **Quantizer**: Residual Vector Quantizer with 8 codebooks of 1024 entries each
- **Decoder**: Mirrored encoder with transposed convolutions
- **Discriminator** (training only): 3 multi-scale waveform discriminators + 1 STFT-based discriminator
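To illustrate the quantizer stage listed above, here is a minimal sketch of residual vector quantization in PyTorch. It is not this repository's implementation: the function names `rvq_encode` and `rvq_decode` are hypothetical, and the codebooks are random stand-ins rather than trained weights. It only shows how each of the 8 stages picks the nearest of 1024 entries and passes the remaining residual to the next stage.

```python
import torch


def rvq_encode(latents, codebooks):
    """Quantize latents (T, D) with stacked codebooks (Q, K, D); returns indices (Q, T)."""
    residual = latents
    indices = []
    for codebook in codebooks:                   # one quantizer stage at a time
        dists = torch.cdist(residual, codebook)  # (T, K) Euclidean distances
        idx = dists.argmin(dim=1)                # nearest codebook entry per frame
        indices.append(idx)
        residual = residual - codebook[idx]      # next stage quantizes what is left
    return torch.stack(indices)


def rvq_decode(indices, codebooks):
    """Sum the selected entry of every stage: indices (Q, T) -> latents (T, D)."""
    return sum(codebook[idx] for codebook, idx in zip(codebooks, indices))


torch.manual_seed(0)
codebooks = torch.randn(8, 1024, 512)  # random stand-ins for trained codebooks (assumption)
latents = torch.randn(80, 512)         # one second of encoder output at 80 frames/sec
indices = rvq_encode(latents, codebooks)
approx = rvq_decode(indices, codebooks)
print(indices.shape, approx.shape)
```

With trained codebooks, each additional stage refines the approximation; with the random codebooks used here, only the shapes and the data flow are meaningful.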

**Model hyperparameters**: 16 kHz sample rate, 32 channels, latent dimension 512, codebook size 1024, 8 quantizers, 200x downsampling
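The numbers above determine the token rate and a nominal bitrate. The following sketch works through the arithmetic, assuming each token is stored with a fixed `log2(codebook size)` bits and no entropy coding (an assumption, not something this model card states). The constant names are illustrative.

```python
import math

SAMPLE_RATE = 16_000     # Hz
STRIDES = (2, 4, 5, 5)   # encoder downsampling strides
NUM_QUANTIZERS = 8
CODEBOOK_SIZE = 1024

downsampling = math.prod(STRIDES)          # 2 * 4 * 5 * 5 = 200
frame_rate = SAMPLE_RATE // downsampling   # 16000 / 200 = 80 frames per second
bits_per_token = math.log2(CODEBOOK_SIZE)  # 10 bits to index 1024 entries
bitrate_bps = frame_rate * NUM_QUANTIZERS * bits_per_token

print(f"{frame_rate} frames/sec, {bitrate_bps / 1000:.1f} kbps")  # 80 frames/sec, 6.4 kbps
```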
## Usage
```python
import torch
import torchaudio
from transformers import AutoModel

# Load the model; trust_remote_code loads the custom model code from the repository
model = AutoModel.from_pretrained("timofeiiz/soundstream-impl", trust_remote_code=True)
model.eval()

# Load audio as a (channels, num_samples) tensor
waveform, sr = torchaudio.load("audio.wav")
assert sr == 16000  # Only 16 kHz sample rate is supported

with torch.no_grad():  # inference only, no gradients needed
    # Encode to discrete tokens
    indices = model.encode(waveform.unsqueeze(0))  # (1, 8, num_frames)
    # Decode back to audio
    reconstructed = model.decode(indices, original_length=waveform.size(-1))

torchaudio.save("reconstructed.wav", reconstructed.squeeze(0).cpu(), 16000)
```
## License
MIT