E2-F5 TTS (zh_onnx)
zh/F5-TTS-Faster/ckpts/vocos-mel-24khz/DS_Store
ADDED
Binary file (6.15 kB)
zh/F5-TTS-Faster/ckpts/vocos-mel-24khz/README.md
ADDED
@@ -0,0 +1,71 @@
---
license: mit
---

# Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

[Audio samples](https://charactr-platform.github.io/vocos/) |
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative
Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical
GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral
coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
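
The idea of producing spectral coefficients and inverting them can be illustrated on a single frame. The sketch below is not Vocos code: the naive DFT helpers, the toy 16-sample frame, and the absence of windowing and overlap-add are all assumptions made for illustration. It only shows that samples can be recovered from magnitude and phase values through an inverse Fourier transform.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(coeffs):
    """Naive inverse DFT; keeps the real part of each reconstructed sample."""
    n = len(coeffs)
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Toy frame standing in for one window of audio (hypothetical data).
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]

# Split the spectrum into magnitude and phase, the kind of quantities a
# Fourier-domain model could predict instead of raw samples.
spectrum = dft(frame)
magnitude = [abs(c) for c in spectrum]
phase = [cmath.phase(c) for c in spectrum]

# Recombine into complex coefficients and invert to get the samples back.
coeffs = [cmath.rect(m, p) for m, p in zip(magnitude, phase)]
reconstructed = idft(coeffs)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

A real vocoder would apply this per overlapping window and sum the results, which is what an inverse short-time Fourier transform does.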

## Installation

To use Vocos only in inference mode, install it using:

```bash
pip install vocos
```

If you wish to train the model, install it with additional dependencies:

```bash
pip install vocos[train]
```

## Usage

### Reconstruct audio from mel-spectrogram

```python
import torch

from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

mel = torch.randn(1, 100, 256)  # B, C, T
audio = vocos.decode(mel)
```
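
As a rough guide to what the shapes in this example mean, the time axis `T` of the mel tensor maps to audio length through the hop length. The sketch below assumes the `sample_rate: 24000` and `hop_length: 256` values from the `config.yaml` added in this commit; the helper names are hypothetical, and the exact sample count returned by the model may differ slightly depending on the padding mode.

```python
SAMPLE_RATE = 24000  # sample_rate from config.yaml in this commit
HOP_LENGTH = 256     # hop_length from config.yaml in this commit


def approx_num_samples(num_frames, hop_length=HOP_LENGTH):
    """Rough sample count, assuming one hop of audio per mel frame."""
    return num_frames * hop_length


def approx_duration_seconds(num_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Rough duration in seconds for a given number of mel frames."""
    return approx_num_samples(num_frames, hop_length) / sample_rate


frames = 256  # T in the (B, C, T) mel tensor
samples = approx_num_samples(frames)
seconds = approx_duration_seconds(frames)
frame_rate = SAMPLE_RATE / HOP_LENGTH  # mel frames per second of audio
```

Under these assumptions, 256 frames correspond to a little under three seconds of 24 kHz audio.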

Copy-synthesis from a file:

```python
import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)
y_hat = vocos(y)
```

## Citation

If this code contributes to your research, please cite our work:

```
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```

## License

The code in this repository is released under the MIT license.
zh/F5-TTS-Faster/ckpts/vocos-mel-24khz/config.yaml
ADDED
@@ -0,0 +1,24 @@
feature_extractor:
  class_path: vocos.feature_extractors.MelSpectrogramFeatures
  init_args:
    sample_rate: 24000
    n_fft: 1024
    hop_length: 256
    n_mels: 100
    padding: center

backbone:
  class_path: vocos.models.VocosBackbone
  init_args:
    input_channels: 100
    dim: 512
    intermediate_dim: 1536
    num_layers: 8

head:
  class_path: vocos.heads.ISTFTHead
  init_args:
    dim: 512
    n_fft: 1024
    hop_length: 256
    padding: center
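
Each section of this config pairs a `class_path` with `init_args`. A minimal sketch of how such an entry can be turned into an object is shown below. The `instantiate` helper is hypothetical and is not taken from the vocos package, and the entry uses a standard-library class as a stand-in so the example runs without vocos installed.

```python
import importlib
from fractions import Fraction


def instantiate(entry):
    """Import entry['class_path'] and call it with entry['init_args'] as keyword arguments."""
    module_name, _, class_name = entry["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**entry.get("init_args", {}))


# Stand-in entry with the same class_path / init_args layout as the config.
entry = {
    "class_path": "fractions.Fraction",
    "init_args": {"numerator": 3, "denominator": 4},
}
obj = instantiate(entry)
```

With vocos installed, the same pattern would presumably apply to entries such as `vocos.heads.ISTFTHead`, though the library's own loading code may differ.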
zh/F5-TTS-Faster/ckpts/vocos-mel-24khz/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97ec976ad1fd67a33ab2682d29c0ac7df85234fae875aefcc5fb215681a91b2a
size 54365991
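
The three lines above are a Git LFS pointer, not the weights themselves. A downloaded file can be checked against the pointer's size and hash. The following is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and hash named by the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:97ec976ad1fd67a33ab2682d29c0ac7df85234fae875aefcc5fb215681a91b2a
size 54365991
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy (the path here is an assumption) would then report whether the download is intact.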