Update README.md
README.md
CHANGED
@@ -99,6 +99,8 @@ with torch.no_grad():
 torchaudio.save("tts.wav", recon[0, :, :], 24_000)
 ```
 
+For even higher fidelity in German speech, use our [finetuned NeuCodec decoder](https://huggingface.co/DigitalLearningGmbH/neucodec-decoder-ft-de).
+
 ### What's to come
 
 As stated in the model's name, this is a preview model, mainly meant to showcase the capability of the base model.
@@ -107,8 +109,7 @@ We trained on a small dataset of a single speaker without any special emotion ta
 We are actively working on
 - multiple speakers with emotional control and nonverbal elements (fillers, laughing, ...)
 - fine-tuning for general zero-shot voice cloning
+- phoneme-based / hybrid generation
 - post-training with reinforcement learning
 
-Also, we have a fine-tuned version of NeuCodec which we used to generate the speech examples above, which we also plan on realeasing.
-
 Stay tuned - january 2026 is going to be exciting!