How to create new voice styles + voice cloning support?

#3
by cybowolf - opened

Hi team,
I'm exploring Supertonic TTS and I saw the voice_styles/*.json files (example: F1.json). I want to create my own custom voice style for a new voice profile.

How can we generate a new voice-style JSON?

Is there a documented format or schema?

Are these styles tied to a specific embedding space inside the model?

Can external users add/register new styles without retraining the model?

Does the open-source Supertonic model support voice cloning?

If yes, what is the recommended dataset + process?

If not, is cloning only available in Supertone Play/API?

Will fine-tuning or speaker adaptation be supported in future releases?

I want to build custom voices and custom styles for my project, so any guidance or examples would be greatly appreciated.

Thanks!

Thank you for your interest!
We're actively building a pipeline to let users incorporate their preferred voices into the open-source model, with the goal of releasing it before the end of the year.
Supertone Play/API offer high-quality voice cloning. However, they do not generate the JSON files required for use with the open-source model.

Thank you so much for the clarification!

That’s great to hear. A user-friendly pipeline for adding custom voices to the open-source model will be incredibly valuable, and I’m looking forward to the release later this year.
I appreciate the transparency and the progress you’re making. The open-source community will definitely benefit from this upcoming feature. Looking forward to updates!

That's crazy. If only you guys hadn't intentionally left it out: https://huggingface.co/Supertone/supertonic/blob/main/.gitignore#L3
