How can we fine-tune it?

#10

by parasharshivam246 - opened Jun 30, 2025

Discussion

parasharshivam246

Jun 30, 2025

Is there any way to fine-tune this model on other voices?

Youcancallme

Jun 30, 2025


edited Jul 1, 2025

How do I do the preprocessing (with SNAC)?

bharathkumarK

Maya Research org Jul 1, 2025

Yes. You can use the SNAC codec to preprocess the audio into tokens; which SNAC model to use depends on the type and sample rate of your audio. Convert your text and audio into input_ids, labels and attention_mask, then train with Hugging Face's Trainer. Choose between full fine-tuning and LoRA depending on your use case. That's it. For inference, use the same de-interleaving described in the model card. I'd also recommend adding callbacks that generate intermediate audio samples during training so you can check progress. There is no fixed number of epochs or batch size, so experiment with different values depending on your data and hardware. We plan to release the scripts later; we're prioritizing some other things for now, but you can always experiment.
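A minimal sketch of the SNAC preprocessing step, assuming the 24 kHz SNAC model (`hubertsiuzdak/snac_24khz`, three code levels with 4096 entries each) and an Orpheus-style layout of 7 tokens per coarse frame. The frame ordering, `AUDIO_TOKEN_OFFSET` and the per-position offsets are hypothetical; check the model card's de-interleaving code and use its exact ordering and token ids. The heavy imports (`snac`, `torch`, `torchaudio`) are kept inside the functions that need them.

```python
# Assumptions (hypothetical, verify against the model card): SNAC 24 kHz,
# 7 tokens per coarse frame, audio token ids start at AUDIO_TOKEN_OFFSET.
CODEBOOK_SIZE = 4096          # entries per SNAC 24 kHz codebook
AUDIO_TOKEN_OFFSET = 128266   # hypothetical id of the first audio token in the LM vocab
SNAC_SAMPLE_RATE = 24000


def load_snac(device="cpu"):
    from snac import SNAC  # pip install snac
    return SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to(device)


def encode_wav(snac_model, wav_path, device="cpu"):
    """Return the three SNAC code levels of an audio file as plain Python lists."""
    import torch
    import torchaudio

    wav, sr = torchaudio.load(wav_path)            # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)            # downmix to mono -> (1, samples)
    if sr != SNAC_SAMPLE_RATE:
        wav = torchaudio.functional.resample(wav, sr, SNAC_SAMPLE_RATE)
    with torch.inference_mode():
        codes = snac_model.encode(wav.unsqueeze(0).to(device))  # SNAC expects (B, 1, T)
    return [c[0].tolist() for c in codes]          # coarse -> fine: lengths T, 2T, 4T


def interleave(codes):
    """Flatten [c0, c1, c2] into LM token ids, 7 tokens per coarse frame."""
    c0, c1, c2 = codes
    n = len(c0)
    if len(c1) != 2 * n or len(c2) != 4 * n:
        raise ValueError("expected SNAC levels of length T, 2T and 4T")
    tokens = []
    for i in range(n):
        frame = [c0[i], c1[2 * i], c2[4 * i], c2[4 * i + 1],
                 c1[2 * i + 1], c2[4 * i + 2], c2[4 * i + 3]]
        # Each of the 7 positions gets its own id range so the positions stay distinct.
        tokens.extend(AUDIO_TOKEN_OFFSET + pos * CODEBOOK_SIZE + code
                      for pos, code in enumerate(frame))
    return tokens


def deinterleave(tokens):
    """Inverse of interleave(); a trailing partial frame is dropped."""
    c0, c1, c2 = [], [], []
    usable = len(tokens) - len(tokens) % 7
    for j in range(0, usable, 7):
        f = [tokens[j + pos] - AUDIO_TOKEN_OFFSET - pos * CODEBOOK_SIZE for pos in range(7)]
        c0.append(f[0])
        c1.extend([f[1], f[4]])
        c2.extend([f[2], f[3], f[5], f[6]])
    return [c0, c1, c2]
```

Usage: `audio_ids = interleave(encode_wav(load_snac(), "sample.wav"))`. To listen to generated tokens, `snac_model.decode([torch.tensor(c).unsqueeze(0) for c in deinterleave(tokens)])` turns them back into a waveform of shape (1, 1, samples).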
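One way to turn a text/audio pair into input_ids, labels and attention_mask is sketched below. All special token ids (`bos_id`, `start_of_speech_id`, `end_of_speech_id`, `pad_id`) are hypothetical parameters; take the real ones from the model's tokenizer and model card. `audio_ids` is assumed to be the flat list of audio token ids from SNAC preprocessing. Masking the text prompt with -100 so the loss is computed only on audio tokens is a design choice, not something stated in the thread; copying input_ids into labels (with padding still masked) is the other common option.

```python
IGNORE_INDEX = -100  # label value that Hugging Face causal LMs exclude from the loss


def build_example(text_ids, audio_ids, *, bos_id, start_of_speech_id,
                  end_of_speech_id, pad_id, max_length):
    """Build one pre-padded training example (all special token ids are hypothetical)."""
    prompt = [bos_id] + list(text_ids) + [start_of_speech_id]
    target = list(audio_ids) + [end_of_speech_id]

    # Labels stay aligned with input_ids; the model shifts them internally.
    input_ids = (prompt + target)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt) + target)[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad to a fixed length so a simple collator can stack examples.
    pad = max_length - len(input_ids)
    input_ids += [pad_id] * pad
    labels += [IGNORE_INDEX] * pad
    attention_mask += [0] * pad
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}
```

Truncation can cut an audio frame in half, so in practice it is safer to filter out pairs longer than `max_length` than to rely on the slice.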
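A rough sketch of the training step with Hugging Face's Trainer, with optional LoRA via PEFT and a callback for intermediate audio previews. Assumptions: `train_dataset` holds pre-padded dicts with input_ids, labels and attention_mask; the model is a Llama-style decoder whose attention projections are named `q_proj`, `k_proj`, `v_proj`, `o_proj`; and `generate_preview(model, step)` is a hypothetical function you write yourself (synthesize a fixed sentence, de-interleave, decode with SNAC, save a .wav). The hyperparameters are placeholders only, since the thread notes there are no fixed values for epochs or batch size.

```python
def make_audio_preview_callback(generate_preview, every_n_steps=500):
    """Call the user-supplied generate_preview(model, step) every N optimizer steps."""
    from transformers import TrainerCallback

    class AudioPreviewCallback(TrainerCallback):
        def on_step_end(self, args, state, control, model=None, **kwargs):
            # Only the main process writes previews when training on several GPUs.
            # generate_preview should switch to eval mode and back to train mode itself.
            if state.is_world_process_zero and state.global_step % every_n_steps == 0:
                generate_preview(model, state.global_step)

    return AudioPreviewCallback()


def finetune(model_name, train_dataset, output_dir="tts-finetune",
             use_lora=True, callbacks=None):
    from transformers import (AutoModelForCausalLM, Trainer, TrainingArguments,
                              default_data_collator)

    model = AutoModelForCausalLM.from_pretrained(model_name)
    if use_lora:
        from peft import LoraConfig, get_peft_model
        lora_config = LoraConfig(
            r=16, lora_alpha=32, lora_dropout=0.05,
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Llama-style names
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()

    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=4,       # placeholder: tune for your GPU memory
        gradient_accumulation_steps=4,
        learning_rate=1e-4 if use_lora else 1e-5,
        num_train_epochs=3,                  # placeholder: there is no fixed value
        logging_steps=10,
        save_steps=500,
        report_to="none",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        data_collator=default_data_collator,  # examples are already padded
        callbacks=callbacks,
    )
    trainer.train()
    return trainer
```

LoRA keeps memory use and checkpoint size small and is usually enough for adapting to a new voice; full fine-tuning (`use_lora=False`) updates every weight and needs far more GPU memory, which is the trade-off the reply leaves to the use case.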
