How can we fine-tune it?

#10
by parasharshivam246 - opened

Is there any way to fine-tune this model with other voices?

How should the preprocessing be done (using SNAC)?

Maya Research org

Yes, you can use the SNAC codec to preprocess the audio into tokens; the exact steps depend on the format and sample rate of your audio. Convert your text and audio into input_ids, labels, and attention_mask, and use Hugging Face's Trainer. Choose between full fine-tuning and LoRA depending on your use case. That's all there is to it. For inference, use the same de-interleaving described in the model card. Plan to add callbacks that check intermediate audio generations mid-training. There is no fixed number of epochs or batch size, so my suggestion is to experiment with different values based on your data and setup. We plan to release the scripts later; we are prioritizing other things for now. In the meantime, you can always experiment.
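Until the official scripts are out, here is a rough sketch of the "convert your text and audio into input_ids, labels, and attention_mask" step. It is not the authors' code. It assumes a three-level SNAC variant that yields 1, 2 and 4 codes per frame, and that you already have the prompt token ids from the tokenizer. The function names, the slot order, `token_offset`, and the default `codebook_size` are all hypothetical placeholders; take the real values from the model card so that the interleaving is the exact inverse of the de-interleaving used at inference. Masking the prompt out of the loss is also a choice made here, not something stated above.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that PyTorch cross-entropy (and the HF Trainer loss) skips


def interleave_snac_frames(level1: List[int], level2: List[int], level3: List[int],
                           token_offset: int, codebook_size: int = 4096) -> List[int]:
    """Flatten three SNAC code levels into 7 tokens per frame.

    ASSUMPTION: the slot order and the per-slot offsets are illustrative only.
    Replace them with the inverse of the model card's de-interleaving.
    """
    if len(level2) != 2 * len(level1) or len(level3) != 4 * len(level1):
        raise ValueError("expected 1, 2 and 4 codes per frame across the three levels")
    tokens: List[int] = []
    for i, c1 in enumerate(level1):
        # Hypothetical slot order: coarse code first, then finer codes.
        frame = [c1,
                 level2[2 * i], level3[4 * i], level3[4 * i + 1],
                 level2[2 * i + 1], level3[4 * i + 2], level3[4 * i + 3]]
        # Shift each slot into its own id range so slots never collide.
        tokens.extend(token_offset + slot * codebook_size + code
                      for slot, code in enumerate(frame))
    return tokens


def build_training_example(prompt_ids: List[int], audio_ids: List[int],
                           eos_id: int, pad_id: int, max_length: int) -> Dict[str, List[int]]:
    """Build one fixed-length causal-LM example.

    The loss is computed only on the audio tokens and the end-of-sequence token;
    prompt and padding positions get IGNORE_INDEX. Sequences longer than
    max_length are simply truncated, which can cut a frame in half.
    """
    input_ids = (prompt_ids + audio_ids + [eos_id])[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + audio_ids + [eos_id])[:max_length]
    n_pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * n_pad,
        "labels": labels + [IGNORE_INDEX] * n_pad,
        "attention_mask": [1] * len(input_ids) + [0] * n_pad,
    }
```

A list of such dictionaries can be wrapped in a dataset and passed to Trainer; labels are shifted inside the causal LM, so they stay aligned with input_ids here. Whether to train on the prompt tokens as well is something to experiment with.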
