information about training
hey, i absolutely loved the model!! i just wanted to know how you created the small/distilled version of the model!! any specific technique you used for it, or some workarounds? we're planning to distill a TTS model and would love any tips/ideas you might have, anything works!!
Thank you!
To be honest, the small version wasn't created using distillation. It was just simple SFT using the exact same training dataset as this model.
That being said, I do believe distillation can be effective. However, please note that it generally requires the models to be from the same family to work properly (due to things like tokenizer compatibility).
ohh, makes sense!! so we're working on an Orpheus fine-tune, and i was thinking of finding a smaller model, swapping the tokenizer, and doing logit-loss-based distillation with KL divergence!! i think it would give us a child model we can keep training while the bigger model trains, like co-distillation!!
i'd love to hear what u think about this!!
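For what it's worth, here's a minimal sketch of what a logit-based KL distillation loss could look like in PyTorch. This is just an illustration of the general technique, not Orpheus-specific code; the function name and temperature value are made up. Note it assumes the student and teacher logits share the same vocabulary axis, which is exactly why the tokenizer swap (or same-family requirement) matters:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL-divergence distillation loss on logits.

    Both tensors are (batch, seq_len, vocab_size) and must share the
    same vocabulary so positions line up token-for-token; a tokenizer
    mismatch between teacher and student breaks this alignment.
    """
    t = temperature
    # Soften both distributions with the temperature before comparing
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # batchmean KL, scaled by t^2 to keep gradient magnitudes stable
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```

In a co-distillation setup you'd add this term to the student's regular cross-entropy loss each step, with the teacher's logits detached so gradients only flow into the student.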