F5-TTS trained on a Mongolian speech dataset. I stopped training around epoch 33 (I initially set 100 epochs, but that was too long), at about 51,000 updates.
Parameters that I know of:
- Base model: F5TTS Base
- Epochs: 100
- Learning rate: 0.000075
- Max gradient norm: 1
- Warmup updates: 57
- Batch size type: frame
- Batch size per GPU: 1600 (RTX 3080 Ti)
- grad_acc_steps: 1
- max_samples: 64
- Precision: fp16
- Logger: wandb
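For reference, the hyperparameters above can be collected into a single config dict. This is only an illustrative sketch: F5-TTS's actual training entry point and option names may differ, and the key names here are assumptions mirroring the values listed in this note.

```python
# Hypothetical config sketch; key names are illustrative, not F5-TTS's
# actual CLI/config schema. Values match the training run described above.
train_config = {
    "base_model": "F5TTS_Base",
    "epochs": 100,               # stopped early at ~epoch 33 (~51k updates)
    "learning_rate": 7.5e-5,
    "max_grad_norm": 1.0,
    "num_warmup_updates": 57,
    "batch_size_type": "frame",  # batch size counted in mel frames
    "batch_size_per_gpu": 1600,  # frames per batch on an RTX 3080 Ti
    "grad_accumulation_steps": 1,
    "max_samples": 64,
    "precision": "fp16",
    "logger": "wandb",
}

# With grad_accumulation_steps = 1, each batch triggers one optimizer update,
# so ~51,000 updates correspond to ~51,000 batches processed.
effective_frames_per_update = (
    train_config["batch_size_per_gpu"] * train_config["grad_accumulation_steps"]
)
print(effective_frames_per_update)  # 1600
```

Since the batch size is measured in frames rather than samples, the number of utterances per batch varies with clip length; `max_samples` caps how many utterances can be packed into one batch.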