bweng committed (verified)
Commit e8fa119 · 1 Parent(s): 13b8352

Update README.md

Files changed (1): README.md (+84 -0)

Based on the original kokoro model, see https://github.com/FluidInference/FluidAudio for inference

## Benchmark

We generated the same set of strings with each pipeline, producing between 1 s and ~300 s of audio, to test speed across a range of input lengths on PyTorch CPU, PyTorch MPS, and the MLX pipeline, and compared them against the native Swift version with Core ML models.

Each pipeline warmed up the model by running it once on pseudo inputs; we then compared raw inference times with the model already loaded. You can see that for the Core ML model, we traded lower memory usage and slightly faster inference for a longer initial warm-up.

Note that the PyTorch kokoro model has a known memory-leak issue: https://github.com/hexgrad/kokoro/issues/152
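
The "Peak GB" column below reports process-wide peak memory. One way to read that figure on macOS or Linux is `ru_maxrss` from `getrusage`; this is a minimal sketch of the idea, not necessarily how each harness measured it:

```python
import resource
import sys

def peak_rss_gb() -> float:
    """Peak resident set size of the current process, in GB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return peak * scale / 1024**3

print(f"Peak memory usage (process-wide): {peak_rss_gb():.3f} GB")
```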

The following tests were run on an M4 Pro MacBook Pro with 48 GB RAM. If you have another device, please do try replicating them as well!

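The per-test numbers in the tables are seconds of output audio, raw inference seconds (after warm-up), and RTFx, the real-time factor: audio seconds divided by inference seconds. A minimal harness sketch of that methodology, where `synthesize` is a placeholder for whichever pipeline is being timed:

```python
import time

def rtfx(audio_seconds: float, inference_seconds: float) -> float:
    """Real-time factor: seconds of audio produced per second of inference."""
    return audio_seconds / inference_seconds

def benchmark(synthesize, texts):
    """Time synthesize(text) -> seconds of audio, after one warm-up call."""
    synthesize("warm-up")  # exclude model load/compile from the timed runs
    rows = []
    for i, text in enumerate(texts, start=1):
        start = time.perf_counter()
        audio_seconds = synthesize(text)
        inference = time.perf_counter() - start
        rows.append((i, len(text), audio_seconds, inference,
                     rtfx(audio_seconds, inference)))
    return rows
```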

### Kokoro-82M PyTorch (CPU)

```bash
KPipeline benchmark for voice af_heart (warm-up took 0.175s) using hexgrad/kokoro
Test    Chars   Output (s)   Inf (s)   RTFx      Peak GB
1       42      2.750        0.187     14.737x   1.44
2       129     8.625        0.530     16.264x   1.85
3       254     15.525       0.923     16.814x   2.65
4       93      6.125        0.349     17.566x   2.66
5       104     7.200        0.410     17.567x   2.70
6       130     9.300        0.504     18.443x   2.72
7       197     12.850       0.726     17.711x   2.83
8       6       1.350        0.098     13.823x   2.83
9       1228    76.200       4.342     17.551x   3.19
10      567     35.200       2.069     17.014x   4.85
11      4615    286.525      17.041    16.814x   4.78
Total   -       461.650      27.177    16.987x   4.85
```

### Kokoro-82M PyTorch (MPS)

I wasn't able to run the MPS model for longer durations; even with `PYTORCH_ENABLE_MPS_FALLBACK=1` enabled, it kept crashing on the longer strings.

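If you want to try reproducing the MPS run, note that the fallback flag has to be set before PyTorch initializes. A hedged sketch of the setup (it did not prevent the crashes described above):

```python
import os

# Must be set before torch is imported; unsupported MPS ops then fall back to CPU.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

try:
    import torch
    device = "mps" if torch.backends.mps.is_available() else "cpu"
except ImportError:  # torch not installed
    device = "cpu"

print(f"Running on: {device}")
```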
```bash
KPipeline benchmark for voice af_heart (warm-up took 0.568s) using pip package
Test    Chars   Output (s)   Inf (s)   RTFx      Peak GB
1       42      2.750        0.414     6.649x    1.41
2       129     8.625        0.729     11.839x   1.54
Total   -       11.375       1.142     9.960x    1.54
```

### Kokoro-82M MLX Pipeline

```bash
TTS benchmark for voice af_heart (warm-up took an extra 2.155s) using model prince-canuma/Kokoro-82M
Test    Chars   Output (s)   Inf (s)   RTFx      Peak GB
1       42      2.750        0.347     7.932x    1.12
2       129     8.650        0.597     14.497x   2.47
3       254     15.525       0.825     18.829x   2.65
4       93      6.125        0.306     20.039x   2.65
5       104     7.200        0.343     21.001x   2.65
6       130     9.300        0.560     16.611x   2.65
7       197     12.850       0.596     21.573x   2.65
8       6       1.350        0.364     3.706x    2.65
9       1228    76.200       2.979     25.583x   3.29
10      567     35.200       1.374     25.615x   3.37
11      4615    286.500      11.112    25.783x   3.37
Total   -       461.650      19.401    23.796x   3.37
```

### Swift + FluidAudio Core ML models

Note that it takes `~15s` to compile the model on the first run; subsequent runs are shorter, and we expect ~2s to load.

```bash
> swift run fluidaudio tts --benchmark
...
FluidAudio TTS benchmark for voice af_heart (warm-up took an extra 2.348s)
Test    Chars   Output (s)   Inf (s)   RTFx
1       42      2.825        0.440     6.424x
2       129     7.725        0.594     13.014x
3       254     13.400       0.776     17.278x
4       93      5.875        0.587     10.005x
5       104     6.675        0.613     10.889x
6       130     8.075        0.621     13.008x
7       197     10.650       0.627     16.983x
8       6       0.825        0.360     2.290x
9       1228    67.625       2.362     28.625x
10      567     33.025       1.341     24.619x
11      4269    247.600      9.087     27.248x
Total   -       404.300      17.408    23.225x

Peak memory usage (process-wide): 1.503 GB
```