ANE support?
Is this model ANE optimized?
Not yet, the first version was just about getting it running on CoreML end to end. @alexwengg is working on improving the model now
Okay, great news then :). Btw, an Xcode Core ML Performance Report screenshot would be interesting in the model card, since it is Core ML format.
I found a potentially useful repository, although I haven't been able to get it working: https://github.com/mattmireles/kokoro-coreml
This ONNX version can also be tested: https://huggingface.co/xun/kokoro-v1.1-zh-onnx. It also has an 8-bit quantized version.
@alexwengg appreciate the offer! Unfortunately I'm tied up with other things right now and won't have much free time for a while.
@alexwengg thanks! I started the kokoro coreml project because an iOS app of mine needs it. I don't have any future projects in mind right now.
But based on my experience, in the future I might train a new small model like kokoro that addresses some headaches I ran into:
- a streaming architecture, for fast first-utterance response time and better performance on long paragraphs
- fixed-shape input, for better ANE memory optimization
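To make the fixed-shape-input point concrete: an ANE-friendly model is typically converted with a constant input shape, so variable-length token sequences get padded (or chunked) before inference. A minimal sketch, where `MAX_TOKENS`, `PAD_ID`, and `pad_tokens` are illustrative assumptions, not Kokoro's actual configuration:

```python
import numpy as np

# Hypothetical fixed-length padding for an ANE model converted with a
# constant input shape. MAX_TOKENS and PAD_ID are illustrative values.
MAX_TOKENS = 128
PAD_ID = 0

def pad_tokens(tokens):
    """Pad a token sequence to the fixed (1, MAX_TOKENS) shape."""
    if len(tokens) > MAX_TOKENS:
        raise ValueError("sequence too long; chunk the input text first")
    out = np.full((1, MAX_TOKENS), PAD_ID, dtype=np.int32)
    out[0, :len(tokens)] = tokens
    return out
```

The trade-off is that long inputs must be chunked, but the fixed shape lets the compiler pre-plan ANE memory instead of handling dynamic shapes.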
@alexwengg i didn't try any other style tts models besides kokoro. i tried kitten tts in the public space; not very impressive.
I see there’s also magpie tts if you wanna take a look. I am still converting magpie tts but I haven’t had the time for style tts
got it, thanks for bringing them up
@laishere i am curious what made you decide to go with the 7 model routes. i was aware of this feasibility, but i had concluded that fragmenting the model into several sub-models would degrade the speed gained from ANE due to the swift-to-mlmodelc transitions. input and output transfer can be quite the inference cost.
not to mention the total model is under 1 gb, which is relatively small.
also, do you have any social media accounts where we could continue this conversation? i would like to dig more into how you achieved this ANE breakthrough.
any reason why albert, postalbert, and alignment were not fused, given they all use ANE & CPU? prosody + noise both use GPU & CPU too.
good question. postalbert has lstm ops, which are unsupported on ANE, but fusing might still be possible (i treated them as a single encoder in my earlier tests and it seemed to perform well too).
and fusing postalbert and alignment is likely possible, since they have the same config (fp16 + all units).
noise runs in fp32, which is unsupported on ANE. i can't fuse it with prosody without breaking the ANE path, since most of the prosody ops run on ANE.
anyway, I split the model into small pieces because that makes it easier to get most of the work scheduled on the ANE. but I think fusing is likely possible.
my email laishereu@gmail.com
so in theory it's possible, but the ANE scheduler is a black box.
what was the procedure you used for testing this? did you end up breaking up the model based on its key pytorch modules (ignoring the vocoder and KokoroTail models), or was the breakup largely from experimentation?
yeah, in theory, as long as it's fp16, the ANE-compatible ops are supposed to run on ANE. but the scheduler...
my procedure is to split the pipeline into smaller stages and check where the bottleneck is. the original modules are clean boundaries.
but sometimes, inside a single module, we still need to split further to isolate the ANE or quality issues.
i see, so a good strategy would be to split the mlpackages by pytorch modules and slowly merge any paths that can be merged.
i think what you have done is quite impressive. how did you arrive at the cos solution and pinpoint sin as the cause of the compounding errors?
any reason why the Noise and Tail models couldn't be fp16? what was so unique about them that you needed them in fp32?
noise and tail in fp16 will degrade the audio quality; for example, tail will amplify fp16 errors, resulting in high-frequency noise for some utterances.
i would have thought the prosody model would be more sensitive to change, not noise.
the noise module contains SineGen, which uses the cumsum op; it's not just pure noise.
Not sure how that would be a serious issue, since my impression is that exponent operations are the main cause of quality degradation.
you could run the noise mlpackage in fp16 to compare and verify. I think cumsum can easily hit numerical errors in fp16.
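The fp16 cumsum problem is easy to demonstrate in isolation: once the running total of a long accumulation (like SineGen's phase accumulation) grows large enough, the per-step increment falls below half the fp16 spacing and gets rounded away. A small numpy illustration (the step size and length are illustrative, not Kokoro's actual values):

```python
import numpy as np

# Long cumsum of small steps: fine in fp64, stalls in fp16 once the
# running total reaches a magnitude where fp16 spacing exceeds 2x the step.
steps = np.full(10000, 0.01)

true_sum = np.cumsum(steps.astype(np.float64))[-1]   # ~100.0
fp16_sum = np.cumsum(steps.astype(np.float16))[-1]   # stalls far below 100

rel_err = abs(float(fp16_sum) - true_sum) / true_sum
```

For a sine generator, a drifting or stalled phase accumulator translates directly into audible pitch and noise artifacts, which matches the quality degradation described above.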
I see it was an experimental finding.
Is this your first time finetuning a model, or have you done this before?
it's my first time
honestly impressive. do you have a math background?
no, just a normal CS undergraduate. I did and learned this project with the help of AI - the only thing I needed to know was how to tackle the engineering problems, which I already knew as a software engineer.
I like what you have done here. I will try to keep an eye out for what you do in the future.
@laishere how long did this project take to complete? was it part-time work as well?
i have also finally managed to port the Mandarin model, thanks to your assistance:
https://github.com/FluidInference/FluidAudio/pull/570/changes
about a week, including the distillation attempt. yes, it was part-time work.