ANE support?

#1
by test2342542 - opened

Is this model ANE optimized?

Fluid Inference Community org

Not yet, the first version was just about getting it running on CoreML end to end. @alexwengg is working on improving the model now

Okay, great news then :). Btw, an Xcode Core ML Performance Report screenshot would be a nice addition to the model card, since the model is in Core ML format.

I found a repository that could be useful, although I haven't been able to get it working: https://github.com/mattmireles/kokoro-coreml

This ONNX version can also be tested: https://huggingface.co/xun/kokoro-v1.1-zh-onnx. It also has an 8-bit quantized version.

Fluid Inference Community org

@xun we have Pocket TTS, which works better on iOS. The ANE might not be fully optimized for TTS architectures.

I managed to optimize Kokoro for the ANE, with a total size of about 80 MB: https://github.com/laishere/kokoro-coreml

Fluid Inference Community org

@laishere nice nice, very nice. I am impressed by what you were able to accomplish. I will take a look later. Would you like to include this in FluidAudio as well?

@alexwengg Thanks! Feel free to grab anything useful from the repo.

Fluid Inference Community org

@laishere I would like you to be properly credited. If possible, could you make a mobius PR, a Hugging Face PR, and optionally a FluidAudio PR too?

@alexwengg appreciate the offer! Unfortunately I'm tied up with other things right now and won't have much spare time in the near term.

Fluid Inference Community org

@laishere that's fine, I will get the code pushed and credit you as well. Good job on the repo. Let me know what future projects you work on.

@alexwengg thanks! I started the Kokoro CoreML project because an iOS app of mine needed it. I don't have any future projects in mind right now,
but based on my experience, I might eventually train a new small model like Kokoro that addresses some headaches I ran into:

  • a streaming architecture, for fast first-utterance response time and better performance on long paragraphs
  • fixed-shape inputs, for better ANE memory optimization
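
A minimal sketch of the fixed-shape idea, assuming a simple right-padding scheme (the 128-token length, the helper name, and the mask convention are illustrative assumptions, not taken from the Kokoro repo):

```python
import numpy as np

# A fixed-shape frontend: the ANE prefers a single static input shape,
# so shorter token sequences are padded (and masked) up to that length.
MAX_TOKENS = 128  # assumed fixed length, chosen for illustration

def pad_to_fixed(tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Right-pad token ids to MAX_TOKENS; return (ids, mask)."""
    if tokens.shape[0] > MAX_TOKENS:
        raise ValueError("sequence longer than the fixed shape")
    ids = np.zeros(MAX_TOKENS, dtype=np.int32)
    ids[: tokens.shape[0]] = tokens
    mask = np.zeros(MAX_TOKENS, dtype=np.int32)
    mask[: tokens.shape[0]] = 1
    return ids, mask

ids, mask = pad_to_fixed(np.array([5, 9, 2], dtype=np.int32))
```

The model then always sees a `(128,)` input, and the mask tells downstream stages which positions are real.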
Fluid Inference Community org

@laishere have you tried StyleTTS or Kitten TTS?

@alexwengg I haven't tried other StyleTTS-based models besides Kokoro. I tried Kitten TTS in the public Space; it wasn't very impressive.

Fluid Inference Community org

There's also Magpie TTS if you want to take a look. I am still converting Magpie TTS, but I haven't had time for StyleTTS.

Got it, thanks for bringing them up.

Fluid Inference Community org

@test2342542 ANE is now supported thanks to @laishere

alexwengg changed discussion status to closed
Fluid Inference Community org
edited 10 days ago

@laishere I'm curious what made you decide to go with the 7-model split. I was aware this was feasible, but I had concluded that fragmenting the model into several sub-models would erode the speed gained from the ANE, due to the Swift-to-mlmodelc transitions; input and output transfer can be a significant inference cost.

Not to mention that the total model is under 1 GB, which is relatively small.

Fluid Inference Community org

Also, do you have any social media accounts where we could continue this conversation? I would like to dig more into how you achieved this ANE breakthrough.

Fluid Inference Community org

Any reason why albert, postalbert, and alignment were not fused, given they all use ANE & CPU? And similarly prosody + noise, which both use GPU & CPU?

Good question. postalbert has LSTM ops, which are unsupported on the ANE, but fusing might still be possible (I used to treat them as a single encoder in my earlier tests and it seemed to perform well too).
Fusing postalbert and alignment is likely possible, since they share the same config (fp16 + all compute units).
noise runs in fp32, which is unsupported on the ANE. It can't be fused with prosody without breaking ANE placement, and most of the prosody ops run on the ANE.
Anyway, I split the model into small pieces because that makes it easier to get most of the work scheduled onto the ANE, but I think fusing is likely possible.
My email: laishereu@gmail.com

Fluid Inference Community org

So in theory it's possible, but the ANE scheduler is a black box.
What was the procedure you used for testing this? Did you end up breaking up the models based on their key PyTorch modules (ignoring the vocoder and KokoroTail models), or was the breakup largely driven by experimentation?

Yeah, in theory, as long as it's fp16, the ANE-compatible ops are supposed to run on the ANE. But the scheduler...
My procedure was to split the pipeline into smaller stages to find the bottleneck; the original modules are clean boundaries.
But sometimes, inside a single module, we still need to split further to isolate ANE or quality issues.
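
The stage-by-stage bottleneck hunt described above can be sketched with a small timing harness (the stage names come from this thread; the identity lambdas are placeholders for the real sub-model predict calls, not the actual code):

```python
import time

def profile_pipeline(stages, x):
    """Run each (name, fn) stage in order, timing each one separately."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        x = fn(x)
        timings[name] = time.perf_counter() - start
    return x, timings

# placeholder stages; swap in the compiled Core ML sub-models here
stages = [
    ("albert", lambda x: x),
    ("prosody", lambda x: x),
    ("noise", lambda x: x),
    ("vocoder", lambda x: x),
]
out, timings = profile_pipeline(stages, x=0)
slowest = max(timings, key=timings.get)
```

The slowest stage is then the candidate for further splitting or compute-unit tweaking.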

Fluid Inference Community org

I see, so a good strategy would be to split the mlpackages by PyTorch modules and slowly merge any paths that can be merged.

I think what you have done is quite impressive. How did you arrive at the cos solution and pinpoint sin as the source of the compounding errors?

Fluid Inference Community org

Any reason why the Noise and Tail models couldn't be fp16? What was so unique about them that you needed them in fp32?

Running noise and tail in fp16 degrades the audio quality; for example, the tail amplifies fp16 errors, resulting in high-frequency noise for some utterances.
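
As a standalone illustration of the fp16 noise floor (plain numpy, not the actual tail module): round-tripping a full-scale sine through fp16 leaves broadband rounding error at roughly a 70 dB SNR, and any stage that amplifies its input amplifies that error right along with the signal.

```python
import numpy as np

sr = 24000
t = np.arange(sr) / sr
x32 = np.sin(2 * np.pi * 220.0 * t).astype(np.float32)
x16 = x32.astype(np.float16).astype(np.float32)  # round-trip through fp16

# the rounding error acts like an added broadband noise floor
e = x32 - x16
snr_db = 10 * np.log10(np.mean(x32**2) / np.mean(e**2))
print(f"fp16 round-trip SNR: {snr_db:.1f} dB")
```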

Fluid Inference Community org

I would have thought the prosody model would be more sensitive to the change, not noise.

The noise module contains SineGen, which uses a cumsum op; it's not just pure noise.

Fluid Inference Community org

Not sure how that would be a serious issue, since my impression is that exponent operations are the main cause of quality degradation.

You could run the noise mlpackage in fp16 to compare and verify. I think cumsum can easily hit numerical errors in fp16.
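
This is easy to reproduce outside Core ML. A minimal numpy sketch (the sample rate, f0, and duration are arbitrary, chosen only to mimic SineGen-style phase accumulation): once the fp16 running sum reaches 32, each per-sample increment is smaller than half an fp16 ulp and gets rounded away entirely, so the accumulated phase stalls while the fp32 reference keeps growing.

```python
import numpy as np

# Hypothetical SineGen-style phase accumulation: cumsum of per-sample
# phase increments for a 220 Hz tone at 24 kHz over 4 seconds.
sr, f0, seconds = 24000, 220.0, 4
inc = np.full(sr * seconds, f0 / sr)  # ~0.009 per sample

phase32 = np.cumsum(inc.astype(np.float32))
phase16 = np.cumsum(inc.astype(np.float16), dtype=np.float16)

# fp16 keeps ~11 bits of precision: near 32 the ulp is 0.03125, so
# adding 0.009 rounds back to the same value and the fp16 phase stalls
# while the fp32 phase keeps climbing toward ~880.
err = np.abs(phase16.astype(np.float64) - phase32.astype(np.float64))
print(f"max phase error: {err.max():.1f}")
```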

Fluid Inference Community org

I see it was an experimental finding.

Fluid Inference Community org

Is this your first time fine-tuning a model, or have you done more?

it's my first time

Fluid Inference Community org

Honestly impressive. Do you have a math background?

No, just a normal CS undergrad. I did and learned this project with the help of AI; the only thing I needed to know was how to address the engineering problems, which I already knew as a software engineer.

Fluid Inference Community org

I like what you have done here. I will try to keep an eye out on what you will do in the future

Fluid Inference Community org

@laishere how long did this project take to complete? Was it part-time work as well?

I have also finally managed to port the Mandarin model, thanks to your assistance:
https://github.com/FluidInference/FluidAudio/pull/570/changes

About a week, including the distillation attempt. Yes, it was part-time work.
