I created a model series that may be close to what you are after, although I have not really tested it on phones, only on very old laptops. The latest one is Vocaela-2-500M-1024R2, and a GGUF is provided as well. It is about 3x faster than the previous version, Vocaela-500M.
There is also an even smaller 256M version, Vocaela-2-256M-512R2. It is much faster than the 500M models (4-5x faster than Vocaela-2-500M-1024R2), but its quality is not as good as theirs. Unfortunately, this one has no GGUF yet, because the GGUF produced by the llama.cpp converter gives output that is not consistent with the HF version. I am still trying to figure out what is going wrong there.
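
For anyone who wants to check this kind of mismatch on their own conversion, here is a minimal sketch of one way to compare the two outputs. It assumes you have already run the HF model and the GGUF model on the same prompt with greedy decoding and saved the generated token IDs as plain lists. The helper names `first_divergence` and `agreement_ratio` are hypothetical, not part of llama.cpp or transformers, and this is not necessarily how the Vocaela models are checked.

```python
from typing import Optional, Sequence


def first_divergence(hf_ids: Sequence[int], gguf_ids: Sequence[int]) -> Optional[int]:
    """Return the first index where the two token sequences differ.

    Returns None when the sequences are identical. If one sequence is a
    strict prefix of the other, the divergence index is the shorter length.
    """
    for i, (a, b) in enumerate(zip(hf_ids, gguf_ids)):
        if a != b:
            return i
    if len(hf_ids) != len(gguf_ids):
        return min(len(hf_ids), len(gguf_ids))
    return None


def agreement_ratio(hf_ids: Sequence[int], gguf_ids: Sequence[int]) -> float:
    """Fraction of positions (relative to the longer sequence) with equal tokens."""
    longest = max(len(hf_ids), len(gguf_ids))
    if longest == 0:
        return 1.0  # two empty outputs agree trivially
    matches = sum(1 for a, b in zip(hf_ids, gguf_ids) if a == b)
    return matches / longest


if __name__ == "__main__":
    # Made-up token IDs for illustration only, not real model output.
    hf_tokens = [101, 7592, 2088, 102]
    gguf_tokens = [101, 7592, 9999, 102]
    print("first divergence at:", first_divergence(hf_tokens, gguf_tokens))
    print("agreement ratio:", agreement_ratio(hf_tokens, gguf_tokens))
```

A divergence at index 0 usually points to a different place to look (for example tokenizer or prompt template handling) than a divergence that only appears many tokens in, which can come from small numeric differences adding up.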