I just released Inflect-Nano-v1, an ultra-small 4.63M-parameter text-to-speech model.
The main idea is simple: instead of only making the acoustic model tiny and relying on a larger external vocoder, Inflect-Nano-v1 keeps the complete text-to-waveform stack under 5M parameters.
Quick facts:
- 4.63M total inference parameters
- 3.46M acoustic model
- 1.17M vocoder
- 24 kHz audio
- English-only
- Single male voice
- Runs locally with a simple PyTorch inference script
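As a rough sanity check on those numbers, here is a minimal sketch that adds up the two component sizes and estimates how much space the weights alone would take. The parameter counts come from the list above; the storage formats (fp32, fp16, int8) are my assumption for illustration, since the release notes here do not say which precision the checkpoint uses, and activations or runtime overhead are not counted.

```python
# Parameter counts as stated in the post, in millions.
ACOUSTIC_M = 3.46
VOCODER_M = 1.17

# Bytes per weight for some common storage formats.
# Assumption: these formats are illustrative, not what the release ships.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def total_params_m() -> float:
    """Total inference parameters (millions): acoustic model + vocoder."""
    return round(ACOUSTIC_M + VOCODER_M, 2)


def weight_size_mb(params_m: float, dtype: str) -> float:
    """Approximate weight storage in MB (1 MB = 10**6 bytes).

    Only the weights are counted; activations, buffers, and file
    format overhead are ignored.
    """
    return round(params_m * BYTES_PER_PARAM[dtype], 2)


if __name__ == "__main__":
    total = total_params_m()
    print(f"total: {total}M parameters")
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_size_mb(total, dtype)} MB of weights")
```

Under these assumptions the whole text-to-waveform stack comes to roughly 18.5 MB of weights in fp32 and about half that in fp16, which is the scale that makes the edge and browser use cases plausible.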
Why I made it: Most modern TTS models are much larger, and even many “small TTS” projects depend on a separate vocoder. I wanted to see how far a complete tiny TTS stack could be pushed while still producing usable speech.
It is not SOTA, and I am not trying to claim it competes with large TTS systems. The interesting part is the size-to-functionality ratio.
What works: It can generate arbitrary English speech locally, and the model is small enough to be interesting for:
- local voice assistants
- embedded/edge experiments
- browser or WASM-style TTS exploration
- efficient inference research
- tiny-model baselines
Limitations: The quality is still limited. It can sound robotic and stumble on difficult unseen text, and the vocoder is still a clear bottleneck. Long or unusual prompts are less reliable.
So I would frame this as a research/demo release, not a production TTS engine.
I’d love feedback from people interested in:
- tiny speech models
- vocoders
- local TTS
- efficient inference
- embedded speech synthesis
- improving small-model generalization
If people find it useful, I’m interested in putting more training budget into a stronger v2.
Everyone is born a semiotician; no one is born knowing it. Go easy on yourself (and me) for not understanding this yet.
Computational semiotics is now an empirical study.
LLMs are not proto-minds. They are verifiably semiotic infrastructure.
This repository (or the attached demo) can show you, in real time, how any frozen model (Qwen, in the demo) arrives at any answer by reading its latent states directly during generation.
Grok insists my intro is condescending … This is certainly true, as is the statement, in my condescended opinion. I expect heat for it; let’s think this through?