metadata
tags:
- ml-intern
Astra-TTS Architecture
Architecture design documents for Astra-TTS — a lightweight, high-quality text-to-speech system based on ZipVoice/Zipformer.
Documents
| File | Description |
|---|---|
| model_a_slim.md | Model A: ZipVoice naively shrunk to ~55M params. Serves as baseline. |
| model_b_enhanced.md | Model B: ~55M params with architectural improvements (GQA, DepthSep Conv, Grouped Param Sharing, Dilated ConvNeXt, RoPE, etc.) plus inference optimizations (EPSS, Midpoint ODE, SmoothCache). |
| benchmark_prd.md | Benchmark PRD: full evaluation protocol comparing Original ZipVoice (123M) vs Model A (55M) vs Model B (55M) on LibriTTS. |
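Model B lists GQA (grouped-query attention) among its changes. As a rough illustration of where the parameter savings come from, the sketch below counts projection weights for a textbook attention layer with full versus grouped key/value heads. The function name and the dimensions (512-wide model, 8 query heads, 2 KV heads) are hypothetical, chosen only to make the arithmetic concrete. They are not taken from the Astra-TTS design, and Zipformer's attention module differs from this textbook layout.

```python
def attention_proj_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Count weights in the Q, K, V and output projections (biases ignored).

    Hypothetical helper for illustration; assumes a textbook attention layer.
    """
    if d_model % n_heads != 0 or n_heads % n_kv_heads != 0:
        raise ValueError("d_model must divide by n_heads, n_heads by n_kv_heads")
    head_dim = d_model // n_heads
    q_proj = d_model * d_model                       # one projection per query head
    kv_proj = 2 * d_model * (n_kv_heads * head_dim)  # K and V shared across groups
    out_proj = d_model * d_model
    return q_proj + kv_proj + out_proj


# Illustrative sizes only (assumptions, not Astra-TTS values).
mha = attention_proj_params(d_model=512, n_heads=8, n_kv_heads=8)
gqa = attention_proj_params(d_model=512, n_heads=8, n_kv_heads=2)
print(mha, gqa, gqa / mha)
```

With these assumed sizes, the grouped variant keeps the query and output projections unchanged and shrinks only the key/value projections, so the saving grows as the number of KV heads drops.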
Goal
Determine whether smart architectural changes at ~55M params can match or exceed the quality of a naive shrink to the same size, while enabling 6-8× faster inference than the original ZipVoice through combined architecture and inference-time optimizations.
Architecture Summary
| | Original ZipVoice | Model A (Slim) | Model B (Enhanced) |
|---|---|---|---|
| Params | 123M | ~55M | ~55M |
| Approach | Full size | Naive shrink | Smart redesign |
| Key changes | — | Smaller dims/fewer layers | GQA, DepthSep FFN, Grouped Sharing, Dilated ConvNeXt, RoPE, ConvNeXt text refinement, no NLA |
| Inference | Euler 16 NFE | Euler 16 NFE | Midpoint 4-step + EPSS + SmoothCache |
| Expected speed | 1× | ~1.5× | ~6-8× |
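The Inference row contrasts 16 Euler steps with a 4-step midpoint solver. A minimal sketch of the two integrators, counting network function evaluations (NFE), is given below. The vector field here is a toy linear ODE (dx/dt = -x) standing in for the learned flow-matching network, and all names are hypothetical. It shows only how the step count maps to NFE, and says nothing about speech quality or about EPSS and SmoothCache.

```python
import math
from typing import Callable, Tuple

# f(x, t) -> dx/dt; a stand-in for the model's learned velocity field.
Velocity = Callable[[float, float], float]


def euler(f: Velocity, x0: float, steps: int) -> Tuple[float, int]:
    """Integrate from t=0 to t=1 with the Euler method (1 evaluation per step)."""
    h = 1.0 / steps
    x, t, nfe = x0, 0.0, 0
    for _ in range(steps):
        x += h * f(x, t)
        nfe += 1
        t += h
    return x, nfe


def midpoint(f: Velocity, x0: float, steps: int) -> Tuple[float, int]:
    """Integrate from t=0 to t=1 with the midpoint method (2 evaluations per step)."""
    h = 1.0 / steps
    x, t, nfe = x0, 0.0, 0
    for _ in range(steps):
        k1 = f(x, t)                        # slope at the start of the step
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)  # slope at the half step
        x += h * k2
        nfe += 2
        t += h
    return x, nfe


def toy_field(x: float, t: float) -> float:
    """Toy ODE dx/dt = -x with exact solution x(1) = x0 * exp(-1)."""
    return -x


exact = math.exp(-1.0)
euler_x, euler_nfe = euler(toy_field, 1.0, steps=16)
mid_x, mid_nfe = midpoint(toy_field, 1.0, steps=4)
print(f"Euler:    NFE={euler_nfe}, error={abs(euler_x - exact):.5f}")
print(f"Midpoint: NFE={mid_nfe}, error={abs(mid_x - exact):.5f}")
```

On this toy problem the 4-step midpoint run uses 8 evaluations against Euler's 16 and still lands closer to the exact solution, which is the intuition behind trading a first-order solver for a second-order one. Whether that carries over to the real model is what the benchmark is meant to measure.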
References
- ZipVoice: arXiv:2506.13053
- Zipformer: arXiv:2310.11230
- Supertonic 3: Supertone/supertonic-3
- F5-TTS: arXiv:2410.06885
License
Apache-2.0
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
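```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID on the Hugging Face Hub.
model_id = "Praha-Labs/Astra-TTS-Arch"

# Assumes the repository hosts a tokenizer and weights loadable by transformers.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```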
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.