Astra-TTS Architecture

Architecture design documents for Astra-TTS — a lightweight, high-quality text-to-speech system based on ZipVoice/Zipformer.

Documents

| File | Description |
| --- | --- |
| model_a_slim.md | Model A: ZipVoice naively shrunk to ~55M params. Serves as baseline. |
| model_b_enhanced.md | Model B: ~55M params with architectural improvements (GQA, DepthSep Conv, Grouped Param Sharing, Dilated ConvNeXt, RoPE, etc.) + inference optimizations (EPSS, Midpoint ODE, SmoothCache). |
| benchmark_prd.md | Benchmark PRD: full evaluation protocol comparing Original ZipVoice (123M) vs Model A (55M) vs Model B (55M) on LibriTTS. |
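
To give a feel for why GQA and depthwise-separable convolutions help at a fixed ~55M budget, the sketch below counts weights for both techniques. The dimensions (512 channels, 8 query heads, 2 K/V heads, kernel size 7) are illustrative assumptions, not values taken from the design documents, and biases are ignored.

```python
def attention_proj_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Weights in the Q, K, V and output projections of one attention layer."""
    head_dim = d_model // n_heads
    q_and_out = 2 * d_model * d_model              # Q and output projections
    k_and_v = 2 * d_model * n_kv_heads * head_dim  # K and V shrink with fewer KV heads
    return q_and_out + k_and_v


def conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a standard 1-D convolution."""
    return c_in * c_out * kernel


def depthwise_separable_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""
    return c_in * kernel + c_in * c_out


# Hypothetical sizes, chosen only for illustration.
mha = attention_proj_params(d_model=512, n_heads=8, n_kv_heads=8)  # standard multi-head
gqa = attention_proj_params(d_model=512, n_heads=8, n_kv_heads=2)  # 4 query heads per KV head
full_conv = conv1d_params(512, 512, 7)
sep_conv = depthwise_separable_conv1d_params(512, 512, 7)

print(f"attention: MHA {mha:,} vs GQA {gqa:,}")
print(f"conv1d:    full {full_conv:,} vs depthwise-separable {sep_conv:,}")
```

With these assumed sizes, GQA removes the K/V weights for six of the eight heads, and the separable convolution needs roughly one seventh of the weights of the full one. The freed budget is what a redesign can spend elsewhere while staying near the same total.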

Goal

Determine whether a redesigned ~55M-parameter architecture (Model B) can match or exceed the quality of a naive shrink to the same size (Model A), while reaching the expected 6-8× inference speedup over the original ZipVoice through combined architectural and inference-time optimizations.

Architecture Summary

| | Original ZipVoice | Model A (Slim) | Model B (Enhanced) |
| --- | --- | --- | --- |
| Params | 123M | ~55M | ~55M |
| Approach | Full size | Naive shrink | Smart redesign |
| Key changes | - | Smaller dims/fewer layers | GQA, DepthSep FFN, Grouped Sharing, Dilated ConvNeXt, RoPE, ConvNeXt text refinement, no NLA |
| Inference | Euler 16 NFE | Euler 16 NFE | Midpoint 4-step + EPSS + SmoothCache |
| Expected speed | 1× | ~1.5× | ~6-8× |
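
The Inference row compares a 16-step Euler sampler with a 4-step midpoint sampler. As a minimal sketch, assuming the standard explicit midpoint rule (two velocity evaluations per step), 4 steps cost 8 function evaluations (NFE) against Euler's 16. The `toy_velocity` field below is a hypothetical stand-in for the trained model; EPSS and SmoothCache are not modelled here.

```python
import math
from typing import Callable, Tuple


def integrate(
    velocity: Callable[[float, float], float],
    x0: float,
    n_steps: int,
    method: str = "euler",
) -> Tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1; return (x, NFE)."""
    x, h, nfe = x0, 1.0 / n_steps, 0
    for i in range(n_steps):
        t = i * h
        if method == "euler":
            x = x + h * velocity(x, t)
            nfe += 1
        elif method == "midpoint":
            # Half Euler step to the midpoint, then a full step using the
            # velocity evaluated there: two evaluations per step.
            x_mid = x + 0.5 * h * velocity(x, t)
            x = x + h * velocity(x_mid, t + 0.5 * h)
            nfe += 2
        else:
            raise ValueError(f"unknown method: {method}")
    return x, nfe


def toy_velocity(x: float, t: float) -> float:
    """Hypothetical velocity field with exact solution x(1) = x0 * exp(-1)."""
    return -x


exact = math.exp(-1.0)
x_euler, nfe_euler = integrate(toy_velocity, 1.0, n_steps=16, method="euler")
x_mid, nfe_mid = integrate(toy_velocity, 1.0, n_steps=4, method="midpoint")
print(f"Euler    NFE={nfe_euler:2d} error={abs(x_euler - exact):.4f}")
print(f"Midpoint NFE={nfe_mid:2d} error={abs(x_mid - exact):.4f}")
```

On this toy problem the second-order midpoint rule is more accurate with half the evaluations. Whether that carries over to speech quality is what the benchmark has to establish, and halving NFE alone accounts for only part of the expected ~6-8× speedup.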

License

Apache-2.0

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

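from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic loading template: it only works if the repository ships
# transformers-compatible weights, config, and tokenizer files.
model_id = "Praha-Labs/Astra-TTS-Arch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)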

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
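
If the repository holds only the design documents listed under Documents, fetching those files directly may be more useful than loading a model. The sketch below uses `hf_hub_download` from `huggingface_hub`. That the three files sit at the repository root is an assumption, and `fetch_design_docs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, Sequence

REPO_ID = "Praha-Labs/Astra-TTS-Arch"
# File names from the Documents section; their location at the
# repository root is assumed, not confirmed.
DOC_FILES = ("model_a_slim.md", "model_b_enhanced.md", "benchmark_prd.md")


def fetch_design_docs(
    repo_id: str = REPO_ID, filenames: Sequence[str] = DOC_FILES
) -> Dict[str, Path]:
    """Download each document into the local Hub cache and return its path."""
    # Imported lazily so the module can be loaded without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return {
        name: Path(hf_hub_download(repo_id=repo_id, filename=name))
        for name in filenames
    }
```

Calling `fetch_design_docs()` needs network access and returns a mapping from file name to cached local path, so `fetch_design_docs()["benchmark_prd.md"].read_text()` would give the benchmark document as a string.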
