---
language:
- en
tags:
- tts
- text-to-speech
- safetensors
- cake
license: apache-2.0
base_model: YatharthS/LuxTTS
---

# LuxTTS (Safetensors / FP16)

This is a converted version of [YatharthS/LuxTTS](https://huggingface.co/YatharthS/LuxTTS), a flow-matching-based text-to-speech model. All credit for the original model, training, and research goes to the original authors.

## What changed

The original PyTorch checkpoint (`model.pt` and `vocoder/vocos.bin`) has been converted to **safetensors** format in **float16** precision for use with [Cake](https://github.com/evilsocket/cake). The conversion applies the following transformations:

- **Format**: `.pt` / `.bin` → `.safetensors` (safer, faster loading, memory-mappable).
- **Precision**: FP32 → FP16, reducing total size from ~530 MB to ~266 MB.
- **Key remapping**: The nested `fm_decoder.encoders.{stack}.layers.{layer}` hierarchy is flattened to `fm_decoder.layers.{flat_index}` using the stack sizes `[2, 2, 4, 4, 4]` (16 layers total). Similarly, `text_encoder.encoders.0.layers` is flattened to `text_encoder.layers`. Per-stack components (`time_emb`, `downsample`, `out_combiner`) are reorganized under `fm_decoder.stack_time_emb`, `fm_decoder.downsample`, and `fm_decoder.out_combiner` respectively.
- **Config**: The `architectures` field and the feature extraction parameters (`n_fft`, `hop_length`, `n_mels`, `sample_rate`) are added to `config.json`.

No weights were retrained or fine-tuned. The conversion changes only the storage format, key names, and precision: apart from the rounding introduced by the FP32 → FP16 cast, the weight values are identical to the original.

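
The layer flattening described under **Key remapping** can be sketched as follows. This is a minimal illustration, not the actual conversion script: the function names `flat_index` and `remap_key` are hypothetical, the parameter suffix `norm.weight` in the usage note is made up for the example, and the per-stack components (`time_emb`, `downsample`, `out_combiner`) are deliberately left out because their exact target key layout is not spelled out here. The only facts taken from this README are the key prefixes and the stack sizes `[2, 2, 4, 4, 4]`.

```python
import re

# Stack sizes as stated in this README (2 + 2 + 4 + 4 + 4 = 16 layers).
STACK_SIZES = [2, 2, 4, 4, 4]

# Nested decoder key: fm_decoder.encoders.{stack}.layers.{layer}.<rest>
_FM_LAYER = re.compile(r"^fm_decoder\.encoders\.(\d+)\.layers\.(\d+)\.(.+)$")
_TEXT_LAYER_PREFIX = "text_encoder.encoders.0.layers."


def flat_index(stack: int, layer: int) -> int:
    """Map a (stack, layer) pair to its position in the flattened layer list."""
    if not 0 <= stack < len(STACK_SIZES):
        raise ValueError(f"stack {stack} out of range")
    if not 0 <= layer < STACK_SIZES[stack]:
        raise ValueError(f"layer {layer} out of range for stack {stack}")
    # Offset = number of layers in all earlier stacks.
    return sum(STACK_SIZES[:stack]) + layer


def remap_key(key: str) -> str:
    """Rename one tensor key; keys that match no pattern pass through unchanged."""
    match = _FM_LAYER.match(key)
    if match:
        stack, layer, rest = int(match.group(1)), int(match.group(2)), match.group(3)
        return f"fm_decoder.layers.{flat_index(stack, layer)}.{rest}"
    if key.startswith(_TEXT_LAYER_PREFIX):
        return "text_encoder.layers." + key[len(_TEXT_LAYER_PREFIX):]
    return key
```

Under these assumptions, `remap_key("fm_decoder.encoders.2.layers.1.norm.weight")` returns `"fm_decoder.layers.5.norm.weight"`, because the first two stacks contribute 2 + 2 = 4 layers before stack 2 begins.
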

## Model details

| Component | File | Size |
|---|---|---|
| Main model (flow-matching decoder + text encoder) | `model.safetensors` | 235 MB |
| Vocoder (Vocos) | `vocos.safetensors` | 31 MB |

- **Architecture**: Flow-matching TTS with conformer-based decoder (16 layers across 5 stacks) and 4-layer text encoder
- **Vocoder**: Vocos (iSTFT-based, 8 layers, 512 dim)
- **Sample rate**: 24 kHz (with 48 kHz upsampler head)
- **Vocabulary**: 360 tokens (characters + punctuation)

## Original project

- **Model**: [YatharthS/LuxTTS](https://huggingface.co/YatharthS/LuxTTS)
- **License**: Apache 2.0