---
library_name: transformers
datasets:
- malaysia-ai/Multilingual-TTS
- Scicom-intl/Emilia-YODAS-Voice-Conversion
- Scicom-intl/Malaysian-Emilia
base_model:
- Qwen/Qwen3-4B-Base
language:
- en
- ms
- zh
- ta
---

# Multilingual-TTS-4B-Base
Continued pretraining of [Qwen/Qwen3-4B-Base](https://huggingface.co/Qwen/Qwen3-4B-Base) on multilingual Voice Conversion and TTS.
1. Uses [neucodec](https://huggingface.co/neuphonic/neucodec) as the speech detokenizer, at 50 tokens per second with 24 kHz output audio (see the inference sketch after this list).
2. Multi-speaker multilingual Voice Conversion, **up to 35.88B tokens**.
3. Multi-speaker multilingual TTS covering more than 150 languages, **up to 14.64B tokens**.
4. Flash Attention 3 with 10k context length and varlen multipacking.
5. BF16 training.
6. MuonAdamW optimizer.
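
Below is a minimal inference sketch showing how the language model and the neucodec detokenizer could fit together: the LM generates speech tokens, which are then decoded to a 24 kHz waveform. The repository id, prompt template, and `<|speech_N|>` token naming are assumptions for illustration only; the `NeuCodec.from_pretrained` / `decode_code` calls follow the neucodec model card and may differ from this checkpoint's actual interface.

```python
# Sketch only: repo id, prompt format, and speech-token naming are assumptions.
import re

import torch
import soundfile as sf
from transformers import AutoModelForCausalLM, AutoTokenizer
from neucodec import NeuCodec  # speech detokenizer, 50 tokens/s, 24 kHz output

model_id = "Multilingual-TTS-4B-Base"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

codec = NeuCodec.from_pretrained("neuphonic/neucodec")
codec.eval()

prompt = "..."  # TTS prompt; the exact template is not documented here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=2048)

# Extract speech-token ids from the newly generated text (assumed token naming).
decoded = tokenizer.decode(generated[0, inputs["input_ids"].shape[1]:])
codes = [int(c) for c in re.findall(r"<\|speech_(\d+)\|>", decoded)]

# neucodec decodes a [batch, 1, time] code tensor into a 24 kHz waveform.
codes_tensor = torch.tensor(codes).unsqueeze(0).unsqueeze(0)
with torch.no_grad():
    wav = codec.decode_code(codes_tensor)
sf.write("output.wav", wav[0, 0].cpu().numpy(), 24000)
```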