---
language:
- en
- hi
- te
license: mit
library_name: chiluka
pipeline_tag: text-to-speech
tags:
- text-to-speech
- tts
- styletts2
- voice-cloning
- multi-language
- hindi
- english
- telugu
- multi-speaker
- style-transfer
---

# Chiluka TTS

**Chiluka** (చిలుక - Telugu for "parrot") is a lightweight text-to-speech model based on [StyleTTS2](https://github.com/yl4579/StyleTTS2). It transfers voice style from a reference audio clip.

## Available Models

| Model | Name | Languages | Speakers |
|-------|------|-----------|----------|
| **Hindi-English** (default) | `hindi_english` | Hindi, English | 5 |
| **Telugu** | `telugu` | Telugu, English | 1 |

## Installation

```bash
pip install git+https://github.com/PurviewVoiceBot/chiluka.git

# Required system dependency
sudo apt-get install espeak-ng  # Ubuntu/Debian
```

## Usage

Model weights download automatically on first use.

```python
from chiluka import Chiluka

# Load Hindi-English model (default)
tts = Chiluka.from_pretrained()

# Or Telugu model
# tts = Chiluka.from_pretrained(model="telugu")

wav = tts.synthesize(
    text="Hello, this is Chiluka speaking!",
    reference_audio="path/to/reference.wav",
    language="en-us"
)
tts.save_wav(wav, "output.wav")
```

### Hindi

```python
tts = Chiluka.from_pretrained()

wav = tts.synthesize(
    text="नमस्ते, मैं चिलुका बोल रहा हूं",
    reference_audio="reference.wav",
    language="hi"
)
tts.save_wav(wav, "hindi_output.wav")
```

### Telugu

```python
tts = Chiluka.from_pretrained(model="telugu")

wav = tts.synthesize(
    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
    reference_audio="reference.wav",
    language="te"
)
tts.save_wav(wav, "telugu_output.wav")
```

## Streaming Audio

For WebRTC, WebSocket, or HTTP streaming:

```python
wav = tts.synthesize("Hello!", "reference.wav", language="en-us")

# Get audio as bytes (no disk write)
mp3_bytes = tts.to_audio_bytes(wav, format="mp3")  # requires pydub + ffmpeg
wav_bytes = tts.to_audio_bytes(wav, format="wav")
pcm_bytes = tts.to_audio_bytes(wav, format="pcm")  # raw 16-bit PCM

# Stream chunked audio (`websocket` is your own connection object)
for chunk in tts.synthesize_stream("Hello!", "reference.wav", language="en-us"):
    websocket.send(chunk)  # PCM chunks by default

# Stream as MP3 chunks (`response` is your own HTTP response object)
for chunk in tts.synthesize_stream("Hello!", "reference.wav", format="mp3"):
    response.write(chunk)
```

## Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `text` | required | Input text to synthesize |
| `reference_audio` | required | Path to reference audio for voice style |
| `language` | `"en-us"` | espeak-ng language code (see below) |
| `alpha` | `0.3` | Acoustic style mixing (0 = reference, 1 = predicted) |
| `beta` | `0.7` | Prosodic style mixing (0 = reference, 1 = predicted) |
| `diffusion_steps` | `5` | More steps = better quality, slower |
| `embedding_scale` | `1.0` | Classifier-free guidance strength |

## Language Codes

| Language | Code | Available In |
|----------|------|-------------|
| English (US) | `en-us` | All models |
| English (UK) | `en-gb` | All models |
| Hindi | `hi` | `hindi_english` |
| Telugu | `te` | `telugu` |

## Architecture

- **Text Encoder**: Token embedding + CNN + BiLSTM
- **Style Encoder**: Conv2D + Residual blocks (style_dim=128)
- **Prosody Predictor**: LSTM-based with AdaIN normalization
- **Diffusion Model**: Transformer-based denoiser with ADPM2 sampler
- **Decoder**: HiFi-GAN vocoder (upsample rates: 10, 5, 3, 2)
- **Pretrained sub-models**: PL-BERT (text), ASR (alignment), JDC (pitch)

## Requirements

- Python >= 3.8
- PyTorch >= 1.13.0
- CUDA recommended
- espeak-ng
- pydub + ffmpeg (only for MP3/OGG streaming)

## Citation

Based on StyleTTS2:

```bibtex
@inproceedings{li2024styletts,
  title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
  author={Li, Yinghao Aaron and Han, Cong and Raghavan, Vinay S and Mischler, Gavin and Mesgarani, Nima},
  booktitle={NeurIPS},
  year={2024}
}
```

## License

MIT License

## Links

- **GitHub**: [PurviewVoiceBot/chiluka](https://github.com/PurviewVoiceBot/chiluka)
- **HuggingFace**: [Seemanth/chiluka](https://huggingface.co/Seemanth/chiluka)
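
## Notes on `alpha` and `beta`

The Parameters table describes `alpha` and `beta` as mixing weights where 0 means "use the reference style" and 1 means "use the predicted style". One plausible reading, assuming Chiluka follows the linear interpolation used in the StyleTTS2 reference inference code, is sketched below. The function `mix_style` and the toy vectors are hypothetical and are not part of the Chiluka API; the real style vectors have 128 dimensions (`style_dim=128`).

```python
from typing import List, Sequence


def mix_style(reference: Sequence[float],
              predicted: Sequence[float],
              weight: float) -> List[float]:
    """Linearly interpolate two style vectors.

    weight = 0 returns the reference style, weight = 1 returns the
    predicted style. This is an assumed formula, not Chiluka's confirmed
    implementation.
    """
    if len(reference) != len(predicted):
        raise ValueError("style vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * p + (1.0 - weight) * r
            for r, p in zip(reference, predicted)]


# Toy 4-dimensional vectors standing in for the 128-dimensional ones.
ref_acoustic = [1.0, 0.0, 2.0, -1.0]
pred_acoustic = [0.0, 1.0, 0.0, 1.0]

# The default alpha=0.3 keeps the acoustic style closer to the reference.
acoustic = mix_style(ref_acoustic, pred_acoustic, weight=0.3)
print(acoustic)
```

Under this reading, the defaults (`alpha=0.3`, `beta=0.7`) keep the timbre close to the reference speaker while letting the prosody follow the text more freely. Lowering both values toward 0 should push the output closer to the reference clip.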