---
license: llama3.2
base_model: snorbyte/snorTTS-Indic-v0
tags:
- text-to-speech
- hindi
- hinglish
- audio-generation
- fine-tuned
- unsloth
language:
- hi
- en
pipeline_tag: text-generation
---

# Hinglish TTS 3B Model

This is a fine-tuned version of [canopylabs/3b-hi-pretrain-research_release](https://huggingface.co/canopylabs/3b-hi-pretrain-research_release), specialized for Hinglish (mixed Hindi-English) text-to-speech generation.

## Model Details

- **Base Model**: canopylabs/3b-hi-pretrain-research_release
- **Fine-tuning Method**: LoRA with Unsloth (merged)
- **Languages**: Hindi, English, Hinglish
- **Task**: Text-to-Speech via audio token generation
- **Model Size**: ~3B parameters
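
The "(merged)" note above means the LoRA update has been folded into the base weights, so the checkpoint loads as an ordinary causal LM with no adapter attached. As a rough illustration of that arithmetic, here is a minimal sketch on tiny hand-written matrices in plain Python. The helpers `matmul` and `merge_lora` are hypothetical names used only for this example; they are not Unsloth's API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A).

    weight is (out, in), lora_b is (out, rank), lora_a is (rank, in).
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 weight with a rank-1 adapter (illustrative values only)
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[3.0, 4.0]]
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[7.0, 8.0], [12.0, 17.0]]
```

After merging, inference uses only the single merged matrix, which is why the usage code needs nothing beyond `transformers`.
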

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "Indus-Labs/indus_tts_v3_snor"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate audio tokens (not text) for the prompt
prompt = "Hello doston, main aapka dost hun"
# Move the inputs to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
```

## Fine-tuning Details

- **LoRA Rank**: 64
- **LoRA Alpha**: 64
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Framework**: Unsloth
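
To give a feel for what these settings imply, the sketch below computes the LoRA scaling factor (alpha / rank) and estimates how many adapter parameters were trained. The layer shapes are an assumption: they are the standard Llama 3.2 3B dimensions, not values stated in this card, so check them against the model's `config.json`. The helper `lora_params` is a hypothetical name for illustration.

```python
RANK = 64
ALPHA = 64

# Assumed Llama 3.2 3B shapes; verify against the model's config.json
HIDDEN = 3072
KV_DIM = 1024        # 8 key/value heads x 128 head dim
INTERMEDIATE = 8192
NUM_LAYERS = 28

# (in_features, out_features) for each targeted projection
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # A is (rank, in_features) and B is (out_features, rank)
    return rank * (in_features + out_features)


scaling = ALPHA / RANK
per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling = {scaling}")
print(f"adapter parameters = {total:,}")
```

With rank equal to alpha, the low-rank update is applied at a scale of 1.0. Under the assumed shapes, the adapters come to roughly 97M parameters, a small fraction of the ~3B base model.
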

## Audio-Language-Source

- bengali: alivia
- bhojpuri: kaajal
- kannada: aahna
- chhattisgarhi: kaashvi
- hindi: aditi
- telugu: prerna
- marathi: saakshi
- maithili: manisha
- bengali_male: sayan
- bhojpuri_male: pawan
- hindi_male: arjun
- telugu_male: surya
- kannada_male: chinmay
- marathi_male: anant
- hindi_savi
- hindi_devi
- hinglish_savi
- hinglish_devi
- english_savi
- english_devi

## Audio Generation

This model generates audio tokens, which must be decoded into a waveform with a SNAC (Multi-Scale Neural Audio Codec) model:

```python
from snac import SNAC

# Load the 24 kHz SNAC decoder
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

# Process generated tokens to audio codes and decode
# (See full implementation in the original training code)
```
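
The token-to-code step left out above can be sketched as follows. This assumes the model keeps the frame layout used by Canopy Labs' Orpheus-style models: 7 audio tokens per frame, with the token in slot k offset by k × 4096, spread across SNAC's three codebook levels in a 1/2/4 pattern. That layout, the constants, and the helper name `redistribute_codes` are assumptions for illustration and should be checked against the original training code. The input is assumed to be already shifted so that it is relative to the first audio token id, which depends on the tokenizer and is not shown here.

```python
CODEBOOK_SIZE = 4096  # assumed number of entries per SNAC codebook
FRAME_SIZE = 7        # assumed number of audio tokens per frame


def redistribute_codes(codes):
    """Split a flat, frame-ordered list of codes into three SNAC levels."""
    num_frames = len(codes) // FRAME_SIZE  # an incomplete trailing frame is dropped
    level_1, level_2, level_3 = [], [], []
    for f in range(num_frames):
        frame = codes[FRAME_SIZE * f : FRAME_SIZE * (f + 1)]
        # Remove the per-slot offset so each value is a plain codebook index
        c = [value - slot * CODEBOOK_SIZE for slot, value in enumerate(frame)]
        level_1.append(c[0])
        level_2.extend([c[1], c[4]])
        level_3.extend([c[2], c[3], c[5], c[6]])
    return level_1, level_2, level_3


# One complete frame plus a stray trailing token, which is ignored
frame = [10, 4096 + 11, 8192 + 12, 12288 + 13, 16384 + 14, 20480 + 15, 24576 + 16]
levels = redistribute_codes(frame + [99])
print(levels)  # ([10], [11, 14], [12, 13, 15, 16])
```

Each level would then be wrapped in a `torch.long` tensor of shape `(1, T)`, and the list of three tensors passed to `snac_model.decode(...)` to obtain the 24 kHz waveform.
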

## Limitations

- Requires a separate SNAC model to decode the generated tokens into audio
- Optimized for Hinglish content
- May perform less well on pure English or pure Hindi input in some cases

## Citation

If you use this model, please cite the original base model:

```bibtex
@misc{canopylabs-3b-hi,
  title={3B Hindi Pretrained Model},
  author={Canopy Labs},
  year={2024},
  url={https://huggingface.co/snorbyte/snorTTS-Indic-v0}
}
```