ahmad4raza/base-model


	---
	license: llama3.2
	base_model: snorbyte/snorTTS-Indic-v0
	tags:
	- text-to-speech
	- hindi
	- hinglish
	- audio-generation
	- fine-tuned
	- unsloth
	language:
	- hi
	- en
	pipeline_tag: text-generation
	---

	# Hinglish TTS 3B Model

	This model is a fine-tuned version of [canopylabs/3b-hi-pretrain-research_release](https://huggingface.co/canopylabs/3b-hi-pretrain-research_release), specialized for Hinglish (mixed Hindi-English) text-to-speech generation.

	## Model Details

	- Base Model: canopylabs/3b-hi-pretrain-research_release
	- Fine-tuning Method: LoRA with Unsloth (merged)
	- Languages: Hindi, English, Hinglish
	- Task: Text-to-Speech via audio token generation
	- Model Size: ~3B parameters

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Load model and tokenizer
	model_name = "Indus-Labs/indus_tts_v3_snor"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Generate audio tokens (these still need SNAC decoding, see "Audio Generation")
	prompt = "Hello doston, main aapka dost hun"
	# Move the inputs to the same device as the model before generating
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=1200)
	```

	## Fine-tuning Details

	- LoRA Rank: 64
	- LoRA Alpha: 64
	- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Training Framework: Unsloth
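For context, a rank of 64 with an alpha of 64 gives a LoRA scaling factor of alpha / rank = 1 under the standard (non-rank-stabilized) formulation. The plain-Python sketch below computes that factor and the number of trainable parameters one adapter adds to a single linear layer. The helper names and the 3072 x 3072 example shape are illustrative assumptions, not values taken from this model's training code.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update (alpha / rank)."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 64   # from "LoRA Rank" in this card
ALPHA = 64  # from "LoRA Alpha" in this card

scaling = lora_scaling(ALPHA, RANK)

# Hypothetical projection shape, chosen only to illustrate the arithmetic.
example_params = lora_param_count(d_in=3072, d_out=3072, rank=RANK)

print(f"scaling = {scaling}")
print(f"adapter parameters for a 3072 x 3072 layer = {example_params}")
```

With alpha equal to the rank, the adapter update is added to the frozen weights without extra up- or down-weighting.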

	## Audio-Language-Source

	- bengali: alivia
	- bhojpuri: kaajal
	- kannada: aahna
	- chattisgarahi: kaashvi
	- hindi: aditi
	- telugu: prerna
	- marathi: saakshi
	- mathili: manisha
	- bengali_male: sayan
	- bhojpuri_male: pawan
	- hindi_male: arjun
	- telgu_male: surya
	- kannada_male: chinmay
	- marathi_male: anant
	- hindi_savi
	- hindi_devi
	- hinglish_savi
	- hinglish_devi
	- english_savi
	- english_devi
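
The language keys and speaker names listed above could be kept in a small lookup table. This is a minimal sketch: `SPEAKERS` and `speaker_for` are hypothetical names, the keys are copied verbatim from the list (including their spelling), and the six entries listed without a speaker name are left out because the card does not say what they map to. How a speaker name enters the prompt is not specified here, so the sketch stops at the lookup.

```python
# Language key -> speaker name, copied verbatim from the list in this card.
SPEAKERS = {
    "bengali": "alivia",
    "bhojpuri": "kaajal",
    "kannada": "aahna",
    "chattisgarahi": "kaashvi",
    "hindi": "aditi",
    "telugu": "prerna",
    "marathi": "saakshi",
    "mathili": "manisha",
    "bengali_male": "sayan",
    "bhojpuri_male": "pawan",
    "hindi_male": "arjun",
    "telgu_male": "surya",
    "kannada_male": "chinmay",
    "marathi_male": "anant",
}


def speaker_for(language_key: str) -> str:
    """Return the speaker name for a language key, or raise a descriptive KeyError."""
    try:
        return SPEAKERS[language_key]
    except KeyError:
        known = ", ".join(sorted(SPEAKERS))
        raise KeyError(f"Unknown language key {language_key!r}; known keys: {known}") from None


print(speaker_for("hindi"))
```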

	## Audio Generation

	This model generates audio tokens, which must be decoded into a waveform with a SNAC (Multi-Scale Neural Audio Codec) model:

	```python
	from snac import SNAC

	# Load SNAC decoder
	snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

	# Process generated tokens to audio codes and decode
	# (See full implementation in the original training code)
	```
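
The card leaves the token-to-code step to the training code. As a rough sketch only, the function below assumes an Orpheus-style layout: seven codes per audio frame, each position offset by a multiple of 4096, spread across SNAC's three codebook levels. That layout, the 4096 codebook size, and the name `redistribute_codes` are assumptions to check against the original training code. The input is assumed to be a flat list of codes with the tokenizer's audio-token offset already removed.

```python
CODEBOOK_SIZE = 4096   # assumed size of each SNAC codebook
CODES_PER_FRAME = 7    # assumed number of generated codes per audio frame


def redistribute_codes(codes: list[int]) -> tuple[list[int], list[int], list[int]]:
    """Split a flat code sequence into three SNAC levels (1, 2 and 4 codes per frame)."""
    level_1, level_2, level_3 = [], [], []
    n_frames = len(codes) // CODES_PER_FRAME  # drop any trailing partial frame
    for i in range(n_frames):
        frame = codes[CODES_PER_FRAME * i : CODES_PER_FRAME * (i + 1)]
        # Remove the per-position offset so every code lies in [0, CODEBOOK_SIZE).
        c = [code - pos * CODEBOOK_SIZE for pos, code in enumerate(frame)]
        level_1.append(c[0])
        level_2.extend([c[1], c[4]])
        level_3.extend([c[2], c[3], c[5], c[6]])
    return level_1, level_2, level_3


# Synthetic frame: position p holds the value (5 + p) plus its assumed offset.
frame = [5 + p + p * CODEBOOK_SIZE for p in range(CODES_PER_FRAME)]
levels = redistribute_codes(frame)
print(levels)
```

Under these assumptions, each level could then be converted to a `torch.long` tensor with a leading batch dimension and the three tensors passed as a list to `snac_model.decode(...)` to obtain the waveform.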

	## Limitations

	- Requires a SNAC model to decode the generated tokens into audio
	- Optimized for Hinglish content
	- May perform worse on pure English or pure Hindi input

	## Citation

	If you use this model, please cite the original base model:

	```bibtex
	@misc{canopylabs-3b-hi,
	  title={3B Hindi Pretrained Model},
	  author={Canopy Labs},
	  year={2024},
	  url={https://huggingface.co/canopylabs/3b-hi-pretrain-research_release}
	}
	```