Indus-Labs/v1_saavi_devi


sachin2000keshav


	---
	license: llama3.2
	base_model: canopylabs/3b-hi-pretrain-research_release
	tags:
	- text-to-speech
	- hindi
	- hinglish
	- audio-generation
	- fine-tuned
	- unsloth
	language:
	- hi
	- en
	pipeline_tag: text-generation
	---

	# Hinglish TTS 3B Model

	This is a fine-tuned version of [canopylabs/3b-hi-pretrain-research_release](https://huggingface.co/canopylabs/3b-hi-pretrain-research_release) specialized for Hinglish (Hindi-English mixed) text-to-speech generation.

	## Model Details

	- Base Model: canopylabs/3b-hi-pretrain-research_release
	- Fine-tuning Method: LoRA with Unsloth (merged)
	- Languages: Hindi, English, Hinglish
	- Task: Text-to-Speech via audio token generation
	- Model Size: ~3B parameters

	## Usage

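	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Load the merged model and its tokenizer
	model_name = "Indus-Labs/v1_saavi_devi"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Generate audio tokens for a Hinglish prompt
	prompt = "Hello doston, main aapka dost hun"
	# Move the inputs to the device the model was placed on
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	# `outputs` holds the prompt ids followed by the generated audio token ids;
	# these are decoded with SNAC (see "Audio Generation"), not with the tokenizer
	outputs = model.generate(**inputs, max_new_tokens=1200)
	```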
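
The snippet above loads roughly 3B parameters in `float16`. As a rough sizing aid, the sketch below estimates the memory taken by the weights alone. The helper name `weight_memory_gib` is hypothetical, the parameter count is the approximate figure from Model Details, and activations, the KV cache, and framework overhead are not included.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory for the weights alone, in GiB (excludes activations and KV cache)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 3e9  # approximate size from Model Details (~3B)

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32: 4 bytes per parameter

print(f"float16 weights: ~{fp16_gib:.1f} GiB")
print(f"float32 weights: ~{fp32_gib:.1f} GiB")
```

Actual usage during generation will be higher than this estimate, since it also depends on sequence length and the runtime.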

	## Fine-tuning Details

	- LoRA Rank: 64
	- LoRA Alpha: 64
	- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Training Framework: Unsloth

	## Audio Generation

	This model generates audio tokens rather than a waveform. Decode them with a SNAC (Multi-Scale Neural Audio Codec) model:

	```python
	from snac import SNAC

	# Load SNAC decoder
	snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

	# Process generated tokens to audio codes and decode
	# (See full implementation in the original training code)
	```
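
The token-to-code step is not spelled out in this card. The sketch below shows one plausible layout, assuming the model follows the convention used by similar SNAC-based speech models: seven codes per audio frame, with the code at position `p` in a frame shifted by `p * 4096`, split across three codebook layers in a 1/2/4 pattern. The function name `redistribute_codes`, the frame length, and the offsets are all assumptions, so verify them against the original training code. The input is expected to be the generated audio token ids with the vocabulary offset of the first audio token already subtracted; that offset is not given in this card.

```python
def redistribute_codes(codes, codebook_size=4096, frame_len=7):
    """Split a flat list of per-frame codes into three SNAC codebook layers.

    ASSUMPTION: each frame holds `frame_len` codes, and the code at position
    p within a frame was shifted by p * codebook_size during training.
    """
    layer_1, layer_2, layer_3 = [], [], []
    n_frames = len(codes) // frame_len  # drop any incomplete trailing frame
    for f in range(n_frames):
        frame = codes[frame_len * f : frame_len * (f + 1)]
        # Undo the assumed per-position offset.
        c = [code - p * codebook_size for p, code in enumerate(frame)]
        layer_1.append(c[0])                      # coarsest: 1 code per frame
        layer_2.extend([c[1], c[4]])              # middle: 2 codes per frame
        layer_3.extend([c[2], c[3], c[5], c[6]])  # finest: 4 codes per frame
    return layer_1, layer_2, layer_3


# Two synthetic frames, built with the same assumed offsets.
raw = [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]]
flat = [v + p * 4096 for frame in raw for p, v in enumerate(frame)]
l1, l2, l3 = redistribute_codes(flat)
print(l1, l2, l3)
```

Each layer would then be wrapped as an integer tensor of shape `(1, n)` and the three tensors passed as a list to `snac_model.decode(...)` to obtain 24 kHz audio.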

	## Limitations

	- Requires a separate SNAC decoder to turn the generated tokens into audio
	- Optimized for Hinglish (code-mixed Hindi-English) input
	- Output quality may degrade on purely English or purely Hindi input

	## Citation

	If you use this model, please cite the original base model:

	```bibtex
	@misc{canopylabs-3b-hi,
	  title={3B Hindi Pretrained Model},
	  author={Canopy Labs},
	  year={2024},
	  url={https://huggingface.co/canopylabs/3b-hi-pretrain-research_release}
	}
	```