---
license: llama3.2
base_model: canopylabs/3b-hi-pretrain-research_release
tags:
- text-to-speech
- hindi
- hinglish
- audio-generation
- fine-tuned
- unsloth
language:
- hi
- en
pipeline_tag: text-generation
---

# Hinglish TTS 3B Model

This is a fine-tuned version of [canopylabs/3b-hi-pretrain-research_release](https://huggingface.co/canopylabs/3b-hi-pretrain-research_release) specialized for Hinglish (Hindi-English mixed) text-to-speech generation.

## Model Details

- **Base Model**: canopylabs/3b-hi-pretrain-research_release
- **Fine-tuning Method**: LoRA with Unsloth (adapters merged into the base weights)
- **Languages**: Hindi, English, Hinglish
- **Task**: Text-to-speech via audio token generation
- **Model Size**: ~3B parameters

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "Indus-Labs/v1_saavi_devi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate audio tokens (not plain text) from a Hinglish prompt
prompt = "Hello doston, main aapka dost hun"
# Move the inputs to the same device as the model before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
```

## Fine-tuning Details

- **LoRA Rank**: 64
- **LoRA Alpha**: 64
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Framework**: Unsloth

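To give a feel for what these settings imply, the sketch below computes the LoRA scaling factor and a rough count of trainable adapter parameters. It assumes the standard LoRA formulation (scaling = alpha / rank, with `rank * (in_features + out_features)` adapter weights per target module). The projection shapes and layer count are assumptions based on a Llama 3.2 3B style backbone; they are not stated in this model card and should be checked against the model's `config.json`.

```python
# LoRA settings taken from the fine-tuning details of this model card.
LORA_RANK = 64
LORA_ALPHA = 64

# ASSUMED shapes (in_features, out_features) for a Llama 3.2 3B style
# backbone. These values are NOT stated in the model card.
HIDDEN, KV_DIM, INTERMEDIATE, NUM_LAYERS = 3072, 1024, 8192, 28
TARGET_MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, modules, num_layers):
    """Count adapter weights: each module adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in modules.values())
    return per_layer * num_layers


# Standard LoRA scales the adapter update by alpha / rank.
scaling = LORA_ALPHA / LORA_RANK
trainable = lora_param_count(LORA_RANK, TARGET_MODULES, NUM_LAYERS)
print(f"scaling={scaling}, trainable LoRA parameters={trainable:,}")
```

With rank equal to alpha, the adapter update is applied with a scaling factor of 1. Under the assumed shapes, the adapters amount to roughly 97M trainable parameters, a small fraction of the ~3B base weights.
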
## Audio Generation

This model generates audio tokens, not a waveform. The tokens must be decoded with a SNAC (Multi-Scale Neural Audio Codec) model:

```python
from snac import SNAC

# Load SNAC decoder
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

# Process generated tokens to audio codes and decode
# (See full implementation in the original training code)
```

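The token-to-code step is not spelled out in this card, so the following is only a minimal sketch of what it might look like. It assumes the model follows an Orpheus-style layout: audio tokens start at a fixed vocabulary offset, arrive in frames of 7, and each position in a frame is shifted by a multiple of the codebook size. The constants `AUDIO_TOKEN_OFFSET`, `CODEBOOK_SIZE`, and `FRAME_SIZE`, and both helper functions, are hypothetical and should be verified against the original training code. The simple "keep everything at or above the offset" filter is also a simplification, not necessarily how the training code isolates audio tokens.

```python
# ASSUMED constants (not stated in this model card); verify them against
# the original training code before relying on this sketch.
AUDIO_TOKEN_OFFSET = 128266  # hypothetical id of the first audio token
CODEBOOK_SIZE = 4096         # assumed codebook size per frame position
FRAME_SIZE = 7               # assumed 1 coarse + 2 medium + 4 fine codes


def extract_audio_codes(token_ids):
    """Keep only audio tokens, shift them to start at 0, drop a partial frame."""
    codes = [t - AUDIO_TOKEN_OFFSET for t in token_ids if t >= AUDIO_TOKEN_OFFSET]
    usable = len(codes) - len(codes) % FRAME_SIZE
    return codes[:usable]


def redistribute_codes(codes):
    """Split flat 7-code frames into three per-layer code lists."""
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(0, len(codes), FRAME_SIZE):
        frame = codes[i:i + FRAME_SIZE]
        # Undo the per-position shift so every code lies in [0, CODEBOOK_SIZE).
        layer_1.append(frame[0])
        layer_2.append(frame[1] - CODEBOOK_SIZE)
        layer_3.append(frame[2] - 2 * CODEBOOK_SIZE)
        layer_3.append(frame[3] - 3 * CODEBOOK_SIZE)
        layer_2.append(frame[4] - 4 * CODEBOOK_SIZE)
        layer_3.append(frame[5] - 5 * CODEBOOK_SIZE)
        layer_3.append(frame[6] - 6 * CODEBOOK_SIZE)
    return layer_1, layer_2, layer_3
```

If these assumptions hold, each of the three lists can be wrapped as `torch.tensor(layer).unsqueeze(0)` and the resulting list passed to `snac_model.decode(...)`, which returns the waveform at 24 kHz.
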
## Limitations

- Requires a separate SNAC decoder to turn the generated audio tokens into a waveform
- Optimized for Hinglish (mixed Hindi-English) content
- May perform worse on purely English or purely Hindi input

## Citation

If you use this model, please cite the original base model:

```bibtex
@misc{canopylabs-3b-hi,
  title={3B Hindi Pretrained Model},
  author={Canopy Labs},
  year={2024},
  url={https://huggingface.co/canopylabs/3b-hi-pretrain-research_release}
}
```