Add Hindi-English model, multi-model support, and example scripts

- Add Hindi-English multi-speaker TTS model (5 speakers)
- Add model registry in hub.py for selecting model variants
- Update from_pretrained() to accept model="hindi_english" or model="telugu"
- Add torch.hub entry points: chiluka, chiluka_telugu, chiluka_hindi_english
- Add example scripts for HuggingFace Hub, PyTorch Hub, and pip usage
- Add HuggingFace model card (MODEL_CARD.md)
- Update README with all models and loading methods
- Exclude large weights from PyPI package via MANIFEST.in

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (13)

MANIFEST.in +17 -0
MODEL_CARD.md +217 -0
README.md +208 -77
chiluka/__init__.py +11 -7
chiluka/configs/config_hindi_english.yml +110 -0
chiluka/hub.py +104 -90
chiluka/inference.py +16 -11
examples/huggingface_example.py +70 -0
examples/pip_example.py +88 -0
examples/torchhub_example.py +70 -0
hubconf.py +47 -15
pyproject.toml +4 -4
setup.py +24 -3

MANIFEST.in ADDED

	@@ -0,0 +1,17 @@

+# Include config files
+include chiluka/configs/*.yml
+# Include pretrained config files (but NOT weights)
+include chiluka/pretrained/ASR/config.yml
+include chiluka/pretrained/PLBERT/config.yml
+# Exclude large model weights (these come from HuggingFace Hub)
+exclude chiluka/checkpoints/*.pth
+exclude chiluka/pretrained/ASR/*.pth
+exclude chiluka/pretrained/JDC/*.t7
+exclude chiluka/pretrained/PLBERT/*.t7
+# Exclude other unnecessary files
+global-exclude *.pyc
+global-exclude __pycache__
+global-exclude *.egg-info
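MANIFEST.in governs what goes into the source distribution (sdist). Whether wheels also omit the weights depends on the package-data settings in `setup.py`, which this view does not show. To confirm that the excludes above took effect, you can list the built sdist and look for weight files. The sketch below assumes the sdist was built with `python -m build --sdist` into `dist/`. The helper name `find_weight_files` is hypothetical.

```python
import tarfile
from pathlib import Path

# Suffixes treated as "large weights" (the .pth / .t7 patterns excluded above)
WEIGHT_SUFFIXES = (".pth", ".t7")


def find_weight_files(sdist_path):
    """Return member names in an sdist tarball that look like model weights."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return [
            name
            for name in archive.getnames()
            if name.endswith(WEIGHT_SUFFIXES)
        ]


if __name__ == "__main__":
    # Inspect every chiluka sdist in dist/ (default output dir of `python -m build`)
    for sdist in sorted(Path("dist").glob("chiluka-*.tar.gz")):
        leaked = find_weight_files(sdist)
        status = "OK" if not leaked else f"contains weights: {leaked}"
        print(f"{sdist.name}: {status}")
```

An empty list for every archive means the `exclude` lines are working for the sdist.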

MODEL_CARD.md ADDED

	@@ -0,0 +1,217 @@

+---
+language:
+  - en
+  - hi
+  - te
+license: mit
+library_name: chiluka
+pipeline_tag: text-to-speech
+tags:
+  - text-to-speech
+  - tts
+  - styletts2
+  - voice-cloning
+  - multi-language
+  - hindi
+  - english
+  - telugu
+  - multi-speaker
+  - style-transfer
+---
+# Chiluka TTS
+**Chiluka** (చిలుక - Telugu for "parrot") is a lightweight, self-contained Text-to-Speech inference package based on [StyleTTS2](https://github.com/yl4579/StyleTTS2).
+It supports **style transfer from reference audio**: give it a voice sample and it will speak in that style.
+## Available Models
+| Model | Name | Languages | Speakers | Description |
+|-------|------|-----------|----------|-------------|
+| **Hindi-English** (default) | `hindi_english` | Hindi, English | 5 | Multi-speaker Hindi + English TTS |
+| **Telugu** | `telugu` | Telugu, English | 1 | Single-speaker Telugu + English TTS |
+## Installation
+```bash
+pip install chiluka
+```
+Or from GitHub:
+```bash
+pip install git+https://github.com/PurviewVoiceBot/chiluka.git
+```
+**System dependency** (required for phonemization):
+```bash
+# Ubuntu/Debian
+sudo apt-get install espeak-ng
+# macOS
+brew install espeak-ng
+```
+## Quick Start
+```python
+from chiluka import Chiluka
+# Load model (weights download automatically on first use)
+tts = Chiluka.from_pretrained()
+# Synthesize speech
+wav = tts.synthesize(
+    text="Hello, this is Chiluka speaking!",
+    reference_audio="path/to/reference.wav",
+    language="en"
+)
+# Save output
+tts.save_wav(wav, "output.wav")
+```
+## Choose a Model
+```python
+from chiluka import Chiluka
+# Hindi + English (default)
+tts = Chiluka.from_pretrained(model="hindi_english")
+# Telugu + English
+tts = Chiluka.from_pretrained(model="telugu")
+```
+## Hindi Example
+```python
+tts = Chiluka.from_pretrained()
+wav = tts.synthesize(
+    text="नमस्ते, मैं चिलुका बोल रहा हूं",
+    reference_audio="reference.wav",
+    language="hi"
+)
+tts.save_wav(wav, "hindi_output.wav")
+```
+## Telugu Example
+```python
+tts = Chiluka.from_pretrained(model="telugu")
+wav = tts.synthesize(
+    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
+    reference_audio="reference.wav",
+    language="te"
+)
+tts.save_wav(wav, "telugu_output.wav")
+```
+## PyTorch Hub
+```python
+import torch
+# Hindi-English (default)
+tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+# Telugu
+tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+wav = tts.synthesize("Hello!", "reference.wav", language="en")
+```
+## Synthesis Parameters
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `text` | required | Input text to synthesize |
+| `reference_audio` | required | Path to reference audio for voice style |
+| `language` | `"en"` | Language code (`en`, `hi`, `te`, etc.) |
+| `alpha` | `0.3` | Acoustic style mixing (0 = reference voice, 1 = predicted) |
+| `beta` | `0.7` | Prosodic style mixing (0 = reference prosody, 1 = predicted) |
+| `diffusion_steps` | `5` | More steps = better quality, slower inference |
+| `embedding_scale` | `1.0` | Classifier-free guidance strength |
+## How It Works
+Chiluka uses a StyleTTS2-based pipeline:
+1. **Text** is converted to phonemes using espeak-ng
+2. **PL-BERT** encodes text into contextual embeddings
+3. **Reference audio** is processed to extract a style vector
+4. **Diffusion model** samples a style conditioned on text
+5. **Prosody predictor** generates duration, pitch (F0), and energy
+6. **HiFi-GAN decoder** synthesizes the final waveform at 24kHz
+## Model Architecture
+- **Text Encoder**: Token embedding + CNN + BiLSTM
+- **Style Encoder**: Conv2D + Residual blocks (style_dim=128)
+- **Prosody Predictor**: LSTM-based with AdaIN normalization
+- **Diffusion Model**: Transformer-based denoiser with ADPM2 sampler
+- **Decoder**: HiFi-GAN vocoder (upsample rates: 10, 5, 3, 2)
+- **Pretrained sub-models**: PL-BERT (text), ASR (alignment), JDC (pitch)
+## File Structure
+```
+├── configs/
+│   ├── config_ft.yml                 # Telugu model config
+│   └── config_hindi_english.yml      # Hindi-English model config
+├── checkpoints/
+│   ├── epoch_2nd_00017.pth           # Telugu checkpoint (~2GB)
+│   └── epoch_2nd_00029.pth           # Hindi-English checkpoint (~2GB)
+├── pretrained/                       # Shared pretrained sub-models
+│   ├── ASR/                          # Text-to-mel alignment
+│   ├── JDC/                          # Pitch extraction (F0)
+│   └── PLBERT/                       # Text encoder
+├── models/                           # Model architecture code
+│   ├── core.py
+│   ├── hifigan.py
+│   └── diffusion/
+├── inference.py                      # Main API
+├── hub.py                            # HuggingFace Hub utilities
+└── text_utils.py                     # Phoneme tokenization
+```
+## Requirements
+- Python >= 3.8
+- PyTorch >= 1.13.0
+- CUDA recommended (works on CPU too)
+- espeak-ng system package
+## Limitations
+- Requires a reference audio file for style/voice transfer
+- Quality depends on the reference audio quality
+- Best results with 3-15 second reference clips
+- Hindi-English model trained on 5 speakers
+- Telugu model trained on 1 speaker
+## Citation
+Based on StyleTTS2:
+```bibtex
+@inproceedings{li2023styletts2,
+  title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+  author={Li, Yinghao Aaron and Han, Cong and Raghavan, Vinay S. and Mischler, Gavin and Mesgarani, Nima},
+  booktitle={Advances in Neural Information Processing Systems},
+  year={2023}
+}
+```
+## License
+MIT License
+## Links
+- **GitHub**: [PurviewVoiceBot/chiluka](https://github.com/PurviewVoiceBot/chiluka)
+- **PyPI**: [chiluka](https://pypi.org/project/chiluka/)
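The `alpha` and `beta` rows in the model card's parameter table describe a linear interpolation between two styles: the style extracted from the reference clip and the style sampled by the diffusion model. In upstream StyleTTS2 inference, the style vector is `2 * style_dim` long. The first half holds the acoustic style and the second half holds the prosodic style (`style_dim=128` in the config). The sketch below reproduces that mixing on plain lists. It assumes Chiluka follows the upstream code, and `mix_styles` is a hypothetical name, not part of the chiluka API.

```python
def mix_styles(ref_style, pred_style, alpha=0.3, beta=0.7, style_dim=128):
    """Blend reference and diffusion-predicted style vectors (StyleTTS2-style).

    Both inputs have length 2 * style_dim: the first half is the acoustic
    style, the second half the prosodic style. A weight of 0 keeps the
    reference; a weight of 1 keeps the prediction.
    """
    if len(ref_style) != 2 * style_dim or len(pred_style) != 2 * style_dim:
        raise ValueError(f"expected vectors of length {2 * style_dim}")

    def lerp(ref, pred, weight):
        # Element-wise: weight * predicted + (1 - weight) * reference
        return [weight * p + (1 - weight) * r for r, p in zip(ref, pred)]

    acoustic = lerp(ref_style[:style_dim], pred_style[:style_dim], alpha)
    prosodic = lerp(ref_style[style_dim:], pred_style[style_dim:], beta)
    return acoustic, prosodic


# Toy check with style_dim=2: reference is all 0s, prediction is all 1s
acoustic, prosodic = mix_styles([0.0] * 4, [1.0] * 4, alpha=0.3, beta=0.7, style_dim=2)
print(acoustic, prosodic)  # [0.3, 0.3] [0.7, 0.7]
```

With the defaults, the acoustic half stays 70% reference (the voice timbre follows the clip). The prosodic half is 70% predicted, so rhythm and intonation lean toward the text-conditioned sample.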

README.md CHANGED

@@ -1,59 +1,65 @@
-# Chiluka 🦜
 **Chiluka** (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.
 ## Features
-- 🚀 Simple, clean API for TTS synthesis
-- 📦 **Fully self-contained** - all models bundled in the package
-- 🎙️ Style transfer from reference audio
-- 🌍 Multi-language support via phonemizer
-- 🔧 No external dependencies on other repos
 ## Installation
-### From Source (Recommended)
 ```bash
-git clone https://github.com/yourusername/chiluka.git
-cd chiluka
-pip install -e .
 ```
-**Note:** This repo uses Git LFS for large model files. Make sure to install Git LFS first:
 ```bash
-# Ubuntu/Debian
-sudo apt-get install git-lfs
-git lfs install
-# macOS
-brew install git-lfs
-git lfs install
-# Then clone
-git lfs clone https://github.com/yourusername/chiluka.git
 ```
-### Install espeak-ng (Required for phonemization)
-**Ubuntu/Debian:**
 ```bash
 sudo apt-get install espeak-ng
-```
-**macOS:**
-```bash
 brew install espeak-ng
 ```
 ## Quick Start
 ```python
 from chiluka import Chiluka
-# Initialize - uses bundled models automatically!
-tts = Chiluka()
 # Synthesize speech
 wav = tts.synthesize(
@@ -66,61 +72,123 @@ wav = tts.synthesize(
 tts.save_wav(wav, "output.wav")
 ```
-### Telugu Example
 ```python
 from chiluka import Chiluka
-tts = Chiluka()
 wav = tts.synthesize(
-    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
-    reference_audio="path/to/telugu_reference.wav",
-    language="te"  # Telugu
 )
-tts.save_wav(wav, "telugu_output.wav")
 ```
-## Package Structure
 ```
-chiluka/
-├── chiluka/
-│   ├── __init__.py
-│   ├── inference.py          # Main Chiluka API
-│   ├── text_utils.py
-│   ├── utils.py
-│   ├── configs/
-│   │   └── config_ft.yml     # Model configuration
-│   ├── checkpoints/
-│   │   └── *.pth             # Trained model checkpoint
-│   ├── pretrained/
-│   │   ├── ASR/              # Text aligner model
-│   │   ├── JDC/              # Pitch extractor model
-│   │   └── PLBERT/           # PL-BERT model
-│   └── models/
-│       ├── core.py
-│       ├── hifigan.py
-│       └── diffusion/
-├── examples/
-│   ├── basic_synthesis.py
-│   └── telugu_synthesis.py
-├── setup.py
-├── pyproject.toml
-└── README.md
 ```
 ## API Reference
-### Chiluka Class
 ```python
 tts = Chiluka(
-    config_path=None,      # Optional: custom config file
-    checkpoint_path=None,  # Optional: custom checkpoint
-    pretrained_dir=None,   # Optional: custom pretrained models
-    device=None            # Optional: 'cuda' or 'cpu'
 )
 ```
@@ -130,11 +198,11 @@ tts = Chiluka(
 wav = tts.synthesize(
     text="Hello world",           # Text to synthesize
     reference_audio="ref.wav",    # Reference audio for style
-    language="en",                # Language code ('en', 'te', 'hi', etc.)
     alpha=0.3,                    # Acoustic style mixing (0-1)
     beta=0.7,                     # Prosodic style mixing (0-1)
-    diffusion_steps=5,            # Diffusion sampling steps
-    embedding_scale=1.0,          # Classifier-free guidance scale
     sr=24000                      # Sample rate
 )
 ```
@@ -158,23 +226,51 @@ style = tts.compute_style("reference.wav", sr=24000)
 |-----------|---------|-------------|
 | `alpha` | 0.3 | Acoustic style mixing (0=reference only, 1=predicted only) |
 | `beta` | 0.7 | Prosodic style mixing (0=reference only, 1=predicted only) |
-| `diffusion_steps` | 5 | Number of diffusion sampling steps (more = better quality, slower) |
 | `embedding_scale` | 1.0 | Classifier-free guidance scale |
 ## Supported Languages
-Uses [phonemizer](https://github.com/bootphon/phonemizer) with espeak-ng. Common languages:
-| Language | Code |
-|----------|------|
-| English (US) | `en-us` |
-| English (UK) | `en-gb` |
-| Telugu | `te` |
-| Hindi | `hi` |
-| Tamil | `ta` |
-| Kannada | `kn` |
-See espeak-ng documentation for full list.
 ## Requirements
@@ -183,11 +279,46 @@ See espeak-ng documentation for full list.
 - CUDA (recommended for faster inference)
 - espeak-ng
 ## Training Your Own Model
 This package is for **inference only**. To train your own model, use the original [StyleTTS2](https://github.com/yl4579/StyleTTS2) repository.
-After training, copy your checkpoint to `chiluka/checkpoints/` and update the config if needed.
 ## Credits

+# Chiluka
 **Chiluka** (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.
 ## Features
+- Simple, clean API for TTS synthesis
+- Style transfer from reference audio
+- Multi-language support via phonemizer
+- **Multiple models** - Hindi-English and Telugu
+- **Multiple ways to load** - HuggingFace Hub, PyTorch Hub, pip install
+## Available Models
+| Model | Name | Languages | Speakers | Description |
+|-------|------|-----------|----------|-------------|
+| Hindi-English (default) | `hindi_english` | Hindi, English | 5 | Multi-speaker Hindi + English TTS |
+| Telugu | `telugu` | Telugu, English | 1 | Single-speaker Telugu + English TTS |
 ## Installation
+### Option 1: pip install
 ```bash
+pip install chiluka
 ```
+### Option 2: Install from GitHub
 ```bash
+pip install git+https://github.com/PurviewVoiceBot/chiluka.git
+```
+### Option 3: From Source
+```bash
+git clone https://github.com/PurviewVoiceBot/chiluka.git
+cd chiluka
+pip install -e .
 ```
+### System Dependency: espeak-ng (Required)
 ```bash
+# Ubuntu/Debian
 sudo apt-get install espeak-ng
+# macOS
 brew install espeak-ng
 ```
 ## Quick Start
+### HuggingFace Hub (Recommended)
+Model weights download automatically on first use. No cloning needed.
 ```python
 from chiluka import Chiluka
+# Load Hindi-English model (default)
+tts = Chiluka.from_pretrained()
 # Synthesize speech
 wav = tts.synthesize(
 tts.save_wav(wav, "output.wav")
 ```
+### Load a Specific Model
 ```python
 from chiluka import Chiluka
+# Hindi-English (default)
+tts = Chiluka.from_pretrained(model="hindi_english")
+# Telugu
+tts = Chiluka.from_pretrained(model="telugu")
+```
+### PyTorch Hub
+```python
+import torch
+# Hindi-English (default)
+tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+# Telugu
+tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+# Synthesize
 wav = tts.synthesize(
+    text="Hello from PyTorch Hub!",
+    reference_audio="reference.wav",
+    language="en"
 )
+```
+### Local Weights (if you cloned with Git LFS)
+```python
+from chiluka import Chiluka
+tts = Chiluka()  # uses bundled weights from cloned repo
 ```
+## Examples
+### Hindi Synthesis
+```python
+from chiluka import Chiluka
+tts = Chiluka.from_pretrained(model="hindi_english")
+wav = tts.synthesize(
+    text="नमस्ते, मैं चिलुका बोल रहा हूं",
+    reference_audio="hindi_reference.wav",
+    language="hi"
+)
+tts.save_wav(wav, "hindi_output.wav")
 ```
+### English Synthesis
+```python
+# Reuses the hindi_english `tts` instance loaded in the Hindi example
+wav = tts.synthesize(
+    text="Hello, I am Chiluka, a text to speech system.",
+    reference_audio="english_reference.wav",
+    language="en"
+)
+tts.save_wav(wav, "english_output.wav")
+```
+### Telugu Synthesis
+```python
+from chiluka import Chiluka
+tts = Chiluka.from_pretrained(model="telugu")
+wav = tts.synthesize(
+    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
+    reference_audio="telugu_reference.wav",
+    language="te"
+)
+tts.save_wav(wav, "telugu_output.wav")
+```
+### List Available Models
+```python
+from chiluka import list_models
+models = list_models()
+for name, info in models.items():
+    print(f"{name}: {info['description']} ({', '.join(info['languages'])})")
 ```
 ## API Reference
+### Loading the Model
 ```python
+# Auto-download from HuggingFace (recommended)
+tts = Chiluka.from_pretrained()                              # Hindi-English (default)
+tts = Chiluka.from_pretrained(model="telugu")                # Telugu
+tts = Chiluka.from_pretrained(model="hindi_english")         # Hindi-English (explicit)
+# With options
+tts = Chiluka.from_pretrained(
+    model="hindi_english",          # Model variant
+    repo_id="Seemanth/chiluka-tts", # HuggingFace repo
+    device="cuda",                  # or "cpu"
+    force_download=False,           # Re-download even if cached
+    token="hf_xxx"                  # For private repos
+)
+# Local weights
 tts = Chiluka(
+    config_path="path/to/config.yml",
+    checkpoint_path="path/to/model.pth",
+    pretrained_dir="path/to/pretrained/",
+    device="cuda"
 )
 ```
 wav = tts.synthesize(
     text="Hello world",           # Text to synthesize
     reference_audio="ref.wav",    # Reference audio for style
+    language="en",                # Language code
     alpha=0.3,                    # Acoustic style mixing (0-1)
     beta=0.7,                     # Prosodic style mixing (0-1)
+    diffusion_steps=5,            # Quality vs speed tradeoff
+    embedding_scale=1.0,          # Classifier-free guidance
     sr=24000                      # Sample rate
 )
 ```
 |-----------|---------|-------------|
 | `alpha` | 0.3 | Acoustic style mixing (0=reference only, 1=predicted only) |
 | `beta` | 0.7 | Prosodic style mixing (0=reference only, 1=predicted only) |
+| `diffusion_steps` | 5 | Diffusion sampling steps (more = better quality, slower) |
 | `embedding_scale` | 1.0 | Classifier-free guidance scale |
 ## Supported Languages
+Uses [phonemizer](https://github.com/bootphon/phonemizer) with espeak-ng:
+| Language | Code | Available In |
+|----------|------|-------------|
+| English (US) | `en-us` | All models |
+| English (UK) | `en-gb` | All models |
+| Hindi | `hi` | `hindi_english` |
+| Telugu | `te` | `telugu` |
+| Tamil | `ta` | Requires fine-tuning a new model |
+| Kannada | `kn` | Requires fine-tuning a new model |
+## Hub Utilities
+```python
+from chiluka import list_models, clear_cache, push_to_hub, get_cache_dir
+# List available models
+list_models()
+# Clear cache
+clear_cache()                           # Clear all
+clear_cache("Seemanth/chiluka-tts")     # Clear specific repo
+# Push your own model to HuggingFace
+push_to_hub(
+    local_dir="./my-model",
+    repo_id="myusername/my-chiluka-model",
+    token="hf_your_token"
+)
+# Check cache location
+print(get_cache_dir())  # ~/.cache/chiluka, or $CHILUKA_CACHE if set
+```
+## Environment Variables
+| Variable | Description |
+|----------|-------------|
+| `CHILUKA_CACHE` | Custom cache directory (default: `~/.cache/chiluka`) |
+| `HF_TOKEN` | HuggingFace API token for private repos |
 ## Requirements
 - CUDA (recommended for faster inference)
 - espeak-ng
+## Package Structure
+```
+chiluka/
+├── chiluka/
+│   ├── __init__.py
+│   ├── inference.py              # Main Chiluka API
+│   ├── hub.py                    # Hub download + model registry
+│   ├── text_utils.py
+│   ├── utils.py
+│   ├── configs/
+│   │   ├── config_ft.yml         # Telugu model config
+│   │   └── config_hindi_english.yml  # Hindi-English model config
+│   ├── checkpoints/
+│   │   ├── epoch_2nd_00017.pth   # Telugu checkpoint
+│   │   └── epoch_2nd_00029.pth   # Hindi-English checkpoint
+│   ├── pretrained/               # Shared pretrained sub-models
+│   │   ├── ASR/
+│   │   ├── JDC/
+│   │   └── PLBERT/
+│   └── models/
+├── hubconf.py                    # PyTorch Hub config
+├── examples/
+│   ├── basic_synthesis.py
+│   ├── telugu_synthesis.py
+│   ├── huggingface_example.py
+│   ├── torchhub_example.py
+│   └── pip_example.py
+├── setup.py
+└── README.md
+```
 ## Training Your Own Model
 This package is for **inference only**. To train your own model, use the original [StyleTTS2](https://github.com/yl4579/StyleTTS2) repository.
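+After training:
+1. Copy your checkpoint, config, and the `pretrained/` sub-models into one directory, using the relative paths the model registry expects (e.g. `configs/config_hindi_english.yml` and `checkpoints/epoch_2nd_00029.pth`)
+2. Push the directory to HuggingFace Hub with `push_to_hub()`
+3. Load it with `Chiluka.from_pretrained(repo_id="your-username/your-repo")`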
 ## Credits

chiluka/__init__.py CHANGED

@@ -1,17 +1,17 @@
 """
 Chiluka - A lightweight TTS inference package based on StyleTTS2
-Usage:
-    # Local weights (if you have them)
-    from chiluka import Chiluka
-    tts = Chiluka()
-    # Auto-download from HuggingFace Hub (recommended)
     from chiluka import Chiluka
     tts = Chiluka.from_pretrained()
-    # From specific HuggingFace repo
-    tts = Chiluka.from_pretrained("username/model-name")
     # Generate speech
     wav = tts.synthesize(
@@ -31,7 +31,9 @@ from .hub import (
     clear_cache,
     get_cache_dir,
     create_model_card,
     DEFAULT_HF_REPO,
 )
 __all__ = [
@@ -41,5 +43,7 @@ __all__ = [
     "clear_cache",
     "get_cache_dir",
     "create_model_card",
     "DEFAULT_HF_REPO",
 ]

 """
 Chiluka - A lightweight TTS inference package based on StyleTTS2
+Available models:
+    - 'hindi_english' (default) - Hindi + English multi-speaker TTS
+    - 'telugu' - Telugu + English single-speaker TTS
+Usage:
+    # Hindi-English model (default, auto-downloads from HuggingFace)
     from chiluka import Chiluka
     tts = Chiluka.from_pretrained()
+    # Telugu model
+    tts = Chiluka.from_pretrained(model="telugu")
     # Generate speech
     wav = tts.synthesize(
     clear_cache,
     get_cache_dir,
     create_model_card,
+    list_models,
     DEFAULT_HF_REPO,
+    MODEL_REGISTRY,
 )
 __all__ = [
     "clear_cache",
     "get_cache_dir",
     "create_model_card",
+    "list_models",
     "DEFAULT_HF_REPO",
+    "MODEL_REGISTRY",
 ]
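`__init__.py` now re-exports `MODEL_REGISTRY`. This means a local Git LFS clone can be loaded through the plain `Chiluka(...)` constructor by resolving the same relative paths that `get_model_paths()` uses, without any download. The sketch below uses a hypothetical helper, `resolve_local_paths`. The registry inside it copies only the `config`/`checkpoint` entries from the hub.py diff so the snippet runs standalone. In practice you would use `from chiluka import MODEL_REGISTRY`.

```python
from pathlib import Path

# Copied from MODEL_REGISTRY in chiluka/hub.py (config/checkpoint entries only)
MODEL_REGISTRY = {
    "telugu": {
        "config": "configs/config_ft.yml",
        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
    },
    "hindi_english": {
        "config": "configs/config_hindi_english.yml",
        "checkpoint": "checkpoints/epoch_2nd_00029.pth",
    },
}


def resolve_local_paths(model, root):
    """Map a model variant to Chiluka(...) keyword arguments under a local directory."""
    if model not in MODEL_REGISTRY:
        available = ", ".join(MODEL_REGISTRY)
        raise ValueError(f"Unknown model '{model}'. Available models: {available}")
    root = Path(root)
    info = MODEL_REGISTRY[model]
    return {
        "config_path": str(root / info["config"]),
        "checkpoint_path": str(root / info["checkpoint"]),
        "pretrained_dir": str(root / "pretrained"),
    }


# In a clone, the weights live under the inner chiluka/ package directory
paths = resolve_local_paths("telugu", "chiluka")
print(paths["checkpoint_path"])  # chiluka/checkpoints/epoch_2nd_00017.pth on POSIX
```

The returned dict uses the keyword names from the README's local-weights example, so it can be passed as `Chiluka(**paths, device="cuda")`.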

chiluka/configs/config_hindi_english.yml ADDED

	@@ -0,0 +1,110 @@

+log_dir: "Models/hindi_english_multispeaker_finetuned"
+first_stage_path: "first_stage.pth"
+save_freq: 1
+log_interval: 10
+device: "cuda"
+epochs_1st: 15
+epochs_2nd: 15
+batch_size: 2
+max_len: 200
+pretrained_model: ""
+second_stage_load_pretrained: true
+load_only_params: true
+F0_path: "Utils/JDC/bst.t7"
+ASR_config: "Utils/ASR/config.yml"
+ASR_path: "Utils/ASR/epoch_00080.pth"
+PLBERT_dir: "Utils/PLBERT/"
+data_params:
+  train_data: ""
+  val_data: ""
+  root_path: ""
+  OOD_data: ""
+  min_length: 50
+# Audio preprocessing (24kHz)
+preprocess_params:
+  sr: 24000
+  spect_params:
+    n_fft: 2048
+    win_length: 1200
+    hop_length: 300
+# Model architecture
+model_params:
+  multispeaker: true
+  num_speakers: 5
+  dim_in: 64
+  hidden_dim: 512
+  max_conv_dim: 512
+  n_layer: 3
+  n_mels: 80
+  n_token: 178
+  max_dur: 50
+  style_dim: 128
+  dropout: 0.2
+  speaker_embed_dim: 256
+  decoder:
+    type: "hifigan"
+    resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
+    resblock_kernel_sizes: [3, 7, 11]
+    upsample_initial_channel: 512
+    upsample_rates: [10, 5, 3, 2]
+    upsample_kernel_sizes: [20, 10, 6, 4]
+  slm:
+    model: "microsoft/wavlm-base-plus"
+    sr: 16000
+    hidden: 768
+    nlayers: 13
+    initial_channel: 64
+  diffusion:
+    embedding_mask_proba: 0.1
+    transformer:
+      num_layers: 3
+      num_heads: 8
+      head_features: 64
+      multiplier: 2
+    dist:
+      sigma_data: 0.19926648961191362
+      estimate_sigma_data: true
+      mean: -3.0
+      std: 1.0
+loss_params:
+  lambda_mel: 5.0
+  lambda_gen: 1.0
+  lambda_slm: 1.0
+  lambda_mono: 1.0
+  lambda_s2s: 1.0
+  lambda_F0: 1.0
+  lambda_norm: 1.0
+  lambda_dur: 1.0
+  lambda_ce: 20.0
+  lambda_sty: 1.0
+  lambda_diff: 1.0
+  TMA_epoch: 2
+  diff_epoch: 0
+  joint_epoch: 0
+optimizer_params:
+  lr: 0.00005
+  bert_lr: 0.000005
+  ft_lr: 0.000005
+slmadv_params:
+  min_len: 400
+  max_len: 500
+  batch_percentage: 0.5
+  iter: 20
+  thresh: 5
+  scale: 0.01
+  sig: 1.5
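The `preprocess_params` and `decoder` sections of this config have to agree with each other. In HiFi-GAN-style vocoders, the product of `upsample_rates` is normally set equal to the STFT `hop_length`, so that one spectrogram frame expands to exactly one hop of audio. The check below runs on the values above and also derives the frame rate at 24 kHz. The helper `frame_timing` is illustrative, not part of chiluka.

```python
from math import prod

# Values copied from config_hindi_english.yml
SR = 24000
HOP_LENGTH = 300
UPSAMPLE_RATES = [10, 5, 3, 2]


def frame_timing(sr, hop_length, upsample_rates):
    """Check decoder upsampling against the STFT hop; return (frames/sec, ms/frame)."""
    total_upsample = prod(upsample_rates)
    if total_upsample != hop_length:
        raise ValueError(f"upsample product {total_upsample} != hop_length {hop_length}")
    frames_per_second = sr / hop_length
    return frames_per_second, 1000 / frames_per_second


fps, ms = frame_timing(SR, HOP_LENGTH, UPSAMPLE_RATES)
print(fps, ms)  # 80.0 12.5
```

10 × 5 × 3 × 2 = 300 matches `hop_length: 300`, so each mel frame covers 12.5 ms of 24 kHz audio.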

chiluka/hub.py CHANGED

@@ -5,6 +5,7 @@ Supports:
 - HuggingFace Hub integration
 - Automatic model downloading
 - Local caching
 """
 import os
@@ -13,15 +14,35 @@ from pathlib import Path
 from typing import Optional, Union
 # Default HuggingFace Hub repository
-DEFAULT_HF_REPO = "yourusername/chiluka-tts"  # TODO: Update with your actual repo
 # Cache directory for downloaded models
 CACHE_DIR = Path.home() / ".cache" / "chiluka"
-# Required model files
-REQUIRED_FILES = {
-    "checkpoint": "checkpoints/epoch_2nd_00017.pth",
-    "config": "configs/config_ft.yml",
     "asr_config": "pretrained/ASR/config.yml",
     "asr_model": "pretrained/ASR/epoch_00080.pth",
     "f0_model": "pretrained/JDC/bst.t7",
@@ -30,6 +51,27 @@ REQUIRED_FILES = {
 }
 def get_cache_dir() -> Path:
     """Get the cache directory for Chiluka models."""
     cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
@@ -43,11 +85,19 @@ def is_model_cached(repo_id: str = DEFAULT_HF_REPO) -> bool:
     if not cache_path.exists():
         return False
-    # Check if all required files exist
-    for file_path in REQUIRED_FILES.values():
         if not (cache_path / file_path).exists():
             return False
-    return True
 def download_from_hf(
@@ -60,21 +110,16 @@ def download_from_hf(
     Download model files from HuggingFace Hub.
     Args:
-        repo_id: HuggingFace Hub repository ID (e.g., 'username/model-name')
         revision: Git revision to download (branch, tag, or commit hash)
         force_download: If True, re-download even if cached
         token: HuggingFace API token for private repos
     Returns:
         Path to the downloaded model directory
-    Example:
-        >>> model_path = download_from_hf("yourusername/chiluka-tts")
-        >>> print(model_path)
-        /home/user/.cache/chiluka/yourusername_chiluka-tts
     """
     try:
-        from huggingface_hub import snapshot_download, hf_hub_download
     except ImportError:
         raise ImportError(
             "huggingface_hub is required for downloading models. "
@@ -89,7 +134,6 @@ def download_from_hf(
     print(f"Downloading model from HuggingFace Hub: {repo_id}...")
-    # Download entire repository
     downloaded_path = snapshot_download(
         repo_id=repo_id,
         revision=revision,
@@ -103,60 +147,32 @@ def download_from_hf(
     return Path(downloaded_path)
-def download_from_url(
-    url: str,
-    filename: str,
-    force_download: bool = False,
-) -> Path:
-    """
-    Download a single file from a URL.
-    Args:
-        url: URL to download from
-        filename: Local filename to save as
-        force_download: If True, re-download even if exists
-    Returns:
-        Path to the downloaded file
-    """
-    import urllib.request
-    cache_dir = get_cache_dir() / "downloads"
-    cache_dir.mkdir(parents=True, exist_ok=True)
-    local_path = cache_dir / filename
-    if local_path.exists() and not force_download:
-        print(f"Using cached file: {local_path}")
-        return local_path
-    print(f"Downloading {filename}...")
-    # Download with progress
-    def _progress_hook(count, block_size, total_size):
-        percent = int(count * block_size * 100 / total_size)
-        print(f"\rDownloading: {percent}%", end="", flush=True)
-    urllib.request.urlretrieve(url, local_path, reporthook=_progress_hook)
-    print()  # New line after progress
-    return local_path
-def get_model_paths(repo_id: str = DEFAULT_HF_REPO) -> dict:
     """
     Get paths to all model files after downloading.
     Args:
         repo_id: HuggingFace Hub repository ID
     Returns:
         Dictionary with paths to config, checkpoint, and pretrained directory
     """
     model_dir = download_from_hf(repo_id)
     return {
-        "config_path": str(model_dir / "configs" / "config_ft.yml"),
-        "checkpoint_path": str(model_dir / "checkpoints" / "epoch_2nd_00017.pth"),
         "pretrained_dir": str(model_dir / "pretrained"),
     }
@@ -202,7 +218,7 @@ def push_to_hub(
     Example:
         >>> push_to_hub(
         ...     local_dir="./chiluka",
-        ...     repo_id="myusername/my-chiluka-model",
         ...     private=False
         ... )
     """
@@ -245,6 +261,14 @@ def create_model_card(repo_id: str, save_path: Optional[str] = None) -> str:
     Returns:
         Model card content as string
     """
     model_card = f"""---
 language:
   - en
@@ -257,12 +281,19 @@ tags:
   - tts
   - styletts2
   - voice-cloning
 ---
 # Chiluka TTS
 Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.
 ## Installation
 ```bash
@@ -272,64 +303,47 @@ pip install chiluka
 Or install from source:
 ```bash
-pip install git+https://github.com/{repo_id.split('/')[0]}/chiluka.git
 ```
 ## Usage
-### Quick Start (Auto-download)
 ```python
 from chiluka import Chiluka
-# Automatically downloads model weights
 tts = Chiluka.from_pretrained()
-# Generate speech
 wav = tts.synthesize(
     text="Hello, world!",
-    reference_audio="path/to/reference.wav",
     language="en"
 )
-# Save output
 tts.save_wav(wav, "output.wav")
 ```
-### PyTorch Hub
 ```python
-import torch
-tts = torch.hub.load('{repo_id.split('/')[0]}/chiluka', 'chiluka')
-wav = tts.synthesize("Hello!", "reference.wav", language="en")
 ```
-### HuggingFace Hub
 ```python
-from chiluka import Chiluka
-tts = Chiluka.from_pretrained("{repo_id}")
 ```
-## Parameters
-- `text`: Input text to synthesize
-- `reference_audio`: Path to reference audio for style transfer
-- `language`: Language code ('en', 'te', 'hi', etc.)
-- `alpha`: Acoustic style mixing (0-1, default 0.3)
-- `beta`: Prosodic style mixing (0-1, default 0.7)
-- `diffusion_steps`: Quality vs speed tradeoff (default 5)
-## Supported Languages
-Uses espeak-ng phonemizer. Common languages:
-- English: `en-us`, `en-gb`
-- Telugu: `te`
-- Hindi: `hi`
-- Tamil: `ta`
 ## License
 MIT License

 - HuggingFace Hub integration
 - Automatic model downloading
 - Local caching
+- Multiple model variants
 """
 import os
 from typing import Optional, Union
 # Default HuggingFace Hub repository
+DEFAULT_HF_REPO = "Seemanth/chiluka-tts"
 # Cache directory for downloaded models
 CACHE_DIR = Path.home() / ".cache" / "chiluka"
+# ============================================
+# Model Registry
+# ============================================
+# Maps model names to their config + checkpoint paths
+# relative to the repo root.
+MODEL_REGISTRY = {
+    "telugu": {
+        "config": "configs/config_ft.yml",
+        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
+        "languages": ["te", "en"],
+        "description": "Telugu + English single-speaker TTS",
+    },
+    "hindi_english": {
+        "config": "configs/config_hindi_english.yml",
+        "checkpoint": "checkpoints/epoch_2nd_00029.pth",
+        "languages": ["hi", "en"],
+        "description": "Hindi + English multi-speaker TTS (5 speakers)",
+    },
+}
+DEFAULT_MODEL = "hindi_english"
+# Shared pretrained sub-models (same across all variants)
+PRETRAINED_FILES = {
     "asr_config": "pretrained/ASR/config.yml",
     "asr_model": "pretrained/ASR/epoch_00080.pth",
     "f0_model": "pretrained/JDC/bst.t7",
 }
+def list_models() -> dict:
+    """
+    List all available model variants.
+    Returns:
+        Dictionary of model names and their info.
+    Example:
+        >>> from chiluka import hub
+        >>> hub.list_models()
+        {'telugu': {...}, 'hindi_english': {...}}
+    """
+    return {
+        name: {
+            "languages": info["languages"],
+            "description": info["description"],
+        }
+        for name, info in MODEL_REGISTRY.items()
+    }
 def get_cache_dir() -> Path:
     """Get the cache directory for Chiluka models."""
     cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
     if not cache_path.exists():
         return False
+    # Check if shared pretrained files exist
+    for file_path in PRETRAINED_FILES.values():
         if not (cache_path / file_path).exists():
             return False
+    # Check if at least one model variant exists
+    for model_info in MODEL_REGISTRY.values():
+        config_exists = (cache_path / model_info["config"]).exists()
+        checkpoint_exists = (cache_path / model_info["checkpoint"]).exists()
+        if config_exists and checkpoint_exists:
+            return True
+    return False
 def download_from_hf(
     Download model files from HuggingFace Hub.
     Args:
+        repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts')
         revision: Git revision to download (branch, tag, or commit hash)
         force_download: If True, re-download even if cached
         token: HuggingFace API token for private repos
     Returns:
         Path to the downloaded model directory
     """
     try:
+        from huggingface_hub import snapshot_download
     except ImportError:
         raise ImportError(
             "huggingface_hub is required for downloading models. "
     print(f"Downloading model from HuggingFace Hub: {repo_id}...")
     downloaded_path = snapshot_download(
         repo_id=repo_id,
         revision=revision,
     return Path(downloaded_path)
+def get_model_paths(
+    model: str = DEFAULT_MODEL,
+    repo_id: str = DEFAULT_HF_REPO,
+) -> dict:
     """
     Get paths to all model files after downloading.
     Args:
+        model: Model variant name ('telugu', 'hindi_english')
         repo_id: HuggingFace Hub repository ID
     Returns:
         Dictionary with paths to config, checkpoint, and pretrained directory
     """
+    if model not in MODEL_REGISTRY:
+        available = ", ".join(MODEL_REGISTRY.keys())
+        raise ValueError(
+            f"Unknown model '{model}'. Available models: {available}"
+        )
     model_dir = download_from_hf(repo_id)
+    model_info = MODEL_REGISTRY[model]
     return {
+        "config_path": str(model_dir / model_info["config"]),
+        "checkpoint_path": str(model_dir / model_info["checkpoint"]),
         "pretrained_dir": str(model_dir / "pretrained"),
     }
     Example:
         >>> push_to_hub(
         ...     local_dir="./chiluka",
+        ...     repo_id="Seemanth/chiluka-tts",
         ...     private=False
         ... )
     """
     Returns:
         Model card content as string
     """
+    owner = repo_id.split("/")[0]
+    # Build model table
+    model_rows = ""
+    for name, info in MODEL_REGISTRY.items():
+        langs = ", ".join(info["languages"])
+        model_rows += f"| `{name}` | {info['description']} | {langs} |\n"
     model_card = f"""---
 language:
   - en
   - tts
   - styletts2
   - voice-cloning
+  - multi-language
 ---
 # Chiluka TTS
 Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.
+## Available Models
+| Model | Description | Languages |
+|-------|-------------|-----------|
+{model_rows}
 ## Installation
 ```bash
 Or install from source:
 ```bash
+pip install git+https://github.com/{owner}/chiluka.git
 ```
 ## Usage
+### Hindi + English (default)
 ```python
 from chiluka import Chiluka
 tts = Chiluka.from_pretrained()
 wav = tts.synthesize(
     text="Hello, world!",
+    reference_audio="reference.wav",
     language="en"
 )
 tts.save_wav(wav, "output.wav")
 ```
+### Telugu
 ```python
+tts = Chiluka.from_pretrained(model="telugu")
+wav = tts.synthesize(
+    text="నమస్కారం",
+    reference_audio="reference.wav",
+    language="te"
+)
 ```
+### PyTorch Hub
 ```python
+import torch
+# Hindi-English model (default)
+tts = torch.hub.load('{owner}/chiluka', 'chiluka')
+# Telugu model
+tts = torch.hub.load('{owner}/chiluka', 'chiluka_telugu')
 ```
 ## License
 MIT License
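
With the registry in place, choosing a variant comes down to a dictionary lookup followed by joining relative paths onto the downloaded snapshot directory. The sketch below restates that lookup as a standalone function that takes the snapshot directory as an argument. The registry keys (`languages`, `description`, `config`, `checkpoint`) are the ones `hub.py` reads. The file paths, language lists and descriptions are placeholders, not the repository's actual layout, and `resolve_model_paths` is a hypothetical name.

```python
from pathlib import Path

# Hypothetical registry mirroring the keys hub.py reads.
# File paths, languages and descriptions are placeholders, not the real repo layout.
MODEL_REGISTRY = {
    "hindi_english": {
        "languages": ["hi", "en"],
        "description": "Hindi + English multi-speaker TTS",
        "config": "configs/config_hindi_english.yml",
        "checkpoint": "checkpoints/hindi_english.pth",
    },
    "telugu": {
        "languages": ["te", "en"],
        "description": "Telugu + English single-speaker TTS",
        "config": "configs/config_telugu.yml",
        "checkpoint": "checkpoints/telugu.pth",
    },
}
DEFAULT_MODEL = "hindi_english"


def resolve_model_paths(model_dir: Path, model: str = DEFAULT_MODEL) -> dict:
    """Map a variant name to file paths inside an already-downloaded snapshot."""
    if model not in MODEL_REGISTRY:
        available = ", ".join(MODEL_REGISTRY)
        raise ValueError(f"Unknown model '{model}'. Available models: {available}")
    info = MODEL_REGISTRY[model]
    return {
        "config_path": str(model_dir / info["config"]),
        "checkpoint_path": str(model_dir / info["checkpoint"]),
        # Shared ASR / JDC / PLBERT files live under one directory for all variants.
        "pretrained_dir": str(model_dir / "pretrained"),
    }
```

Taking `model_dir` as a parameter, rather than downloading inside the function as `get_model_paths` does, keeps path resolution testable without network access.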

chiluka/inference.py CHANGED

@@ -155,6 +155,7 @@ class Chiluka:
     @classmethod
     def from_pretrained(
         cls,
         repo_id: str = None,
         device: Optional[str] = None,
         force_download: bool = False,
@@ -168,7 +169,10 @@ class Chiluka:
         Weights are automatically downloaded and cached on first use.
         Args:
-            repo_id: HuggingFace Hub repository ID (e.g., 'username/chiluka-tts').
                     If None, uses the default repository.
             device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
             force_download: If True, re-download even if cached.
@@ -179,31 +183,32 @@ class Chiluka:
             Initialized Chiluka TTS model ready for inference.
         Examples:
-            # Default repository (auto-download)
             >>> tts = Chiluka.from_pretrained()
-            # Specific repository
-            >>> tts = Chiluka.from_pretrained("myuser/my-chiluka-model")
             # Force re-download
             >>> tts = Chiluka.from_pretrained(force_download=True)
-            # Private repository
-            >>> tts = Chiluka.from_pretrained("myuser/private-model", token="hf_xxx")
         """
-        from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO
         repo_id = repo_id or DEFAULT_HF_REPO
         # Download model files (or use cache)
-        model_dir = download_from_hf(
             repo_id=repo_id,
             force_download=force_download,
             token=token,
         )
-        # Get paths to model files
-        paths = get_model_paths(repo_id)
         return cls(
             config_path=paths["config_path"],

     @classmethod
     def from_pretrained(
         cls,
+        model: str = None,
         repo_id: str = None,
         device: Optional[str] = None,
         force_download: bool = False,
         Weights are automatically downloaded and cached on first use.
         Args:
+            model: Model variant to load. Options:
+                - 'hindi_english' (default) - Hindi + English multi-speaker TTS
+                - 'telugu' - Telugu + English single-speaker TTS
+            repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts').
                     If None, uses the default repository.
             device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
             force_download: If True, re-download even if cached.
             Initialized Chiluka TTS model ready for inference.
         Examples:
+            # Hindi-English model (default)
             >>> tts = Chiluka.from_pretrained()
+            # Telugu model
+            >>> tts = Chiluka.from_pretrained(model="telugu")
+            # Specific HuggingFace repository
+            >>> tts = Chiluka.from_pretrained(repo_id="myuser/my-model")
             # Force re-download
             >>> tts = Chiluka.from_pretrained(force_download=True)
         """
+        from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO, DEFAULT_MODEL
+        model = model or DEFAULT_MODEL
         repo_id = repo_id or DEFAULT_HF_REPO
         # Download model files (or use cache)
+        download_from_hf(
             repo_id=repo_id,
             force_download=force_download,
             token=token,
         )
+        # Get paths to model files for the selected variant
+        paths = get_model_paths(model=model, repo_id=repo_id)
         return cls(
             config_path=paths["config_path"],
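
In the new `from_pretrained`, `download_from_hf` is called with `force_download` and `token`, and then `get_model_paths` calls `download_from_hf(repo_id)` a second time without either argument. Below is a minimal sketch of a flow that downloads once and reuses the returned directory. `load_paths`, `VARIANT_FILES` and the stub downloader are hypothetical, and the file names are placeholders.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-ins for values defined in chiluka/hub.py; file names are placeholders.
DEFAULT_HF_REPO = "Seemanth/chiluka-tts"
DEFAULT_MODEL = "hindi_english"
VARIANT_FILES: Dict[str, Tuple[str, str]] = {
    "hindi_english": ("configs/config_hindi_english.yml", "checkpoints/hindi_english.pth"),
    "telugu": ("configs/config_telugu.yml", "checkpoints/telugu.pth"),
}


def load_paths(
    download: Callable[..., Path],
    model: Optional[str] = None,
    repo_id: Optional[str] = None,
    force_download: bool = False,
    token: Optional[str] = None,
) -> Dict[str, str]:
    """Resolve defaults, validate the variant, download once, then build file paths."""
    model = model or DEFAULT_MODEL
    repo_id = repo_id or DEFAULT_HF_REPO
    if model not in VARIANT_FILES:
        raise ValueError(f"Unknown model '{model}'. Available models: {', '.join(VARIANT_FILES)}")
    # The only network call; force_download and token are both forwarded to it.
    model_dir = download(repo_id=repo_id, force_download=force_download, token=token)
    config, checkpoint = VARIANT_FILES[model]
    return {
        "config_path": str(model_dir / config),
        "checkpoint_path": str(model_dir / checkpoint),
        "pretrained_dir": str(model_dir / "pretrained"),
    }


# Usage with a stub downloader that records each call instead of hitting the network.
calls = []


def fake_download(repo_id: str, force_download: bool = False, token: Optional[str] = None) -> Path:
    calls.append((repo_id, force_download, token))
    return Path("/tmp/chiluka-snapshot")


paths = load_paths(fake_download, model="telugu", token="hf_xxx")
```

Because the downloader is injected, the stub stands in for `download_from_hf` here. In the package, the same shape would let `token` reach the only network call, which may matter for private repositories.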

examples/huggingface_example.py ADDED

	@@ -0,0 +1,70 @@

+"""
+Chiluka TTS - HuggingFace Hub Example
+Load model weights directly from HuggingFace Hub.
+No need to clone the repository or download weights manually.
+Requirements:
+    pip install chiluka
+    sudo apt-get install espeak-ng
+Usage:
+    python huggingface_example.py --reference path/to/reference.wav
+    python huggingface_example.py --reference ref.wav --model telugu --language te --text "నమస్కారం"
+"""
+import argparse
+from chiluka import Chiluka, list_models
+def main():
+    parser = argparse.ArgumentParser(description="Chiluka TTS - HuggingFace Hub Example")
+    parser.add_argument("--reference", type=str, required=True, help="Path to reference audio file")
+    parser.add_argument("--model", type=str, default="hindi_english", choices=["hindi_english", "telugu"],
+                        help="Model variant to use (default: hindi_english)")
+    parser.add_argument("--text", type=str, default=None, help="Text to synthesize")
+    parser.add_argument("--language", type=str, default=None, help="Language code (en, hi, te)")
+    parser.add_argument("--output", type=str, default="output_hf.wav", help="Output wav file path")
+    parser.add_argument("--device", type=str, default=None, help="Device: cuda or cpu")
+    args = parser.parse_args()
+    # Show available models
+    print("Available models:")
+    for name, info in list_models().items():
+        marker = " <--" if name == args.model else ""
+        print(f"  {name}: {info['description']}{marker}")
+    print()
+    # Set defaults based on model choice
+    if args.text is None:
+        if args.model == "telugu":
+            args.text = "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను"
+        else:
+            args.text = "Hello, I am Chiluka, a text to speech system."
+    if args.language is None:
+        if args.model == "telugu":
+            args.language = "te"
+        else:
+            args.language = "en"
+    # Load model from HuggingFace Hub (auto-downloads on first use)
+    print(f"Loading '{args.model}' model from HuggingFace Hub...")
+    tts = Chiluka.from_pretrained(model=args.model, device=args.device)
+    # Synthesize
+    print(f"Synthesizing: '{args.text}'")
+    print(f"Language: {args.language}")
+    wav = tts.synthesize(
+        text=args.text,
+        reference_audio=args.reference,
+        language=args.language,
+    )
+    # Save
+    tts.save_wav(wav, args.output)
+    print(f"Duration: {len(wav) / 24000:.2f} seconds")
+if __name__ == "__main__":
+    main()
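
Each of the three example scripts picks a default text and language per model. Two of them use nested `if`/`else` and one uses dictionaries. The sketch below uses a single table that also supplies the `--model` choices, so the two cannot drift apart. `MODEL_DEFAULTS`, `build_parser`, `resolve_text_and_language` and `parse` are hypothetical names, and the sample sentences are copied from the scripts.

```python
import argparse
from typing import List, Optional, Tuple

# Per-model defaults; the sentences are the ones used in the example scripts.
MODEL_DEFAULTS = {
    "hindi_english": {"text": "Hello, I am Chiluka, a text to speech system.", "language": "en"},
    "telugu": {"text": "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను", "language": "te"},
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chiluka TTS example")
    parser.add_argument("--reference", required=True, help="Path to reference audio file")
    # Choices come from the table, so adding a model means editing one place.
    parser.add_argument("--model", default="hindi_english", choices=sorted(MODEL_DEFAULTS))
    parser.add_argument("--text", default=None, help="Text to synthesize")
    parser.add_argument("--language", default=None, help="Language code (en, hi, te)")
    return parser


def resolve_text_and_language(args: argparse.Namespace) -> Tuple[str, str]:
    """Fill in --text and --language from the table when they were not given."""
    defaults = MODEL_DEFAULTS[args.model]
    text = args.text if args.text is not None else defaults["text"]
    language = args.language if args.language is not None else defaults["language"]
    return text, language


def parse(argv: Optional[List[str]] = None) -> Tuple[argparse.Namespace, str, str]:
    args = build_parser().parse_args(argv)
    text, language = resolve_text_and_language(args)
    return args, text, language
```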

examples/pip_example.py ADDED

	@@ -0,0 +1,88 @@

+"""
+Chiluka TTS - pip install Example
+After installing via pip, model weights auto-download from HuggingFace
+on first use and are cached locally.
+Install:
+    pip install chiluka
+    sudo apt-get install espeak-ng
+Usage:
+    python pip_example.py --reference path/to/reference.wav
+    python pip_example.py --reference ref.wav --model telugu --language te
+"""
+import argparse
+def main():
+    parser = argparse.ArgumentParser(description="Chiluka TTS - pip Example")
+    parser.add_argument("--reference", type=str, required=True, help="Path to reference audio file")
+    parser.add_argument("--model", type=str, default="hindi_english", choices=["hindi_english", "telugu"],
+                        help="Model variant (default: hindi_english)")
+    parser.add_argument("--text", type=str, default=None, help="Text to synthesize")
+    parser.add_argument("--language", type=str, default=None, help="Language code (en, hi, te)")
+    parser.add_argument("--output", type=str, default="output_pip.wav", help="Output wav file path")
+    args = parser.parse_args()
+    # Import after argparse so --help is fast
+    from chiluka import Chiluka, list_models
+    # Set defaults
+    if args.text is None:
+        texts = {
+            "hindi_english": "Hello, I am Chiluka, a text to speech system.",
+            "telugu": "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
+        }
+        args.text = texts[args.model]
+    if args.language is None:
+        langs = {"hindi_english": "en", "telugu": "te"}
+        args.language = langs[args.model]
+    # List models
+    print("Available models:")
+    for name, info in list_models().items():
+        print(f"  {name}: {info['description']}")
+    print()
+    # Load model (auto-downloads weights on first run)
+    print(f"Loading '{args.model}' model...")
+    tts = Chiluka.from_pretrained(model=args.model)
+    # Synthesize speech
+    print(f"Text: '{args.text}'")
+    print(f"Language: {args.language}")
+    print(f"Reference: {args.reference}")
+    print()
+    wav = tts.synthesize(
+        text=args.text,
+        reference_audio=args.reference,
+        language=args.language,
+        alpha=0.3,
+        beta=0.7,
+        diffusion_steps=5,
+        embedding_scale=1.0,
+    )
+    # Save output
+    tts.save_wav(wav, args.output)
+    print(f"Duration: {len(wav) / 24000:.2f} seconds")
+    # --- Bonus: synthesize in another language with same model ---
+    if args.model == "hindi_english":
+        print("\n--- Bonus: Hindi synthesis with same model ---")
+        hindi_wav = tts.synthesize(
+            text="नमस्ते, मैं चिलुका बोल रहा हूं",
+            reference_audio=args.reference,
+            language="hi",
+        )
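+        # Strip only a trailing .wav so the Hindi file never overwrites the main output
+        hindi_output = (args.output[:-4] if args.output.endswith(".wav") else args.output) + "_hindi.wav"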
+        tts.save_wav(hindi_wav, hindi_output)
+        print(f"Duration: {len(hindi_wav) / 24000:.2f} seconds")
+if __name__ == "__main__":
+    main()
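
The pip example passes `alpha=0.3`, `beta=0.7`, `diffusion_steps=5` and `embedding_scale=1.0` to `synthesize`. This diff does not confirm it, but assume Chiluka follows the reference StyleTTS2 inference code. In that case, `alpha` and `beta` interpolate between the style vector sampled by the diffusion model and the one computed from the reference audio: `alpha` controls the acoustic (timbre) half and `beta` the prosodic half. `diffusion_steps` sets the number of sampler steps and `embedding_scale` sets the classifier-free guidance scale. The plain-Python sketch below covers only the interpolation, using a hypothetical `blend_styles` function and assuming StyleTTS2's 128-dimensional split.

```python
from typing import List, Tuple


def blend_styles(
    sampled: List[float],
    reference: List[float],
    alpha: float = 0.3,
    beta: float = 0.7,
    split: int = 128,
) -> Tuple[List[float], List[float]]:
    """Mix a diffusion-sampled style with a reference style, StyleTTS2-style.

    The first `split` dimensions are treated as the acoustic (timbre) style and
    the rest as the prosodic style; 128 is an assumption taken from StyleTTS2.
    """
    if len(sampled) != len(reference):
        raise ValueError("style vectors must have the same length")
    # alpha = 0 keeps the reference timbre; alpha = 1 uses the sampled timbre.
    acoustic = [alpha * s + (1 - alpha) * r for s, r in zip(sampled[:split], reference[:split])]
    # beta = 0 keeps the reference prosody; beta = 1 uses the sampled prosody.
    prosodic = [beta * s + (1 - beta) * r for s, r in zip(sampled[split:], reference[split:])]
    return acoustic, prosodic
```

With `alpha=0` and `beta=0` the output is exactly the reference style. Raising them moves the result toward the text-conditioned sample. Under this assumption, the example's `alpha=0.3` keeps timbre closer to the reference, while `beta=0.7` lets prosody follow the text more.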

examples/torchhub_example.py ADDED

	@@ -0,0 +1,70 @@

+"""
+Chiluka TTS - PyTorch Hub Example
+Load the model with torch.hub.load(); no `pip install chiluka` is needed,
+since torch.hub fetches the package code directly from the GitHub repo.
+Requirements:
+    pip install torch torchaudio
+    sudo apt-get install espeak-ng
+Usage:
+    python torchhub_example.py --reference path/to/reference.wav
+    python torchhub_example.py --reference ref.wav --variant telugu --language te
+"""
+import argparse
+import torch
+def main():
+    parser = argparse.ArgumentParser(description="Chiluka TTS - PyTorch Hub Example")
+    parser.add_argument("--reference", type=str, required=True, help="Path to reference audio file")
+    parser.add_argument("--variant", type=str, default="default", choices=["default", "telugu", "hindi_english"],
+                        help="Model variant (default, telugu, hindi_english)")
+    parser.add_argument("--text", type=str, default=None, help="Text to synthesize")
+    parser.add_argument("--language", type=str, default=None, help="Language code (en, hi, te)")
+    parser.add_argument("--output", type=str, default="output_torchhub.wav", help="Output wav file path")
+    args = parser.parse_args()
+    # Set defaults
+    if args.text is None:
+        if args.variant == "telugu":
+            args.text = "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను"
+        else:
+            args.text = "Hello, I am Chiluka, a text to speech system."
+    if args.language is None:
+        if args.variant == "telugu":
+            args.language = "te"
+        else:
+            args.language = "en"
+    # Load via torch.hub
+    # Available entry points:
+    #   'chiluka'              - Hindi-English model (default)
+    #   'chiluka_telugu'       - Telugu model
+    #   'chiluka_hindi_english' - Hindi-English model (explicit)
+    print(f"Loading model via torch.hub (variant: {args.variant})...")
+    if args.variant == "telugu":
+        tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+    else:
+        tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+    # Synthesize
+    print(f"Synthesizing: '{args.text}'")
+    print(f"Language: {args.language}")
+    wav = tts.synthesize(
+        text=args.text,
+        reference_audio=args.reference,
+        language=args.language,
+    )
+    # Save
+    tts.save_wav(wav, args.output)
+    print(f"Duration: {len(wav) / 24000:.2f} seconds")
+if __name__ == "__main__":
+    main()
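
The script maps `--variant` to an entry point with an `if`/`else`, which sends `hindi_english` to `chiluka` rather than to the explicit `chiluka_hindi_english`. Both load the same model, but a lookup table ties each choice to its own entry point. In this sketch, `ENTRY_POINTS` and `entry_point_for` are hypothetical, while the entry-point names are the ones declared in `hubconf.py`.

```python
# Entry-point names as declared in hubconf.py; the mapping itself is a hypothetical helper.
ENTRY_POINTS = {
    "default": "chiluka",
    "hindi_english": "chiluka_hindi_english",
    "telugu": "chiluka_telugu",
}


def entry_point_for(variant: str) -> str:
    """Return the hubconf.py function name to pass to torch.hub.load()."""
    if variant not in ENTRY_POINTS:
        choices = ", ".join(sorted(ENTRY_POINTS))
        raise ValueError(f"Unknown variant '{variant}'. Choices: {choices}")
    return ENTRY_POINTS[variant]
```

The load step would then read `tts = torch.hub.load('Seemanth/chiluka', entry_point_for(args.variant))`, and `choices=sorted(ENTRY_POINTS)` could feed the `--variant` argument.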

hubconf.py CHANGED

@@ -4,11 +4,11 @@ PyTorch Hub configuration for Chiluka TTS.
 Usage:
     import torch
-    # Load the model
-    tts = torch.hub.load('yourusername/chiluka', 'chiluka')
-    # Or with force reload
-    tts = torch.hub.load('yourusername/chiluka', 'chiluka', force_reload=True)
     # Generate speech
     wav = tts.synthesize(
@@ -37,11 +37,10 @@ dependencies = [
 def chiluka(pretrained: bool = True, device: str = None, **kwargs):
     """
-    Load Chiluka TTS model.
     Args:
         pretrained: If True, downloads pretrained weights from HuggingFace Hub.
-                   If False, returns uninitialized model (requires manual weight loading).
         device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
         **kwargs: Additional arguments passed to Chiluka constructor.
@@ -50,25 +49,23 @@ def chiluka(pretrained: bool = True, device: str = None, **kwargs):
     Example:
         >>> import torch
-        >>> tts = torch.hub.load('yourusername/chiluka', 'chiluka')
         >>> wav = tts.synthesize("Hello!", "reference.wav", language="en")
     """
     from chiluka import Chiluka
     if pretrained:
-        # Use from_pretrained to auto-download weights
-        return Chiluka.from_pretrained(device=device, **kwargs)
     else:
-        # Return model expecting local weights
         return Chiluka(device=device, **kwargs)
-def chiluka_from_hf(repo_id: str = "yourusername/chiluka-tts", device: str = None, **kwargs):
     """
-    Load Chiluka TTS from a specific HuggingFace Hub repository.
     Args:
-        repo_id: HuggingFace Hub repository ID (e.g., 'username/model-name')
         device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
         **kwargs: Additional arguments passed to Chiluka constructor.
@@ -77,8 +74,43 @@ def chiluka_from_hf(repo_id: str = "yourusername/chiluka-tts", device: str = Non
     Example:
         >>> import torch
-        >>> tts = torch.hub.load('yourusername/chiluka', 'chiluka_from_hf',
         ...                       repo_id='myuser/my-custom-chiluka')
     """
     from chiluka import Chiluka
-    return Chiluka.from_pretrained(repo_id=repo_id, device=device, **kwargs)

 Usage:
     import torch
+    # Load Hindi-English model (default)
+    tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+    # Load Telugu model
+    tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
     # Generate speech
     wav = tts.synthesize(
 def chiluka(pretrained: bool = True, device: str = None, **kwargs):
     """
+    Load Chiluka Hindi-English TTS model (default).
     Args:
         pretrained: If True, downloads pretrained weights from HuggingFace Hub.
         device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
         **kwargs: Additional arguments passed to Chiluka constructor.
     Example:
         >>> import torch
+        >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
         >>> wav = tts.synthesize("Hello!", "reference.wav", language="en")
     """
     from chiluka import Chiluka
     if pretrained:
+        return Chiluka.from_pretrained(model="hindi_english", device=device, **kwargs)
     else:
         return Chiluka(device=device, **kwargs)
+def chiluka_telugu(pretrained: bool = True, device: str = None, **kwargs):
     """
+    Load Chiluka Telugu TTS model.
     Args:
+        pretrained: If True, downloads pretrained weights from HuggingFace Hub.
         device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
         **kwargs: Additional arguments passed to Chiluka constructor.
     Example:
         >>> import torch
+        >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+        >>> wav = tts.synthesize("నమస్కారం", "reference.wav", language="te")
+    """
+    from chiluka import Chiluka
+    if pretrained:
+        return Chiluka.from_pretrained(model="telugu", device=device, **kwargs)
+    else:
+        return Chiluka(device=device, **kwargs)
+def chiluka_hindi_english(pretrained: bool = True, device: str = None, **kwargs):
+    """
+    Load the Chiluka Hindi-English TTS model.
+    Equivalent to `chiluka()`; exposed so the variant can be requested by name.
+    Example:
+        >>> import torch
+        >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka_hindi_english')
+    """
+    return chiluka(pretrained=pretrained, device=device, **kwargs)
+def chiluka_from_hf(repo_id: str = "Seemanth/chiluka-tts", model: str = "hindi_english", device: str = None, **kwargs):
+    """
+    Load Chiluka TTS from a specific HuggingFace Hub repository.
+    Args:
+        repo_id: HuggingFace Hub repository ID
+        model: Model variant ('hindi_english' or 'telugu')
+        device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
+    Example:
+        >>> import torch
+        >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka_from_hf',
         ...                       repo_id='myuser/my-custom-chiluka')
     """
     from chiluka import Chiluka
+    return Chiluka.from_pretrained(repo_id=repo_id, model=model, device=device, **kwargs)
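
`torch.hub.list()` reports every public callable at the top level of `hubconf.py` as an entry point. All four functions here are therefore loadable by name, and a class or helper imported at module level would be listed too. Importing `Chiluka` inside each function, as this file does, keeps the class out of that list. The sketch below approximates that discovery rule against an in-memory module. `list_entrypoints` is a hypothetical name, the stub stands in for the real functions, and the `dependencies` contents are a placeholder because the real list is elided in this diff.

```python
import types
from typing import List


def list_entrypoints(module: types.ModuleType) -> List[str]:
    """Approximate torch.hub.list(): public, callable, top-level names."""
    return [
        name
        for name in dir(module)
        if callable(getattr(module, name)) and not name.startswith("_")
    ]


def _stub(**kwargs):
    """Placeholder for the real entry-point functions."""
    return None


# In-memory stand-in for hubconf.py.
hubconf = types.ModuleType("hubconf")
hubconf.dependencies = ["torch"]  # placeholder contents; a list is not callable, so it is skipped
for entry in ("chiluka", "chiluka_telugu", "chiluka_hindi_english", "chiluka_from_hf"):
    setattr(hubconf, entry, _stub)
hubconf._private_helper = _stub  # leading underscore: hidden from the listing
```

Against the published repo, `torch.hub.list('Seemanth/chiluka')` should return the same four names.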

pyproject.toml CHANGED

@@ -47,10 +47,10 @@ playback = ["pyaudio>=0.2.11"]
 dev = ["pytest>=7.0.0", "black>=22.0.0", "isort>=5.10.0"]
 [project.urls]
-Homepage = "https://github.com/yourusername/chiluka"
-Documentation = "https://github.com/yourusername/chiluka#readme"
-Repository = "https://github.com/yourusername/chiluka"
-Issues = "https://github.com/yourusername/chiluka/issues"
 [tool.setuptools.packages.find]
 where = ["."]

 dev = ["pytest>=7.0.0", "black>=22.0.0", "isort>=5.10.0"]
 [project.urls]
+Homepage = "https://github.com/Seemanth/chiluka"
+Documentation = "https://github.com/Seemanth/chiluka#readme"
+Repository = "https://github.com/Seemanth/chiluka"
+Issues = "https://github.com/Seemanth/chiluka/issues"
 [tool.setuptools.packages.find]
 where = ["."]
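
In the `setup.py` hunk that follows, the `package_data` and `exclude_package_data` globs decide which non-Python files under `chiluka/` land in the wheel. One rough way to sanity-check them before building is to match candidate paths against the same patterns. This sketch uses `fnmatch.fnmatchcase`, whose `*` also matches `/`, unlike setuptools' glob expansion. It is therefore only an approximation for these single-directory patterns, and building the wheel and listing its contents remains the authoritative check. `ships` is a hypothetical helper, and the sample file names in the test, other than the Hindi-English config, are placeholders.

```python
from fnmatch import fnmatchcase
from typing import List

# Patterns copied from setup.py (paths are relative to the chiluka/ package).
PACKAGE_DATA: List[str] = [
    "configs/*.yml",
    "pretrained/ASR/config.yml",
    "pretrained/ASR/*.py",
    "pretrained/JDC/*.py",
    "pretrained/PLBERT/config.yml",
    "pretrained/PLBERT/*.py",
    "models/*.py",
    "models/diffusion/*.py",
]
EXCLUDE_PACKAGE_DATA: List[str] = [
    "checkpoints/*.pth",
    "pretrained/ASR/*.pth",
    "pretrained/JDC/*.t7",
    "pretrained/PLBERT/*.t7",
]


def ships(path: str) -> bool:
    """Approximate whether a package-relative path would be packaged."""
    included = any(fnmatchcase(path, pattern) for pattern in PACKAGE_DATA)
    excluded = any(fnmatchcase(path, pattern) for pattern in EXCLUDE_PACKAGE_DATA)
    return included and not excluded
```

None of the `package_data` patterns match `.pth` or `.t7` files. The exclusions therefore act as a safeguard rather than doing the filtering on their own.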

setup.py CHANGED

@@ -8,13 +8,34 @@ with open("README.md", "r", encoding="utf-8") as fh:
 setup(
     name="chiluka",
     version="0.1.0",
-    author="Your Name",
-    author_email="your.email@example.com",
     description="Chiluka - A lightweight TTS inference package based on StyleTTS2",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://github.com/yourusername/chiluka",
     packages=find_packages(),
     classifiers=[
         "Development Status :: 3 - Alpha",
         "Intended Audience :: Developers",

 setup(
     name="chiluka",
     version="0.1.0",
+    author="Seemanth",
+    author_email="seemanth.k@purviewservices.com",
     description="Chiluka - A lightweight TTS inference package based on StyleTTS2",
     long_description=long_description,
     long_description_content_type="text/markdown",
+    url="https://github.com/PurviewVoiceBot/chiluka",
     packages=find_packages(),
+    include_package_data=False,  # Don't include large model files
+    package_data={
+        "chiluka": [
+            "configs/*.yml",
+            "pretrained/ASR/config.yml",
+            "pretrained/ASR/*.py",
+            "pretrained/JDC/*.py",
+            "pretrained/PLBERT/config.yml",
+            "pretrained/PLBERT/*.py",
+            "models/*.py",
+            "models/diffusion/*.py",
+        ],
+    },
+    exclude_package_data={
+        "chiluka": [
+            "checkpoints/*.pth",
+            "pretrained/ASR/*.pth",
+            "pretrained/JDC/*.t7",
+            "pretrained/PLBERT/*.t7",
+        ],
+    },
     classifiers=[
         "Development Status :: 3 - Alpha",
         "Intended Audience :: Developers",