Seemanth committed on Feb 2

Commit

13f85be

verified

1 Parent(s): f28049f

Add Chiluka TTS models (Hindi-English + Telugu)

Files changed (36)

README.md +217 -0
__init__.py +11 -7
checkpoints/epoch_2nd_00029.pth +3 -0
configs/config_hindi_english.yml +110 -0
hub.py +104 -90
inference.py +16 -11
models/__pycache__/__init__.cpython-310.pyc +0 -0
models/__pycache__/__init__.cpython-311.pyc +0 -0
models/__pycache__/__init__.cpython-313.pyc +0 -0
models/__pycache__/core.cpython-310.pyc +0 -0
models/__pycache__/core.cpython-311.pyc +0 -0
models/__pycache__/core.cpython-313.pyc +0 -0
models/__pycache__/hifigan.cpython-310.pyc +0 -0
models/__pycache__/hifigan.cpython-311.pyc +0 -0
models/__pycache__/hifigan.cpython-313.pyc +0 -0
models/diffusion/__pycache__/__init__.cpython-310.pyc +0 -0
models/diffusion/__pycache__/__init__.cpython-311.pyc +0 -0
models/diffusion/__pycache__/__init__.cpython-313.pyc +0 -0
models/diffusion/__pycache__/diffusion.cpython-310.pyc +0 -0
models/diffusion/__pycache__/diffusion.cpython-311.pyc +0 -0
models/diffusion/__pycache__/diffusion.cpython-313.pyc +0 -0
models/diffusion/__pycache__/modules.cpython-310.pyc +0 -0
models/diffusion/__pycache__/modules.cpython-311.pyc +0 -0
models/diffusion/__pycache__/modules.cpython-313.pyc +0 -0
models/diffusion/__pycache__/sampler.cpython-310.pyc +0 -0
models/diffusion/__pycache__/sampler.cpython-311.pyc +0 -0
models/diffusion/__pycache__/sampler.cpython-313.pyc +0 -0
models/diffusion/__pycache__/utils.cpython-310.pyc +0 -0
models/diffusion/__pycache__/utils.cpython-311.pyc +0 -0
models/diffusion/__pycache__/utils.cpython-313.pyc +0 -0
pretrained/ASR/__pycache__/__init__.cpython-310.pyc +0 -0
pretrained/ASR/__pycache__/layers.cpython-310.pyc +0 -0
pretrained/ASR/__pycache__/models.cpython-310.pyc +0 -0
pretrained/JDC/__pycache__/__init__.cpython-310.pyc +0 -0
pretrained/JDC/__pycache__/model.cpython-310.pyc +0 -0
pretrained/PLBERT/__pycache__/util.cpython-310.pyc +0 -0

README.md ADDED

	@@ -0,0 +1,217 @@

+---
+language:
+  - en
+  - hi
+  - te
+license: mit
+library_name: chiluka
+pipeline_tag: text-to-speech
+tags:
+  - text-to-speech
+  - tts
+  - styletts2
+  - voice-cloning
+  - multi-language
+  - hindi
+  - english
+  - telugu
+  - multi-speaker
+  - style-transfer
+---
+# Chiluka TTS
+**Chiluka** (చిలుక - Telugu for "parrot") is a lightweight, self-contained Text-to-Speech inference package based on [StyleTTS2](https://github.com/yl4579/StyleTTS2).
+It supports **style transfer from reference audio** - give it a voice sample and it will speak in that style.
+## Available Models
+| Model | Name | Languages | Speakers | Description |
+|-------|------|-----------|----------|-------------|
+| **Hindi-English** (default) | `hindi_english` | Hindi, English | 5 | Multi-speaker Hindi + English TTS |
+| **Telugu** | `telugu` | Telugu, English | 1 | Single-speaker Telugu + English TTS |
+## Installation
+```bash
+pip install chiluka
+```
+Or from GitHub:
+```bash
+pip install git+https://github.com/PurviewVoiceBot/chiluka.git
+```
+**System dependency** (required for phonemization):
+```bash
+# Ubuntu/Debian
+sudo apt-get install espeak-ng
+# macOS
+brew install espeak-ng
+```
+## Quick Start
+```python
+from chiluka import Chiluka
+# Load model (weights download automatically on first use)
+tts = Chiluka.from_pretrained()
+# Synthesize speech
+wav = tts.synthesize(
+    text="Hello, this is Chiluka speaking!",
+    reference_audio="path/to/reference.wav",
+    language="en"
+)
+# Save output
+tts.save_wav(wav, "output.wav")
+```
+## Choose a Model
+```python
+from chiluka import Chiluka
+# Hindi + English (default)
+tts = Chiluka.from_pretrained(model="hindi_english")
+# Telugu + English
+tts = Chiluka.from_pretrained(model="telugu")
+```
+## Hindi Example
+```python
+tts = Chiluka.from_pretrained()
+wav = tts.synthesize(
+    text="नमस्ते, मैं चिलुका बोल रहा हूं",
+    reference_audio="reference.wav",
+    language="hi"
+)
+tts.save_wav(wav, "hindi_output.wav")
+```
+## Telugu Example
+```python
+tts = Chiluka.from_pretrained(model="telugu")
+wav = tts.synthesize(
+    text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
+    reference_audio="reference.wav",
+    language="te"
+)
+tts.save_wav(wav, "telugu_output.wav")
+```
+## PyTorch Hub
+```python
+import torch
+# Hindi-English (default)
+tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+# Telugu
+tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+wav = tts.synthesize("Hello!", "reference.wav", language="en")
+```
+## Synthesis Parameters
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `text` | required | Input text to synthesize |
+| `reference_audio` | required | Path to reference audio for voice style |
+| `language` | `"en"` | Language code (`en`, `hi`, `te`, etc.) |
+| `alpha` | `0.3` | Acoustic style mixing (0 = reference voice, 1 = predicted) |
+| `beta` | `0.7` | Prosodic style mixing (0 = reference prosody, 1 = predicted) |
+| `diffusion_steps` | `5` | More steps = better quality, slower inference |
+| `embedding_scale` | `1.0` | Classifier-free guidance strength |
+## How It Works
+Chiluka uses a StyleTTS2-based pipeline:
+1. **Text** is converted to phonemes using espeak-ng
+2. **PL-BERT** encodes text into contextual embeddings
+3. **Reference audio** is processed to extract a style vector
+4. **Diffusion model** samples a style conditioned on text
+5. **Prosody predictor** generates duration, pitch (F0), and energy
+6. **HiFi-GAN decoder** synthesizes the final waveform at 24 kHz
+## Model Architecture
+- **Text Encoder**: Token embedding + CNN + BiLSTM
+- **Style Encoder**: Conv2D + Residual blocks (style_dim=128)
+- **Prosody Predictor**: LSTM-based with AdaIN normalization
+- **Diffusion Model**: Transformer-based denoiser with ADPM2 sampler
+- **Decoder**: HiFi-GAN vocoder (upsample rates: 10, 5, 3, 2)
+- **Pretrained sub-models**: PL-BERT (text), ASR (alignment), JDC (pitch)
+## File Structure
+```
+├── configs/
+│   ├── config_ft.yml                 # Telugu model config
+│   └── config_hindi_english.yml      # Hindi-English model config
+├── checkpoints/
+│   ├── epoch_2nd_00017.pth           # Telugu checkpoint (~2GB)
+│   └── epoch_2nd_00029.pth           # Hindi-English checkpoint (~2GB)
+├── pretrained/                       # Shared pretrained sub-models
+│   ├── ASR/                          # Text-to-mel alignment
+│   ├── JDC/                          # Pitch extraction (F0)
+│   └── PLBERT/                       # Text encoder
+├── models/                           # Model architecture code
+│   ├── core.py
+│   ├── hifigan.py
+│   └── diffusion/
+├── inference.py                      # Main API
+├── hub.py                            # HuggingFace Hub utilities
+└── text_utils.py                     # Phoneme tokenization
+```
+## Requirements
+- Python >= 3.8
+- PyTorch >= 1.13.0
+- CUDA recommended (works on CPU too)
+- espeak-ng system package
+## Limitations
+- Requires a reference audio file for style/voice transfer
+- Quality depends on the reference audio quality
+- Best results with 3-15 second reference clips
+- Hindi-English model trained on 5 speakers
+- Telugu model trained on 1 speaker
+## Citation
+Based on StyleTTS2:
+```bibtex
+@inproceedings{li2024styletts,
+  title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+  author={Li, Yinghao Aaron and Han, Cong and Raghavan, Vinay S and Mischler, Gavin and Mesgarani, Nima},
+  booktitle={NeurIPS},
+  year={2024}
+}
+```
+## License
+MIT License
+## Links
+- **GitHub**: [PurviewVoiceBot/chiluka](https://github.com/PurviewVoiceBot/chiluka)
+- **PyPI**: [chiluka](https://pypi.org/project/chiluka/)
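
The `alpha` and `beta` rows in the README's Synthesis Parameters table describe a blend between the style taken from the reference audio and the style sampled by the diffusion model. A minimal sketch of that blend, assuming it is a plain linear interpolation as in the upstream StyleTTS2 inference code; the function name `mix_style` and the toy vectors are illustrative and not part of the Chiluka API:

```python
from typing import List


def mix_style(reference: List[float], predicted: List[float], weight: float) -> List[float]:
    """Linearly interpolate two style vectors.

    weight = 0 keeps the reference style; weight = 1 keeps the predicted style.
    """
    if len(reference) != len(predicted):
        raise ValueError("style vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * p + (1.0 - weight) * r for r, p in zip(reference, predicted)]


# Toy 2-dimensional vectors; the config in this commit sets style_dim = 128.
ref_acoustic, pred_acoustic = [1.0, 0.0], [0.0, 1.0]
acoustic = mix_style(ref_acoustic, pred_acoustic, weight=0.3)  # alpha default
```

With the default `alpha=0.3`, the acoustic style stays closer to the reference voice, while the default `beta=0.7` lets the predicted prosody dominate.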

__init__.py CHANGED

@@ -1,17 +1,17 @@
 """
 Chiluka - A lightweight TTS inference package based on StyleTTS2
-Usage:
-    # Local weights (if you have them)
-    from chiluka import Chiluka
-    tts = Chiluka()
-    # Auto-download from HuggingFace Hub (recommended)
     from chiluka import Chiluka
     tts = Chiluka.from_pretrained()
-    # From specific HuggingFace repo
-    tts = Chiluka.from_pretrained("username/model-name")
     # Generate speech
     wav = tts.synthesize(
@@ -31,7 +31,9 @@ from .hub import (
     clear_cache,
     get_cache_dir,
     create_model_card,
     DEFAULT_HF_REPO,
 )
 __all__ = [
@@ -41,5 +43,7 @@ __all__ = [
     "clear_cache",
     "get_cache_dir",
     "create_model_card",
     "DEFAULT_HF_REPO",
 ]

 """
 Chiluka - A lightweight TTS inference package based on StyleTTS2
+Available models:
+    - 'hindi_english' (default) - Hindi + English multi-speaker TTS
+    - 'telugu' - Telugu + English single-speaker TTS
+Usage:
+    # Hindi-English model (default, auto-downloads from HuggingFace)
     from chiluka import Chiluka
     tts = Chiluka.from_pretrained()
+    # Telugu model
+    tts = Chiluka.from_pretrained(model="telugu")
     # Generate speech
     wav = tts.synthesize(
     clear_cache,
     get_cache_dir,
     create_model_card,
+    list_models,
     DEFAULT_HF_REPO,
+    MODEL_REGISTRY,
 )
 __all__ = [
     "clear_cache",
     "get_cache_dir",
     "create_model_card",
+    "list_models",
     "DEFAULT_HF_REPO",
+    "MODEL_REGISTRY",
 ]

checkpoints/epoch_2nd_00029.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdaefa463728b71e146ad45bac776cefca75781eecbe96ca84c591ece59a46cc
+size 2242832963
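
The three added lines are a Git LFS pointer, not the checkpoint itself: the real weights are stored out of band and identified by the SHA-256 digest. A small sketch of reading such a pointer, assuming the simple `key value` line layout shown above; `parse_lfs_pointer` is a hypothetical helper, not something this repository ships:

```python
from typing import Dict

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:fdaefa463728b71e146ad45bac776cefca75781eecbe96ca84c591ece59a46cc
size 2242832963
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
algorithm, _, digest = pointer["oid"].partition(":")
size_gb = int(pointer["size"]) / 1e9  # decimal gigabytes
print(f"{algorithm} digest with {len(digest)} hex chars, {size_gb:.2f} GB")
```

The size works out to a little over 2.2 GB, in line with the "~2GB" note in the README's file structure listing.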

configs/config_hindi_english.yml ADDED

	@@ -0,0 +1,110 @@

+log_dir: "Models/hindi_english_multispeaker_finetuned"
+first_stage_path: "first_stage.pth"
+save_freq: 1
+log_interval: 10
+device: "cuda"
+epochs_1st: 15
+epochs_2nd: 15
+batch_size: 2
+max_len: 200
+pretrained_model: ""
+second_stage_load_pretrained: true
+load_only_params: true
+F0_path: "Utils/JDC/bst.t7"
+ASR_config: "Utils/ASR/config.yml"
+ASR_path: "Utils/ASR/epoch_00080.pth"
+PLBERT_dir: "Utils/PLBERT/"
+data_params:
+  train_data: ""
+  val_data: ""
+  root_path: ""
+  OOD_data: ""
+  min_length: 50
+# Audio preprocessing (24kHz)
+preprocess_params:
+  sr: 24000
+  spect_params:
+    n_fft: 2048
+    win_length: 1200
+    hop_length: 300
+# Model architecture
+model_params:
+  multispeaker: true
+  num_speakers: 5
+  dim_in: 64
+  hidden_dim: 512
+  max_conv_dim: 512
+  n_layer: 3
+  n_mels: 80
+  n_token: 178
+  max_dur: 50
+  style_dim: 128
+  dropout: 0.2
+  speaker_embed_dim: 256
+  decoder:
+    type: "hifigan"
+    resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
+    resblock_kernel_sizes: [3, 7, 11]
+    upsample_initial_channel: 512
+    upsample_rates: [10, 5, 3, 2]
+    upsample_kernel_sizes: [20, 10, 6, 4]
+  slm:
+    model: "microsoft/wavlm-base-plus"
+    sr: 16000
+    hidden: 768
+    nlayers: 13
+    initial_channel: 64
+  diffusion:
+    embedding_mask_proba: 0.1
+    transformer:
+      num_layers: 3
+      num_heads: 8
+      head_features: 64
+      multiplier: 2
+    dist:
+      sigma_data: 0.19926648961191362
+      estimate_sigma_data: true
+      mean: -3.0
+      std: 1.0
+loss_params:
+  lambda_mel: 5.0
+  lambda_gen: 1.0
+  lambda_slm: 1.0
+  lambda_mono: 1.0
+  lambda_s2s: 1.0
+  lambda_F0: 1.0
+  lambda_norm: 1.0
+  lambda_dur: 1.0
+  lambda_ce: 20.0
+  lambda_sty: 1.0
+  lambda_diff: 1.0
+  TMA_epoch: 2
+  diff_epoch: 0
+  joint_epoch: 0
+optimizer_params:
+  lr: 0.00005
+  bert_lr: 0.000005
+  ft_lr: 0.000005
+slmadv_params:
+  min_len: 400
+  max_len: 500
+  batch_percentage: 0.5
+  iter: 20
+  thresh: 5
+  scale: 0.01
+  sig: 1.5
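
A few of the numbers in this config are linked. The sketch below derives the mel frame rate from `sr` and `hop_length` and checks that the product of the decoder's `upsample_rates` equals `hop_length`, which is consistent with the HiFi-GAN generator emitting one hop of samples per input frame. The values are copied from the config above; the variable names are illustrative only:

```python
import math

sr = 24000                       # preprocess_params.sr
hop_length = 300                 # preprocess_params.spect_params.hop_length
win_length = 1200                # preprocess_params.spect_params.win_length
upsample_rates = [10, 5, 3, 2]   # model_params.decoder.upsample_rates

frames_per_second = sr / hop_length           # mel frames per second of audio
frame_shift_ms = 1000 * hop_length / sr       # time step between frames
window_ms = 1000 * win_length / sr            # analysis window length
total_upsampling = math.prod(upsample_rates)  # samples generated per frame

print(frames_per_second, frame_shift_ms, window_ms, total_upsampling)
```

If `hop_length` or `sr` is changed for another dataset, the upsample rates would need to change with it so that the product still matches.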

hub.py CHANGED

@@ -5,6 +5,7 @@ Supports:
 - HuggingFace Hub integration
 - Automatic model downloading
 - Local caching
 """
 import os
@@ -13,15 +14,35 @@ from pathlib import Path
 from typing import Optional, Union
 # Default HuggingFace Hub repository
-DEFAULT_HF_REPO = "yourusername/chiluka-tts"  # TODO: Update with your actual repo
 # Cache directory for downloaded models
 CACHE_DIR = Path.home() / ".cache" / "chiluka"
-# Required model files
-REQUIRED_FILES = {
-    "checkpoint": "checkpoints/epoch_2nd_00017.pth",
-    "config": "configs/config_ft.yml",
     "asr_config": "pretrained/ASR/config.yml",
     "asr_model": "pretrained/ASR/epoch_00080.pth",
     "f0_model": "pretrained/JDC/bst.t7",
@@ -30,6 +51,27 @@ REQUIRED_FILES = {
 }
 def get_cache_dir() -> Path:
     """Get the cache directory for Chiluka models."""
     cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
@@ -43,11 +85,19 @@ def is_model_cached(repo_id: str = DEFAULT_HF_REPO) -> bool:
     if not cache_path.exists():
         return False
-    # Check if all required files exist
-    for file_path in REQUIRED_FILES.values():
         if not (cache_path / file_path).exists():
             return False
-    return True
 def download_from_hf(
@@ -60,21 +110,16 @@ def download_from_hf(
     Download model files from HuggingFace Hub.
     Args:
-        repo_id: HuggingFace Hub repository ID (e.g., 'username/model-name')
         revision: Git revision to download (branch, tag, or commit hash)
         force_download: If True, re-download even if cached
         token: HuggingFace API token for private repos
     Returns:
         Path to the downloaded model directory
-    Example:
-        >>> model_path = download_from_hf("yourusername/chiluka-tts")
-        >>> print(model_path)
-        /home/user/.cache/chiluka/yourusername_chiluka-tts
     """
     try:
-        from huggingface_hub import snapshot_download, hf_hub_download
     except ImportError:
         raise ImportError(
             "huggingface_hub is required for downloading models. "
@@ -89,7 +134,6 @@ def download_from_hf(
     print(f"Downloading model from HuggingFace Hub: {repo_id}...")
-    # Download entire repository
     downloaded_path = snapshot_download(
         repo_id=repo_id,
         revision=revision,
@@ -103,60 +147,32 @@ def download_from_hf(
     return Path(downloaded_path)
-def download_from_url(
-    url: str,
-    filename: str,
-    force_download: bool = False,
-) -> Path:
-    """
-    Download a single file from a URL.
-    Args:
-        url: URL to download from
-        filename: Local filename to save as
-        force_download: If True, re-download even if exists
-    Returns:
-        Path to the downloaded file
-    """
-    import urllib.request
-    cache_dir = get_cache_dir() / "downloads"
-    cache_dir.mkdir(parents=True, exist_ok=True)
-    local_path = cache_dir / filename
-    if local_path.exists() and not force_download:
-        print(f"Using cached file: {local_path}")
-        return local_path
-    print(f"Downloading {filename}...")
-    # Download with progress
-    def _progress_hook(count, block_size, total_size):
-        percent = int(count * block_size * 100 / total_size)
-        print(f"\rDownloading: {percent}%", end="", flush=True)
-    urllib.request.urlretrieve(url, local_path, reporthook=_progress_hook)
-    print()  # New line after progress
-    return local_path
-def get_model_paths(repo_id: str = DEFAULT_HF_REPO) -> dict:
     """
     Get paths to all model files after downloading.
     Args:
         repo_id: HuggingFace Hub repository ID
     Returns:
         Dictionary with paths to config, checkpoint, and pretrained directory
     """
     model_dir = download_from_hf(repo_id)
     return {
-        "config_path": str(model_dir / "configs" / "config_ft.yml"),
-        "checkpoint_path": str(model_dir / "checkpoints" / "epoch_2nd_00017.pth"),
         "pretrained_dir": str(model_dir / "pretrained"),
     }
@@ -202,7 +218,7 @@ def push_to_hub(
     Example:
         >>> push_to_hub(
         ...     local_dir="./chiluka",
-        ...     repo_id="myusername/my-chiluka-model",
         ...     private=False
         ... )
     """
@@ -245,6 +261,14 @@ def create_model_card(repo_id: str, save_path: Optional[str] = None) -> str:
     Returns:
         Model card content as string
     """
     model_card = f"""---
 language:
   - en
@@ -257,12 +281,19 @@ tags:
   - tts
   - styletts2
   - voice-cloning
 ---
 # Chiluka TTS
 Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.
 ## Installation
 ```bash
@@ -272,64 +303,47 @@ pip install chiluka
 Or install from source:
 ```bash
-pip install git+https://github.com/{repo_id.split('/')[0]}/chiluka.git
 ```
 ## Usage
-### Quick Start (Auto-download)
 ```python
 from chiluka import Chiluka
-# Automatically downloads model weights
 tts = Chiluka.from_pretrained()
-# Generate speech
 wav = tts.synthesize(
     text="Hello, world!",
-    reference_audio="path/to/reference.wav",
     language="en"
 )
-# Save output
 tts.save_wav(wav, "output.wav")
 ```
-### PyTorch Hub
 ```python
-import torch
-tts = torch.hub.load('{repo_id.split('/')[0]}/chiluka', 'chiluka')
-wav = tts.synthesize("Hello!", "reference.wav", language="en")
 ```
-### HuggingFace Hub
 ```python
-from chiluka import Chiluka
-tts = Chiluka.from_pretrained("{repo_id}")
 ```
-## Parameters
-- `text`: Input text to synthesize
-- `reference_audio`: Path to reference audio for style transfer
-- `language`: Language code ('en', 'te', 'hi', etc.)
-- `alpha`: Acoustic style mixing (0-1, default 0.3)
-- `beta`: Prosodic style mixing (0-1, default 0.7)
-- `diffusion_steps`: Quality vs speed tradeoff (default 5)
-## Supported Languages
-Uses espeak-ng phonemizer. Common languages:
-- English: `en-us`, `en-gb`
-- Telugu: `te`
-- Hindi: `hi`
-- Tamil: `ta`
 ## License
 MIT License

 - HuggingFace Hub integration
 - Automatic model downloading
 - Local caching
+- Multiple model variants
 """
 import os
 from typing import Optional, Union
 # Default HuggingFace Hub repository
+DEFAULT_HF_REPO = "Seemanth/chiluka-tts"
 # Cache directory for downloaded models
 CACHE_DIR = Path.home() / ".cache" / "chiluka"
+# ============================================
+# Model Registry
+# ============================================
+# Maps model names to their config + checkpoint paths
+# relative to the repo root.
+MODEL_REGISTRY = {
+    "telugu": {
+        "config": "configs/config_ft.yml",
+        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
+        "languages": ["te", "en"],
+        "description": "Telugu + English single-speaker TTS",
+    },
+    "hindi_english": {
+        "config": "configs/config_hindi_english.yml",
+        "checkpoint": "checkpoints/epoch_2nd_00029.pth",
+        "languages": ["hi", "en"],
+        "description": "Hindi + English multi-speaker TTS (5 speakers)",
+    },
+}
+DEFAULT_MODEL = "hindi_english"
+# Shared pretrained sub-models (same across all variants)
+PRETRAINED_FILES = {
     "asr_config": "pretrained/ASR/config.yml",
     "asr_model": "pretrained/ASR/epoch_00080.pth",
     "f0_model": "pretrained/JDC/bst.t7",
 }
+def list_models() -> dict:
+    """
+    List all available model variants.
+    Returns:
+        Dictionary of model names and their info.
+    Example:
+        >>> from chiluka import hub
+        >>> hub.list_models()
+        {'telugu': {...}, 'hindi_english': {...}}
+    """
+    return {
+        name: {
+            "languages": info["languages"],
+            "description": info["description"],
+        }
+        for name, info in MODEL_REGISTRY.items()
+    }
 def get_cache_dir() -> Path:
     """Get the cache directory for Chiluka models."""
     cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
     if not cache_path.exists():
         return False
+    # Check if shared pretrained files exist
+    for file_path in PRETRAINED_FILES.values():
         if not (cache_path / file_path).exists():
             return False
+    # Check if at least one model variant exists
+    for model_info in MODEL_REGISTRY.values():
+        config_exists = (cache_path / model_info["config"]).exists()
+        checkpoint_exists = (cache_path / model_info["checkpoint"]).exists()
+        if config_exists and checkpoint_exists:
+            return True
+    return False
 def download_from_hf(
     Download model files from HuggingFace Hub.
     Args:
+        repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts')
         revision: Git revision to download (branch, tag, or commit hash)
         force_download: If True, re-download even if cached
         token: HuggingFace API token for private repos
     Returns:
         Path to the downloaded model directory
     """
     try:
+        from huggingface_hub import snapshot_download
     except ImportError:
         raise ImportError(
             "huggingface_hub is required for downloading models. "
     print(f"Downloading model from HuggingFace Hub: {repo_id}...")
     downloaded_path = snapshot_download(
         repo_id=repo_id,
         revision=revision,
     return Path(downloaded_path)
+def get_model_paths(
+    model: str = DEFAULT_MODEL,
+    repo_id: str = DEFAULT_HF_REPO,
+) -> dict:
     """
     Get paths to all model files after downloading.
     Args:
+        model: Model variant name ('telugu', 'hindi_english')
         repo_id: HuggingFace Hub repository ID
     Returns:
         Dictionary with paths to config, checkpoint, and pretrained directory
     """
+    if model not in MODEL_REGISTRY:
+        available = ", ".join(MODEL_REGISTRY.keys())
+        raise ValueError(
+            f"Unknown model '{model}'. Available models: {available}"
+        )
     model_dir = download_from_hf(repo_id)
+    model_info = MODEL_REGISTRY[model]
     return {
+        "config_path": str(model_dir / model_info["config"]),
+        "checkpoint_path": str(model_dir / model_info["checkpoint"]),
         "pretrained_dir": str(model_dir / "pretrained"),
     }
     Example:
         >>> push_to_hub(
         ...     local_dir="./chiluka",
+        ...     repo_id="Seemanth/chiluka-tts",
         ...     private=False
         ... )
     """
     Returns:
         Model card content as string
     """
+    owner = repo_id.split("/")[0]
+    # Build model table
+    model_rows = ""
+    for name, info in MODEL_REGISTRY.items():
+        langs = ", ".join(info["languages"])
+        model_rows += f"| `{name}` | {info['description']} | {langs} |\n"
     model_card = f"""---
 language:
   - en
   - tts
   - styletts2
   - voice-cloning
+  - multi-language
 ---
 # Chiluka TTS
 Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.
+## Available Models
+| Model | Description | Languages |
+|-------|-------------|-----------|
+{model_rows}
 ## Installation
 ```bash
 Or install from source:
 ```bash
+pip install git+https://github.com/{owner}/chiluka.git
 ```
 ## Usage
+### Hindi + English (default)
 ```python
 from chiluka import Chiluka
 tts = Chiluka.from_pretrained()
 wav = tts.synthesize(
     text="Hello, world!",
+    reference_audio="reference.wav",
     language="en"
 )
 tts.save_wav(wav, "output.wav")
 ```
+### Telugu
 ```python
+tts = Chiluka.from_pretrained(model="telugu")
+wav = tts.synthesize(
+    text="నమస్కారం",
+    reference_audio="reference.wav",
+    language="te"
+)
 ```
+### PyTorch Hub
 ```python
+import torch
+tts = torch.hub.load('{owner}/chiluka', 'chiluka')
+tts = torch.hub.load('{owner}/chiluka', 'chiluka_telugu')
 ```
 ## License
 MIT License
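
The updated `is_model_cached` returns `True` as soon as the shared pretrained files and any one variant are present, so it cannot tell whether the variant a caller actually wants is on disk. A minimal sketch of a per-variant check, assuming the registry layout from the diff; `is_variant_cached` is a hypothetical helper, and `PRETRAINED_FILES` lists only the entries visible in the hunk:

```python
import tempfile
from pathlib import Path

# Copied from the diff above so the sketch runs without chiluka installed.
MODEL_REGISTRY = {
    "telugu": {"config": "configs/config_ft.yml",
               "checkpoint": "checkpoints/epoch_2nd_00017.pth"},
    "hindi_english": {"config": "configs/config_hindi_english.yml",
                      "checkpoint": "checkpoints/epoch_2nd_00029.pth"},
}
PRETRAINED_FILES = {  # only the entries visible in the diff
    "asr_config": "pretrained/ASR/config.yml",
    "asr_model": "pretrained/ASR/epoch_00080.pth",
    "f0_model": "pretrained/JDC/bst.t7",
}


def is_variant_cached(cache_path: Path, model: str) -> bool:
    """Hypothetical helper: True only if `model` and the shared files exist."""
    if model not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model '{model}'")
    info = MODEL_REGISTRY[model]
    needed = list(PRETRAINED_FILES.values()) + [info["config"], info["checkpoint"]]
    return all((cache_path / rel).exists() for rel in needed)


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    telugu = MODEL_REGISTRY["telugu"]
    for rel in list(PRETRAINED_FILES.values()) + [telugu["config"], telugu["checkpoint"]]:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    telugu_cached = is_variant_cached(root, "telugu")
    hindi_cached = is_variant_cached(root, "hindi_english")
```

In the demo only the Telugu files exist, so the Telugu check passes and the Hindi-English check does not.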

inference.py CHANGED

@@ -155,6 +155,7 @@ class Chiluka:
     @classmethod
     def from_pretrained(
         cls,
         repo_id: str = None,
         device: Optional[str] = None,
         force_download: bool = False,
@@ -168,7 +169,10 @@ class Chiluka:
         Weights are automatically downloaded and cached on first use.
         Args:
-            repo_id: HuggingFace Hub repository ID (e.g., 'username/chiluka-tts').
                     If None, uses the default repository.
             device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
             force_download: If True, re-download even if cached.
@@ -179,31 +183,32 @@ class Chiluka:
             Initialized Chiluka TTS model ready for inference.
         Examples:
-            # Default repository (auto-download)
             >>> tts = Chiluka.from_pretrained()
-            # Specific repository
-            >>> tts = Chiluka.from_pretrained("myuser/my-chiluka-model")
             # Force re-download
             >>> tts = Chiluka.from_pretrained(force_download=True)
-            # Private repository
-            >>> tts = Chiluka.from_pretrained("myuser/private-model", token="hf_xxx")
         """
-        from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO
         repo_id = repo_id or DEFAULT_HF_REPO
         # Download model files (or use cache)
-        model_dir = download_from_hf(
             repo_id=repo_id,
             force_download=force_download,
             token=token,
         )
-        # Get paths to model files
-        paths = get_model_paths(repo_id)
         return cls(
             config_path=paths["config_path"],

     @classmethod
     def from_pretrained(
         cls,
+        model: str = None,
         repo_id: str = None,
         device: Optional[str] = None,
         force_download: bool = False,
         Weights are automatically downloaded and cached on first use.
         Args:
+            model: Model variant to load. Options:
+                - 'hindi_english' (default) - Hindi + English multi-speaker TTS
+                - 'telugu' - Telugu + English single-speaker TTS
+            repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts').
                     If None, uses the default repository.
             device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
             force_download: If True, re-download even if cached.
             Initialized Chiluka TTS model ready for inference.
         Examples:
+            # Hindi-English model (default)
             >>> tts = Chiluka.from_pretrained()
+            # Telugu model
+            >>> tts = Chiluka.from_pretrained(model="telugu")
+            # Specific HuggingFace repository
+            >>> tts = Chiluka.from_pretrained(repo_id="myuser/my-model")
             # Force re-download
             >>> tts = Chiluka.from_pretrained(force_download=True)
         """
+        from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO, DEFAULT_MODEL
+        model = model or DEFAULT_MODEL
         repo_id = repo_id or DEFAULT_HF_REPO
         # Download model files (or use cache)
+        download_from_hf(
             repo_id=repo_id,
             force_download=force_download,
             token=token,
         )
+        # Get paths to model files for the selected variant
+        paths = get_model_paths(model=model, repo_id=repo_id)
         return cls(
             config_path=paths["config_path"],
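
In the new `from_pretrained`, the variant name is validated only inside `get_model_paths`, which runs after `download_from_hf`, so a misspelled name is reported after the download step. A minimal sketch of resolving and checking the name up front; `resolve_model_name` and `AVAILABLE_MODELS` are hypothetical and stand in for a lookup against `MODEL_REGISTRY`:

```python
from typing import Optional

DEFAULT_MODEL = "hindi_english"
AVAILABLE_MODELS = ("telugu", "hindi_english")  # keys of MODEL_REGISTRY in hub.py


def resolve_model_name(model: Optional[str] = None) -> str:
    """Hypothetical helper: apply the default, then reject unknown names."""
    name = model or DEFAULT_MODEL
    if name not in AVAILABLE_MODELS:
        available = ", ".join(AVAILABLE_MODELS)
        raise ValueError(f"Unknown model '{name}'. Available models: {available}")
    return name


default_name = resolve_model_name()
telugu_name = resolve_model_name("telugu")
```

Calling a check like this before any network access would surface typos such as `model="hindi"` immediately.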

models/__pycache__/__init__.cpython-310.pyc ADDED

Binary file (425 Bytes).

models/__pycache__/__init__.cpython-311.pyc ADDED

Binary file (544 Bytes).

models/__pycache__/__init__.cpython-313.pyc ADDED

Binary file (450 Bytes).

models/__pycache__/core.cpython-310.pyc ADDED

Binary file (27.9 kB).

models/__pycache__/core.cpython-311.pyc ADDED

Binary file (60.9 kB).

models/__pycache__/core.cpython-313.pyc ADDED

Binary file (55.2 kB).

models/__pycache__/hifigan.cpython-310.pyc ADDED

Binary file (11.2 kB).

models/__pycache__/hifigan.cpython-311.pyc ADDED

Binary file (25 kB).

models/__pycache__/hifigan.cpython-313.pyc ADDED

Binary file (22 kB).

models/diffusion/__pycache__/__init__.cpython-310.pyc ADDED

Binary file (568 Bytes).

models/diffusion/__pycache__/__init__.cpython-311.pyc ADDED

Binary file (707 Bytes).

models/diffusion/__pycache__/__init__.cpython-313.pyc ADDED

Binary file (592 Bytes).

models/diffusion/__pycache__/diffusion.cpython-310.pyc ADDED

Binary file (3.47 kB).

models/diffusion/__pycache__/diffusion.cpython-311.pyc ADDED

Binary file (5.14 kB).

models/diffusion/__pycache__/diffusion.cpython-313.pyc ADDED

Binary file (4.56 kB).

models/diffusion/__pycache__/modules.cpython-310.pyc ADDED

Binary file (14.5 kB).

models/diffusion/__pycache__/modules.cpython-311.pyc ADDED

Binary file (29.8 kB).

models/diffusion/__pycache__/modules.cpython-313.pyc ADDED

Binary file (26 kB).

models/diffusion/__pycache__/sampler.cpython-310.pyc ADDED

Binary file (9.14 kB).

models/diffusion/__pycache__/sampler.cpython-311.pyc ADDED

Binary file (15 kB).

models/diffusion/__pycache__/sampler.cpython-313.pyc ADDED

Binary file (13.7 kB).

models/diffusion/__pycache__/utils.cpython-310.pyc ADDED

Binary file (1.98 kB).

models/diffusion/__pycache__/utils.cpython-311.pyc ADDED

Binary file (3.43 kB).

models/diffusion/__pycache__/utils.cpython-313.pyc ADDED

Binary file (2.72 kB).

pretrained/ASR/__pycache__/__init__.cpython-310.pyc ADDED

Binary file (150 Bytes).

pretrained/ASR/__pycache__/layers.cpython-310.pyc ADDED

Binary file (11 kB).

pretrained/ASR/__pycache__/models.cpython-310.pyc ADDED

Binary file (6.12 kB).

pretrained/JDC/__pycache__/__init__.cpython-310.pyc ADDED

Binary file (150 Bytes).

pretrained/JDC/__pycache__/model.cpython-310.pyc ADDED

Binary file (4.78 kB).

pretrained/PLBERT/__pycache__/util.cpython-310.pyc ADDED

Binary file (1.75 kB).