Seemanth committed on
Commit 13f85be · verified · 1 Parent(s): f28049f

Add Chiluka TTS models (Hindi-English + Telugu)

Files changed (36)
  1. README.md +217 -0
  2. __init__.py +11 -7
  3. checkpoints/epoch_2nd_00029.pth +3 -0
  4. configs/config_hindi_english.yml +110 -0
  5. hub.py +104 -90
  6. inference.py +16 -11
  7. models/__pycache__/__init__.cpython-310.pyc +0 -0
  8. models/__pycache__/__init__.cpython-311.pyc +0 -0
  9. models/__pycache__/__init__.cpython-313.pyc +0 -0
  10. models/__pycache__/core.cpython-310.pyc +0 -0
  11. models/__pycache__/core.cpython-311.pyc +0 -0
  12. models/__pycache__/core.cpython-313.pyc +0 -0
  13. models/__pycache__/hifigan.cpython-310.pyc +0 -0
  14. models/__pycache__/hifigan.cpython-311.pyc +0 -0
  15. models/__pycache__/hifigan.cpython-313.pyc +0 -0
  16. models/diffusion/__pycache__/__init__.cpython-310.pyc +0 -0
  17. models/diffusion/__pycache__/__init__.cpython-311.pyc +0 -0
  18. models/diffusion/__pycache__/__init__.cpython-313.pyc +0 -0
  19. models/diffusion/__pycache__/diffusion.cpython-310.pyc +0 -0
  20. models/diffusion/__pycache__/diffusion.cpython-311.pyc +0 -0
  21. models/diffusion/__pycache__/diffusion.cpython-313.pyc +0 -0
  22. models/diffusion/__pycache__/modules.cpython-310.pyc +0 -0
  23. models/diffusion/__pycache__/modules.cpython-311.pyc +0 -0
  24. models/diffusion/__pycache__/modules.cpython-313.pyc +0 -0
  25. models/diffusion/__pycache__/sampler.cpython-310.pyc +0 -0
  26. models/diffusion/__pycache__/sampler.cpython-311.pyc +0 -0
  27. models/diffusion/__pycache__/sampler.cpython-313.pyc +0 -0
  28. models/diffusion/__pycache__/utils.cpython-310.pyc +0 -0
  29. models/diffusion/__pycache__/utils.cpython-311.pyc +0 -0
  30. models/diffusion/__pycache__/utils.cpython-313.pyc +0 -0
  31. pretrained/ASR/__pycache__/__init__.cpython-310.pyc +0 -0
  32. pretrained/ASR/__pycache__/layers.cpython-310.pyc +0 -0
  33. pretrained/ASR/__pycache__/models.cpython-310.pyc +0 -0
  34. pretrained/JDC/__pycache__/__init__.cpython-310.pyc +0 -0
  35. pretrained/JDC/__pycache__/model.cpython-310.pyc +0 -0
  36. pretrained/PLBERT/__pycache__/util.cpython-310.pyc +0 -0
README.md ADDED
@@ -0,0 +1,217 @@
+ ---
+ language:
+ - en
+ - hi
+ - te
+ license: mit
+ library_name: chiluka
+ pipeline_tag: text-to-speech
+ tags:
+ - text-to-speech
+ - tts
+ - styletts2
+ - voice-cloning
+ - multi-language
+ - hindi
+ - english
+ - telugu
+ - multi-speaker
+ - style-transfer
+ ---
+
+ # Chiluka TTS
+
+ **Chiluka** (చిలుక - Telugu for "parrot") is a lightweight, self-contained Text-to-Speech inference package based on [StyleTTS2](https://github.com/yl4579/StyleTTS2).
+
+ It supports **style transfer from reference audio** - give it a voice sample and it will speak in that style.
+
+ ## Available Models
+
+ | Model | Name | Languages | Speakers | Description |
+ |-------|------|-----------|----------|-------------|
+ | **Hindi-English** (default) | `hindi_english` | Hindi, English | 5 | Multi-speaker Hindi + English TTS |
+ | **Telugu** | `telugu` | Telugu, English | 1 | Single-speaker Telugu + English TTS |
+
+ ## Installation
+
+ ```bash
+ pip install chiluka
+ ```
+
+ Or from GitHub:
+
+ ```bash
+ pip install git+https://github.com/PurviewVoiceBot/chiluka.git
+ ```
+
+ **System dependency** (required for phonemization):
+
+ ```bash
+ # Ubuntu/Debian
+ sudo apt-get install espeak-ng
+
+ # macOS
+ brew install espeak-ng
+ ```
+
+ ## Quick Start
+
+ ```python
+ from chiluka import Chiluka
+
+ # Load model (weights download automatically on first use)
+ tts = Chiluka.from_pretrained()
+
+ # Synthesize speech
+ wav = tts.synthesize(
+     text="Hello, this is Chiluka speaking!",
+     reference_audio="path/to/reference.wav",
+     language="en"
+ )
+
+ # Save output
+ tts.save_wav(wav, "output.wav")
+ ```
+
+ ## Choose a Model
+
+ ```python
+ from chiluka import Chiluka
+
+ # Hindi + English (default)
+ tts = Chiluka.from_pretrained(model="hindi_english")
+
+ # Telugu + English
+ tts = Chiluka.from_pretrained(model="telugu")
+ ```
+
+ ## Hindi Example
+
+ ```python
+ tts = Chiluka.from_pretrained()
+
+ wav = tts.synthesize(
+     text="नमस्ते, मैं चिलुका बोल रहा हूं",
+     reference_audio="reference.wav",
+     language="hi"
+ )
+ tts.save_wav(wav, "hindi_output.wav")
+ ```
+
+ ## Telugu Example
+
+ ```python
+ tts = Chiluka.from_pretrained(model="telugu")
+
+ wav = tts.synthesize(
+     text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
+     reference_audio="reference.wav",
+     language="te"
+ )
+ tts.save_wav(wav, "telugu_output.wav")
+ ```
+
+ ## PyTorch Hub
+
+ ```python
+ import torch
+
+ # Hindi-English (default)
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+
+ # Telugu
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+
+ wav = tts.synthesize("Hello!", "reference.wav", language="en")
+ ```
+
+ ## Synthesis Parameters
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `text` | required | Input text to synthesize |
+ | `reference_audio` | required | Path to reference audio for voice style |
+ | `language` | `"en"` | Language code (`en`, `hi`, `te`, etc.) |
+ | `alpha` | `0.3` | Acoustic style mixing (0 = reference voice, 1 = predicted) |
+ | `beta` | `0.7` | Prosodic style mixing (0 = reference prosody, 1 = predicted) |
+ | `diffusion_steps` | `5` | More steps = better quality, slower inference |
+ | `embedding_scale` | `1.0` | Classifier-free guidance strength |
+
+ ## How It Works
+
+ Chiluka uses a StyleTTS2-based pipeline:
+
+ 1. **Text** is converted to phonemes using espeak-ng
+ 2. **PL-BERT** encodes text into contextual embeddings
+ 3. **Reference audio** is processed to extract a style vector
+ 4. **Diffusion model** samples a style conditioned on text
+ 5. **Prosody predictor** generates duration, pitch (F0), and energy
+ 6. **HiFi-GAN decoder** synthesizes the final waveform at 24kHz
+
+ ## Model Architecture
+
+ - **Text Encoder**: Token embedding + CNN + BiLSTM
+ - **Style Encoder**: Conv2D + Residual blocks (style_dim=128)
+ - **Prosody Predictor**: LSTM-based with AdaIN normalization
+ - **Diffusion Model**: Transformer-based denoiser with ADPM2 sampler
+ - **Decoder**: HiFi-GAN vocoder (upsample rates: 10, 5, 3, 2)
+ - **Pretrained sub-models**: PL-BERT (text), ASR (alignment), JDC (pitch)
+
+ ## File Structure
+
+ ```
+ ├── configs/
+ │   ├── config_ft.yml              # Telugu model config
+ │   └── config_hindi_english.yml   # Hindi-English model config
+ ├── checkpoints/
+ │   ├── epoch_2nd_00017.pth        # Telugu checkpoint (~2GB)
+ │   └── epoch_2nd_00029.pth        # Hindi-English checkpoint (~2GB)
+ ├── pretrained/                    # Shared pretrained sub-models
+ │   ├── ASR/                       # Text-to-mel alignment
+ │   ├── JDC/                       # Pitch extraction (F0)
+ │   └── PLBERT/                    # Text encoder
+ ├── models/                        # Model architecture code
+ │   ├── core.py
+ │   ├── hifigan.py
+ │   └── diffusion/
+ ├── inference.py                   # Main API
+ ├── hub.py                         # HuggingFace Hub utilities
+ └── text_utils.py                  # Phoneme tokenization
+ ```
+
+ ## Requirements
+
+ - Python >= 3.8
+ - PyTorch >= 1.13.0
+ - CUDA recommended (works on CPU too)
+ - espeak-ng system package
+
+ ## Limitations
+
+ - Requires a reference audio file for style/voice transfer
+ - Quality depends on the reference audio quality
+ - Best results with 3-15 second reference clips
+ - Hindi-English model trained on 5 speakers
+ - Telugu model trained on 1 speaker
+
+ ## Citation
+
+ Based on StyleTTS2:
+
+ ```bibtex
+ @inproceedings{li2024styletts,
+   title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+   author={Li, Yinghao Aaron and Han, Cong and Raber, Vinay S and Mesgarani, Nima},
+   booktitle={NeurIPS},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ MIT License
+
+ ## Links
+
+ - **GitHub**: [PurviewVoiceBot/chiluka](https://github.com/PurviewVoiceBot/chiluka)
+ - **PyPI**: [chiluka](https://pypi.org/project/chiluka/)
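The `alpha` and `beta` rows of the README's Synthesis Parameters table describe a blend between the style taken from the reference audio and the style predicted by the diffusion model. A minimal sketch of that blend, assuming plain linear interpolation and a style vector whose first 128 values are acoustic and last 128 are prosodic. The split and the helper name `mix_style` are illustrative assumptions, not part of Chiluka's API.

```python
from typing import List, Tuple

STYLE_DIM = 128  # matches style_dim=128 in the README's architecture notes


def mix_style(
    reference: List[float],
    predicted: List[float],
    alpha: float = 0.3,
    beta: float = 0.7,
) -> Tuple[List[float], List[float]]:
    """Blend reference and predicted style vectors (hypothetical helper).

    Both inputs hold 2 * STYLE_DIM values: the first half is treated as the
    acoustic style, the second half as the prosodic style (an assumption).
    A weight of 0 keeps the reference, a weight of 1 keeps the prediction.
    """
    if len(reference) != 2 * STYLE_DIM or len(predicted) != 2 * STYLE_DIM:
        raise ValueError("expected vectors of length 2 * STYLE_DIM")

    def lerp(ref: List[float], pred: List[float], weight: float) -> List[float]:
        # Element-wise linear interpolation between the two vectors.
        return [weight * p + (1.0 - weight) * r for r, p in zip(ref, pred)]

    acoustic = lerp(reference[:STYLE_DIM], predicted[:STYLE_DIM], alpha)
    prosodic = lerp(reference[STYLE_DIM:], predicted[STYLE_DIM:], beta)
    return acoustic, prosodic


if __name__ == "__main__":
    ref = [0.0] * (2 * STYLE_DIM)
    pred = [1.0] * (2 * STYLE_DIM)
    acoustic, prosodic = mix_style(ref, pred)
    print(round(acoustic[0], 3), round(prosodic[0], 3))
```

With the documented defaults, the acoustic half stays closer to the reference voice (`alpha=0.3`), while the prosodic half leans toward the predicted style (`beta=0.7`).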
__init__.py CHANGED
@@ -1,17 +1,17 @@
  """
  Chiluka - A lightweight TTS inference package based on StyleTTS2

- Usage:
-     # Local weights (if you have them)
-     from chiluka import Chiluka
-     tts = Chiluka()
+ Available models:
+     - 'hindi_english' (default) - Hindi + English multi-speaker TTS
+     - 'telugu' - Telugu + English single-speaker TTS

-     # Auto-download from HuggingFace Hub (recommended)
+ Usage:
+     # Hindi-English model (default, auto-downloads from HuggingFace)
      from chiluka import Chiluka
      tts = Chiluka.from_pretrained()

-     # From specific HuggingFace repo
-     tts = Chiluka.from_pretrained("username/model-name")
+     # Telugu model
+     tts = Chiluka.from_pretrained(model="telugu")

      # Generate speech
      wav = tts.synthesize(
@@ -31,7 +31,9 @@ from .hub import (
      clear_cache,
      get_cache_dir,
      create_model_card,
+     list_models,
      DEFAULT_HF_REPO,
+     MODEL_REGISTRY,
  )

  __all__ = [
@@ -41,5 +43,7 @@ __all__ = [
      "clear_cache",
      "get_cache_dir",
      "create_model_card",
+     "list_models",
      "DEFAULT_HF_REPO",
+     "MODEL_REGISTRY",
  ]
checkpoints/epoch_2nd_00029.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdaefa463728b71e146ad45bac776cefca75781eecbe96ca84c591ece59a46cc
+ size 2242832963
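The three added lines are a Git LFS pointer: the repository stores this small text stub, and the checkpoint itself lives in LFS storage. A minimal sketch of reading such a pointer to see the real file size; `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of this commit.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from checkpoints/epoch_2nd_00029.pth in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fdaefa463728b71e146ad45bac776cefca75781eecbe96ca84c591ece59a46cc
size 2242832963
"""

if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    size_bytes = int(info["size"])
    print(f"{size_bytes / 1e9:.2f} GB ({size_bytes / 2**30:.2f} GiB)")
```

The `size` field is in bytes, which is consistent with the "~2GB" note for this checkpoint in the README's file tree.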
configs/config_hindi_english.yml ADDED
@@ -0,0 +1,110 @@
+ log_dir: "Models/hindi_english_multispeaker_finetuned"
+ first_stage_path: "first_stage.pth"
+ save_freq: 1
+ log_interval: 10
+ device: "cuda"
+
+ epochs_1st: 15
+ epochs_2nd: 15
+
+ batch_size: 2
+ max_len: 200
+
+ pretrained_model: ""
+ second_stage_load_pretrained: true
+ load_only_params: true
+
+ F0_path: "Utils/JDC/bst.t7"
+ ASR_config: "Utils/ASR/config.yml"
+ ASR_path: "Utils/ASR/epoch_00080.pth"
+ PLBERT_dir: "Utils/PLBERT/"
+
+ data_params:
+   train_data: ""
+   val_data: ""
+   root_path: ""
+   OOD_data: ""
+   min_length: 50
+
+ # Audio preprocessing (24kHz)
+ preprocess_params:
+   sr: 24000
+   spect_params:
+     n_fft: 2048
+     win_length: 1200
+     hop_length: 300
+
+ # Model architecture
+ model_params:
+   multispeaker: true
+   num_speakers: 5
+
+   dim_in: 64
+   hidden_dim: 512
+   max_conv_dim: 512
+   n_layer: 3
+   n_mels: 80
+   n_token: 178
+   max_dur: 50
+   style_dim: 128
+   dropout: 0.2
+
+   speaker_embed_dim: 256
+
+   decoder:
+     type: "hifigan"
+     resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
+     resblock_kernel_sizes: [3, 7, 11]
+     upsample_initial_channel: 512
+     upsample_rates: [10, 5, 3, 2]
+     upsample_kernel_sizes: [20, 10, 6, 4]
+
+   slm:
+     model: "microsoft/wavlm-base-plus"
+     sr: 16000
+     hidden: 768
+     nlayers: 13
+     initial_channel: 64
+
+   diffusion:
+     embedding_mask_proba: 0.1
+     transformer:
+       num_layers: 3
+       num_heads: 8
+       head_features: 64
+       multiplier: 2
+     dist:
+       sigma_data: 0.19926648961191362
+       estimate_sigma_data: true
+       mean: -3.0
+       std: 1.0
+
+ loss_params:
+   lambda_mel: 5.0
+   lambda_gen: 1.0
+   lambda_slm: 1.0
+   lambda_mono: 1.0
+   lambda_s2s: 1.0
+   lambda_F0: 1.0
+   lambda_norm: 1.0
+   lambda_dur: 1.0
+   lambda_ce: 20.0
+   lambda_sty: 1.0
+   lambda_diff: 1.0
+   TMA_epoch: 2
+   diff_epoch: 0
+   joint_epoch: 0
+
+ optimizer_params:
+   lr: 0.00005
+   bert_lr: 0.000005
+   ft_lr: 0.000005
+
+ slmadv_params:
+   min_len: 400
+   max_len: 500
+   batch_percentage: 0.5
+   iter: 20
+   thresh: 5
+   scale: 0.01
+   sig: 1.5
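Several values in this config are tied together arithmetically: the sample rate and hop length fix the mel frame rate, and the product of the decoder's upsample rates equals the hop length. A small sanity-check sketch, assuming the usual vocoder convention that one mel frame is expanded to `hop_length` audio samples and that the top-level `max_len` counts mel frames. Both are assumptions; the config itself does not state them.

```python
import math
from typing import Sequence

# Values copied from configs/config_hindi_english.yml in this commit.
SAMPLE_RATE = 24000
HOP_LENGTH = 300
UPSAMPLE_RATES = [10, 5, 3, 2]
MAX_LEN_FRAMES = 200  # top-level max_len, assumed to be in mel frames


def frames_per_second(sample_rate: int, hop_length: int) -> float:
    """Number of mel frames produced for one second of audio."""
    return sample_rate / hop_length


def total_upsampling(rates: Sequence[int]) -> int:
    """Overall factor by which the decoder stretches the frame axis."""
    return math.prod(rates)


if __name__ == "__main__":
    fps = frames_per_second(SAMPLE_RATE, HOP_LENGTH)
    print("mel frames per second:", fps)
    print("decoder upsampling factor:", total_upsampling(UPSAMPLE_RATES))
    print("upsampling matches hop length:", total_upsampling(UPSAMPLE_RATES) == HOP_LENGTH)
    print("max_len in seconds (if frames):", MAX_LEN_FRAMES / fps)
```

If `hop_length` or `upsample_rates` is edited independently, a check like this would flag the mismatch before loading a checkpoint.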
hub.py CHANGED
@@ -5,6 +5,7 @@ Supports:
      - HuggingFace Hub integration
      - Automatic model downloading
      - Local caching
+     - Multiple model variants
  """

  import os
@@ -13,15 +14,35 @@ from pathlib import Path
  from typing import Optional, Union

  # Default HuggingFace Hub repository
- DEFAULT_HF_REPO = "yourusername/chiluka-tts" # TODO: Update with your actual repo
+ DEFAULT_HF_REPO = "Seemanth/chiluka-tts"

  # Cache directory for downloaded models
  CACHE_DIR = Path.home() / ".cache" / "chiluka"

- # Required model files
- REQUIRED_FILES = {
-     "checkpoint": "checkpoints/epoch_2nd_00017.pth",
-     "config": "configs/config_ft.yml",
+ # ============================================
+ # Model Registry
+ # ============================================
+ # Maps model names to their config + checkpoint paths
+ # relative to the repo root.
+ MODEL_REGISTRY = {
+     "telugu": {
+         "config": "configs/config_ft.yml",
+         "checkpoint": "checkpoints/epoch_2nd_00017.pth",
+         "languages": ["te", "en"],
+         "description": "Telugu + English single-speaker TTS",
+     },
+     "hindi_english": {
+         "config": "configs/config_hindi_english.yml",
+         "checkpoint": "checkpoints/epoch_2nd_00029.pth",
+         "languages": ["hi", "en"],
+         "description": "Hindi + English multi-speaker TTS (5 speakers)",
+     },
+ }
+
+ DEFAULT_MODEL = "hindi_english"
+
+ # Shared pretrained sub-models (same across all variants)
+ PRETRAINED_FILES = {
      "asr_config": "pretrained/ASR/config.yml",
      "asr_model": "pretrained/ASR/epoch_00080.pth",
      "f0_model": "pretrained/JDC/bst.t7",
@@ -30,6 +51,27 @@ REQUIRED_FILES = {
  }


+ def list_models() -> dict:
+     """
+     List all available model variants.
+
+     Returns:
+         Dictionary of model names and their info.
+
+     Example:
+         >>> from chiluka import hub
+         >>> hub.list_models()
+         {'telugu': {...}, 'hindi_english': {...}}
+     """
+     return {
+         name: {
+             "languages": info["languages"],
+             "description": info["description"],
+         }
+         for name, info in MODEL_REGISTRY.items()
+     }
+
+
  def get_cache_dir() -> Path:
      """Get the cache directory for Chiluka models."""
      cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
@@ -43,11 +85,19 @@ def is_model_cached(repo_id: str = DEFAULT_HF_REPO) -> bool:
      if not cache_path.exists():
          return False

-     # Check if all required files exist
-     for file_path in REQUIRED_FILES.values():
+     # Check if shared pretrained files exist
+     for file_path in PRETRAINED_FILES.values():
          if not (cache_path / file_path).exists():
              return False
-     return True
+
+     # Check if at least one model variant exists
+     for model_info in MODEL_REGISTRY.values():
+         config_exists = (cache_path / model_info["config"]).exists()
+         checkpoint_exists = (cache_path / model_info["checkpoint"]).exists()
+         if config_exists and checkpoint_exists:
+             return True
+
+     return False


  def download_from_hf(
@@ -60,21 +110,16 @@ def download_from_hf(
      Download model files from HuggingFace Hub.

      Args:
-         repo_id: HuggingFace Hub repository ID (e.g., 'username/model-name')
+         repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts')
          revision: Git revision to download (branch, tag, or commit hash)
          force_download: If True, re-download even if cached
          token: HuggingFace API token for private repos

      Returns:
          Path to the downloaded model directory
-
-     Example:
-         >>> model_path = download_from_hf("yourusername/chiluka-tts")
-         >>> print(model_path)
-         /home/user/.cache/chiluka/yourusername_chiluka-tts
      """
      try:
-         from huggingface_hub import snapshot_download, hf_hub_download
+         from huggingface_hub import snapshot_download
      except ImportError:
          raise ImportError(
              "huggingface_hub is required for downloading models. "
@@ -89,7 +134,6 @@ def download_from_hf(

      print(f"Downloading model from HuggingFace Hub: {repo_id}...")

-     # Download entire repository
      downloaded_path = snapshot_download(
          repo_id=repo_id,
          revision=revision,
@@ -103,60 +147,32 @@ def download_from_hf(
      return Path(downloaded_path)


- def download_from_url(
-     url: str,
-     filename: str,
-     force_download: bool = False,
- ) -> Path:
-     """
-     Download a single file from a URL.
-
-     Args:
-         url: URL to download from
-         filename: Local filename to save as
-         force_download: If True, re-download even if exists
-
-     Returns:
-         Path to the downloaded file
-     """
-     import urllib.request
-
-     cache_dir = get_cache_dir() / "downloads"
-     cache_dir.mkdir(parents=True, exist_ok=True)
-     local_path = cache_dir / filename
-
-     if local_path.exists() and not force_download:
-         print(f"Using cached file: {local_path}")
-         return local_path
-
-     print(f"Downloading {filename}...")
-
-     # Download with progress
-     def _progress_hook(count, block_size, total_size):
-         percent = int(count * block_size * 100 / total_size)
-         print(f"\rDownloading: {percent}%", end="", flush=True)
-
-     urllib.request.urlretrieve(url, local_path, reporthook=_progress_hook)
-     print() # New line after progress
-
-     return local_path
-
-
- def get_model_paths(repo_id: str = DEFAULT_HF_REPO) -> dict:
+ def get_model_paths(
+     model: str = DEFAULT_MODEL,
+     repo_id: str = DEFAULT_HF_REPO,
+ ) -> dict:
      """
      Get paths to all model files after downloading.

      Args:
+         model: Model variant name ('telugu', 'hindi_english')
          repo_id: HuggingFace Hub repository ID

      Returns:
          Dictionary with paths to config, checkpoint, and pretrained directory
      """
+     if model not in MODEL_REGISTRY:
+         available = ", ".join(MODEL_REGISTRY.keys())
+         raise ValueError(
+             f"Unknown model '{model}'. Available models: {available}"
+         )
+
      model_dir = download_from_hf(repo_id)
+     model_info = MODEL_REGISTRY[model]

      return {
-         "config_path": str(model_dir / "configs" / "config_ft.yml"),
-         "checkpoint_path": str(model_dir / "checkpoints" / "epoch_2nd_00017.pth"),
+         "config_path": str(model_dir / model_info["config"]),
+         "checkpoint_path": str(model_dir / model_info["checkpoint"]),
          "pretrained_dir": str(model_dir / "pretrained"),
      }

@@ -202,7 +218,7 @@ def push_to_hub(
      Example:
          >>> push_to_hub(
          ...     local_dir="./chiluka",
-         ...     repo_id="myusername/my-chiluka-model",
+         ...     repo_id="Seemanth/chiluka-tts",
          ...     private=False
          ... )
      """
@@ -245,6 +261,14 @@ def create_model_card(repo_id: str, save_path: Optional[str] = None) -> str:
      Returns:
          Model card content as string
      """
+     owner = repo_id.split("/")[0]
+
+     # Build model table
+     model_rows = ""
+     for name, info in MODEL_REGISTRY.items():
+         langs = ", ".join(info["languages"])
+         model_rows += f"| `{name}` | {info['description']} | {langs} |\n"
+
      model_card = f"""---
  language:
  - en
@@ -257,12 +281,19 @@ tags:
  - tts
  - styletts2
  - voice-cloning
+ - multi-language
  ---

  # Chiluka TTS

  Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.

+ ## Available Models
+
+ | Model | Description | Languages |
+ |-------|-------------|-----------|
+ {model_rows}
+
  ## Installation

  ```bash
@@ -272,64 +303,47 @@ pip install chiluka
  Or install from source:

  ```bash
- pip install git+https://github.com/{repo_id.split('/')[0]}/chiluka.git
+ pip install git+https://github.com/{owner}/chiluka.git
  ```

  ## Usage

- ### Quick Start (Auto-download)
+ ### Hindi + English (default)

  ```python
  from chiluka import Chiluka

- # Automatically downloads model weights
  tts = Chiluka.from_pretrained()

- # Generate speech
  wav = tts.synthesize(
      text="Hello, world!",
-     reference_audio="path/to/reference.wav",
+     reference_audio="reference.wav",
      language="en"
  )
-
- # Save output
  tts.save_wav(wav, "output.wav")
  ```

- ### PyTorch Hub
+ ### Telugu

  ```python
- import torch
+ tts = Chiluka.from_pretrained(model="telugu")

- tts = torch.hub.load('{repo_id.split('/')[0]}/chiluka', 'chiluka')
- wav = tts.synthesize("Hello!", "reference.wav", language="en")
+ wav = tts.synthesize(
+     text="నమస్కారం",
+     reference_audio="reference.wav",
+     language="te"
+ )
  ```

- ### HuggingFace Hub
+ ### PyTorch Hub

  ```python
- from chiluka import Chiluka
+ import torch

- tts = Chiluka.from_pretrained("{repo_id}")
+ tts = torch.hub.load('{owner}/chiluka', 'chiluka')
+ tts = torch.hub.load('{owner}/chiluka', 'chiluka_telugu')
  ```

- ## Parameters
-
- - `text`: Input text to synthesize
- - `reference_audio`: Path to reference audio for style transfer
- - `language`: Language code ('en', 'te', 'hi', etc.)
- - `alpha`: Acoustic style mixing (0-1, default 0.3)
- - `beta`: Prosodic style mixing (0-1, default 0.7)
- - `diffusion_steps`: Quality vs speed tradeoff (default 5)
-
- ## Supported Languages
-
- Uses espeak-ng phonemizer. Common languages:
- - English: `en-us`, `en-gb`
- - Telugu: `te`
- - Hindi: `hi`
- - Tamil: `ta`
-
  ## License

  MIT License
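In the new `hub.py`, `is_model_cached` reports a hit as soon as any one variant's config and checkpoint are present alongside the shared pretrained files. A stricter per-variant check could look like the sketch below. `is_variant_cached` and `touch` are hypothetical helpers that are not part of this commit, and the registry is a trimmed copy of `MODEL_REGISTRY`.

```python
import tempfile
from pathlib import Path

# Trimmed copy of MODEL_REGISTRY from hub.py in this commit (paths only).
MODEL_REGISTRY = {
    "telugu": {
        "config": "configs/config_ft.yml",
        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
    },
    "hindi_english": {
        "config": "configs/config_hindi_english.yml",
        "checkpoint": "checkpoints/epoch_2nd_00029.pth",
    },
}


def is_variant_cached(cache_path: Path, model: str) -> bool:
    """Hypothetical helper: True only if this variant's own files are present."""
    if model not in MODEL_REGISTRY:
        available = ", ".join(MODEL_REGISTRY)
        raise ValueError(f"Unknown model '{model}'. Available models: {available}")
    info = MODEL_REGISTRY[model]
    return all((cache_path / info[key]).exists() for key in ("config", "checkpoint"))


def touch(path: Path) -> None:
    """Create an empty file, including any missing parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Simulate a cache that holds only the Telugu variant.
        for rel in MODEL_REGISTRY["telugu"].values():
            touch(root / rel)
        print("telugu cached:", is_variant_cached(root, "telugu"))
        print("hindi_english cached:", is_variant_cached(root, "hindi_english"))
```

Because `download_from_hf` fetches a snapshot of the whole repository, both variants normally arrive together, so the difference only matters for partially populated caches.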
inference.py CHANGED
@@ -155,6 +155,7 @@ class Chiluka:
      @classmethod
      def from_pretrained(
          cls,
+         model: str = None,
          repo_id: str = None,
          device: Optional[str] = None,
          force_download: bool = False,
@@ -168,7 +169,10 @@ class Chiluka:
          Weights are automatically downloaded and cached on first use.

          Args:
-             repo_id: HuggingFace Hub repository ID (e.g., 'username/chiluka-tts').
+             model: Model variant to load. Options:
+                 - 'hindi_english' (default) - Hindi + English multi-speaker TTS
+                 - 'telugu' - Telugu + English single-speaker TTS
+             repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts').
                  If None, uses the default repository.
              device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
              force_download: If True, re-download even if cached.
@@ -179,31 +183,32 @@ class Chiluka:
              Initialized Chiluka TTS model ready for inference.

          Examples:
-             # Default repository (auto-download)
+             # Hindi-English model (default)
              >>> tts = Chiluka.from_pretrained()

-             # Specific repository
-             >>> tts = Chiluka.from_pretrained("myuser/my-chiluka-model")
+             # Telugu model
+             >>> tts = Chiluka.from_pretrained(model="telugu")
+
+             # Specific HuggingFace repository
+             >>> tts = Chiluka.from_pretrained(repo_id="myuser/my-model")

              # Force re-download
              >>> tts = Chiluka.from_pretrained(force_download=True)
-
-             # Private repository
-             >>> tts = Chiluka.from_pretrained("myuser/private-model", token="hf_xxx")
          """
-         from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO
+         from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO, DEFAULT_MODEL

+         model = model or DEFAULT_MODEL
          repo_id = repo_id or DEFAULT_HF_REPO

          # Download model files (or use cache)
-         model_dir = download_from_hf(
+         download_from_hf(
              repo_id=repo_id,
              force_download=force_download,
              token=token,
          )

-         # Get paths to model files
-         paths = get_model_paths(repo_id)
+         # Get paths to model files for the selected variant
+         paths = get_model_paths(model=model, repo_id=repo_id)

          return cls(
              config_path=paths["config_path"],
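The new `from_pretrained` resolves its arguments with `model or DEFAULT_MODEL` and `repo_id or DEFAULT_HF_REPO`, then relies on `get_model_paths` to reject unknown variant names. A standalone sketch of that resolution order, assuming the defaults shown in `hub.py`. `resolve_request` and `KNOWN_MODELS` are hypothetical names used only for illustration.

```python
from typing import Optional, Tuple

# Defaults as defined in hub.py in this commit.
DEFAULT_HF_REPO = "Seemanth/chiluka-tts"
DEFAULT_MODEL = "hindi_english"
KNOWN_MODELS = ("telugu", "hindi_english")  # keys of MODEL_REGISTRY


def resolve_request(
    model: Optional[str] = None,
    repo_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the fallback logic in from_pretrained."""
    # `or` treats None and "" alike, so both fall back to the default.
    model = model or DEFAULT_MODEL
    repo_id = repo_id or DEFAULT_HF_REPO
    if model not in KNOWN_MODELS:
        available = ", ".join(KNOWN_MODELS)
        raise ValueError(f"Unknown model '{model}'. Available models: {available}")
    return model, repo_id


if __name__ == "__main__":
    print(resolve_request())
    print(resolve_request(model="telugu", repo_id="myuser/my-model"))
```

One consequence of using `or` is that an empty string selects the default variant instead of raising, which may or may not be the intended behaviour.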
models/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (425 Bytes)
models/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (544 Bytes)
models/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (450 Bytes)
models/__pycache__/core.cpython-310.pyc ADDED
Binary file (27.9 kB)
models/__pycache__/core.cpython-311.pyc ADDED
Binary file (60.9 kB)
models/__pycache__/core.cpython-313.pyc ADDED
Binary file (55.2 kB)
models/__pycache__/hifigan.cpython-310.pyc ADDED
Binary file (11.2 kB)
models/__pycache__/hifigan.cpython-311.pyc ADDED
Binary file (25 kB)
models/__pycache__/hifigan.cpython-313.pyc ADDED
Binary file (22 kB)
models/diffusion/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (568 Bytes)
models/diffusion/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (707 Bytes)
models/diffusion/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (592 Bytes)
models/diffusion/__pycache__/diffusion.cpython-310.pyc ADDED
Binary file (3.47 kB)
models/diffusion/__pycache__/diffusion.cpython-311.pyc ADDED
Binary file (5.14 kB)
models/diffusion/__pycache__/diffusion.cpython-313.pyc ADDED
Binary file (4.56 kB)
models/diffusion/__pycache__/modules.cpython-310.pyc ADDED
Binary file (14.5 kB)
models/diffusion/__pycache__/modules.cpython-311.pyc ADDED
Binary file (29.8 kB)
models/diffusion/__pycache__/modules.cpython-313.pyc ADDED
Binary file (26 kB)
models/diffusion/__pycache__/sampler.cpython-310.pyc ADDED
Binary file (9.14 kB)
models/diffusion/__pycache__/sampler.cpython-311.pyc ADDED
Binary file (15 kB)
models/diffusion/__pycache__/sampler.cpython-313.pyc ADDED
Binary file (13.7 kB)
models/diffusion/__pycache__/utils.cpython-310.pyc ADDED
Binary file (1.98 kB)
models/diffusion/__pycache__/utils.cpython-311.pyc ADDED
Binary file (3.43 kB)
models/diffusion/__pycache__/utils.cpython-313.pyc ADDED
Binary file (2.72 kB)
pretrained/ASR/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (150 Bytes)
pretrained/ASR/__pycache__/layers.cpython-310.pyc ADDED
Binary file (11 kB)
pretrained/ASR/__pycache__/models.cpython-310.pyc ADDED
Binary file (6.12 kB)
pretrained/JDC/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (150 Bytes)
pretrained/JDC/__pycache__/model.cpython-310.pyc ADDED
Binary file (4.78 kB)
pretrained/PLBERT/__pycache__/util.cpython-310.pyc ADDED
Binary file (1.75 kB)