seemanthraju and Claude Opus 4.5 committed
Commit 10ea2f8 · 1 Parent(s): 60fee7c

Add Hindi-English model, multi-model support, and example scripts

- Add Hindi-English multi-speaker TTS model (5 speakers)
- Add model registry in hub.py for selecting model variants
- Update from_pretrained() to accept model="hindi_english" or model="telugu"
- Add torch.hub entry points: chiluka, chiluka_telugu, chiluka_hindi_english
- Add example scripts for HuggingFace Hub, PyTorch Hub, and pip usage
- Add HuggingFace model card (MODEL_CARD.md)
- Update README with all models and loading methods
- Exclude large weights from PyPI package via MANIFEST.in

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

MANIFEST.in ADDED
@@ -0,0 +1,17 @@
+ # Include config files
+ include chiluka/configs/*.yml
+
+ # Include pretrained config files (but NOT weights)
+ include chiluka/pretrained/ASR/config.yml
+ include chiluka/pretrained/PLBERT/config.yml
+
+ # Exclude large model weights (these come from HuggingFace Hub)
+ exclude chiluka/checkpoints/*.pth
+ exclude chiluka/pretrained/ASR/*.pth
+ exclude chiluka/pretrained/JDC/*.t7
+ exclude chiluka/pretrained/PLBERT/*.t7
+
+ # Exclude other unnecessary files
+ global-exclude *.pyc
+ global-exclude __pycache__
+ global-exclude *.egg-info
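
To see what the `exclude` rules in MANIFEST.in are meant to achieve, the patterns can be approximated with Python's `fnmatch`. This is only a rough sketch: setuptools applies its own glob translation (for instance, its `*` does not cross directory separators, while `fnmatch`'s does), and the candidate file list below is illustrative, taken from paths mentioned elsewhere in this commit.

```python
from fnmatch import fnmatch

# Exclude patterns copied from the MANIFEST.in added in this commit.
EXCLUDE_PATTERNS = [
    "chiluka/checkpoints/*.pth",
    "chiluka/pretrained/ASR/*.pth",
    "chiluka/pretrained/JDC/*.t7",
    "chiluka/pretrained/PLBERT/*.t7",
]


def is_excluded(path: str) -> bool:
    """Return True if the path matches any weight-exclusion pattern."""
    return any(fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)


# Illustrative file list (not read from disk).
candidate_files = [
    "chiluka/configs/config_hindi_english.yml",
    "chiluka/pretrained/ASR/config.yml",
    "chiluka/pretrained/ASR/epoch_00080.pth",
    "chiluka/pretrained/JDC/bst.t7",
    "chiluka/checkpoints/epoch_2nd_00029.pth",
]

# Only the small config files survive; the weights are filtered out.
shipped = [path for path in candidate_files if not is_excluded(path)]
print(shipped)
```

Under these assumptions only the two `.yml` files remain, which matches the intent stated in the commit message: configs ship with the PyPI package, weights come from the Hub.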
MODEL_CARD.md ADDED
@@ -0,0 +1,217 @@
+ ---
+ language:
+ - en
+ - hi
+ - te
+ license: mit
+ library_name: chiluka
+ pipeline_tag: text-to-speech
+ tags:
+ - text-to-speech
+ - tts
+ - styletts2
+ - voice-cloning
+ - multi-language
+ - hindi
+ - english
+ - telugu
+ - multi-speaker
+ - style-transfer
+ ---
+
+ # Chiluka TTS
+
+ **Chiluka** (చిలుక - Telugu for "parrot") is a lightweight, self-contained Text-to-Speech inference package based on [StyleTTS2](https://github.com/yl4579/StyleTTS2).
+
+ It supports **style transfer from reference audio** - give it a voice sample and it will speak in that style.
+
+ ## Available Models
+
+ | Model | Name | Languages | Speakers | Description |
+ |-------|------|-----------|----------|-------------|
+ | **Hindi-English** (default) | `hindi_english` | Hindi, English | 5 | Multi-speaker Hindi + English TTS |
+ | **Telugu** | `telugu` | Telugu, English | 1 | Single-speaker Telugu + English TTS |
+
+ ## Installation
+
+ ```bash
+ pip install chiluka
+ ```
+
+ Or from GitHub:
+
+ ```bash
+ pip install git+https://github.com/PurviewVoiceBot/chiluka.git
+ ```
+
+ **System dependency** (required for phonemization):
+
+ ```bash
+ # Ubuntu/Debian
+ sudo apt-get install espeak-ng
+
+ # macOS
+ brew install espeak-ng
+ ```
+
+ ## Quick Start
+
+ ```python
+ from chiluka import Chiluka
+
+ # Load model (weights download automatically on first use)
+ tts = Chiluka.from_pretrained()
+
+ # Synthesize speech
+ wav = tts.synthesize(
+     text="Hello, this is Chiluka speaking!",
+     reference_audio="path/to/reference.wav",
+     language="en"
+ )
+
+ # Save output
+ tts.save_wav(wav, "output.wav")
+ ```
+
+ ## Choose a Model
+
+ ```python
+ from chiluka import Chiluka
+
+ # Hindi + English (default)
+ tts = Chiluka.from_pretrained(model="hindi_english")
+
+ # Telugu + English
+ tts = Chiluka.from_pretrained(model="telugu")
+ ```
+
+ ## Hindi Example
+
+ ```python
+ tts = Chiluka.from_pretrained()
+
+ wav = tts.synthesize(
+     text="नमस्ते, मैं चिलुका बोल रहा हूं",
+     reference_audio="reference.wav",
+     language="hi"
+ )
+ tts.save_wav(wav, "hindi_output.wav")
+ ```
+
+ ## Telugu Example
+
+ ```python
+ tts = Chiluka.from_pretrained(model="telugu")
+
+ wav = tts.synthesize(
+     text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
+     reference_audio="reference.wav",
+     language="te"
+ )
+ tts.save_wav(wav, "telugu_output.wav")
+ ```
+
+ ## PyTorch Hub
+
+ ```python
+ import torch
+
+ # Hindi-English (default)
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
+
+ # Telugu
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
+
+ wav = tts.synthesize("Hello!", "reference.wav", language="en")
+ ```
+
+ ## Synthesis Parameters
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `text` | required | Input text to synthesize |
+ | `reference_audio` | required | Path to reference audio for voice style |
+ | `language` | `"en"` | Language code (`en`, `hi`, `te`, etc.) |
+ | `alpha` | `0.3` | Acoustic style mixing (0 = reference voice, 1 = predicted) |
+ | `beta` | `0.7` | Prosodic style mixing (0 = reference prosody, 1 = predicted) |
+ | `diffusion_steps` | `5` | More steps = better quality, slower inference |
+ | `embedding_scale` | `1.0` | Classifier-free guidance strength |
+
+ ## How It Works
+
+ Chiluka uses a StyleTTS2-based pipeline:
+
+ 1. **Text** is converted to phonemes using espeak-ng
+ 2. **PL-BERT** encodes text into contextual embeddings
+ 3. **Reference audio** is processed to extract a style vector
+ 4. **Diffusion model** samples a style conditioned on text
+ 5. **Prosody predictor** generates duration, pitch (F0), and energy
+ 6. **HiFi-GAN decoder** synthesizes the final waveform at 24 kHz
+
+ ## Model Architecture
+
+ - **Text Encoder**: Token embedding + CNN + BiLSTM
+ - **Style Encoder**: Conv2D + Residual blocks (style_dim=128)
+ - **Prosody Predictor**: LSTM-based with AdaIN normalization
+ - **Diffusion Model**: Transformer-based denoiser with ADPM2 sampler
+ - **Decoder**: HiFi-GAN vocoder (upsample rates: 10, 5, 3, 2)
+ - **Pretrained sub-models**: PL-BERT (text), ASR (alignment), JDC (pitch)
+
+ ## File Structure
+
+ ```
+ ├── configs/
+ │   ├── config_ft.yml                # Telugu model config
+ │   └── config_hindi_english.yml     # Hindi-English model config
+ ├── checkpoints/
+ │   ├── epoch_2nd_00017.pth          # Telugu checkpoint (~2GB)
+ │   └── epoch_2nd_00029.pth          # Hindi-English checkpoint (~2GB)
+ ├── pretrained/                      # Shared pretrained sub-models
+ │   ├── ASR/                         # Text-to-mel alignment
+ │   ├── JDC/                         # Pitch extraction (F0)
+ │   └── PLBERT/                      # Text encoder
+ ├── models/                          # Model architecture code
+ │   ├── core.py
+ │   ├── hifigan.py
+ │   └── diffusion/
+ ├── inference.py                     # Main API
+ ├── hub.py                           # HuggingFace Hub utilities
+ └── text_utils.py                    # Phoneme tokenization
+ ```
+
+ ## Requirements
+
+ - Python >= 3.8
+ - PyTorch >= 1.13.0
+ - CUDA recommended (works on CPU too)
+ - espeak-ng system package
+
+ ## Limitations
+
+ - Requires a reference audio file for style/voice transfer
+ - Quality depends on the reference audio quality
+ - Best results with 3-15 second reference clips
+ - Hindi-English model trained on 5 speakers
+ - Telugu model trained on 1 speaker
+
+ ## Citation
+
+ Based on StyleTTS2:
+
+ ```bibtex
+ @inproceedings{li2024styletts,
+   title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+   author={Li, Yinghao Aaron and Han, Cong and Raghavan, Vinay S. and Mischler, Gavin and Mesgarani, Nima},
+   booktitle={NeurIPS},
+   year={2023}
+ }
+ ```
+
+ ## License
+
+ MIT License
+
+ ## Links
+
+ - **GitHub**: [PurviewVoiceBot/chiluka](https://github.com/PurviewVoiceBot/chiluka)
+ - **PyPI**: [chiluka](https://pypi.org/project/chiluka/)
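
The `alpha` and `beta` rows in the model card's Synthesis Parameters table describe a blend between the style taken from the reference audio (weight 0) and the style predicted by the diffusion model (weight 1). A minimal sketch of that blend, assuming plain linear interpolation as in upstream StyleTTS2 inference; `mix_style` is a hypothetical helper, and Chiluka's actual mixing code is not part of this diff.

```python
from typing import List


def mix_style(reference: List[float], predicted: List[float], weight: float) -> List[float]:
    """Linearly interpolate two style vectors.

    weight = 0 returns the reference style, weight = 1 the predicted style.
    (Assumed behaviour; hypothetical helper, not Chiluka's API.)
    """
    if len(reference) != len(predicted):
        raise ValueError("style vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * p + (1.0 - weight) * r for r, p in zip(reference, predicted)]


# Toy two-dimensional "style vectors" for illustration only.
reference_style = [0.0, 2.0]
predicted_style = [1.0, 4.0]

halfway = mix_style(reference_style, predicted_style, 0.5)
print(halfway)
```

With the documented defaults, `alpha=0.3` keeps the acoustic style closer to the reference voice, while `beta=0.7` lets the predicted prosody dominate.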
README.md CHANGED
@@ -1,59 +1,65 @@
1
- # Chiluka 🦜
2
 
3
  **Chiluka** (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.
4
 
5
  ## Features
6
 
7
- - 🚀 Simple, clean API for TTS synthesis
8
- - 📦 **Fully self-contained** - all models bundled in the package
9
- - 🎙️ Style transfer from reference audio
10
- - 🌍 Multi-language support via phonemizer
11
- - 🔧 No external dependencies on other repos
12
 
13
  ## Installation
14
 
15
- ### From Source (Recommended)
16
 
17
  ```bash
18
- git clone https://github.com/yourusername/chiluka.git
19
- cd chiluka
20
- pip install -e .
21
  ```
22
 
23
- **Note:** This repo uses Git LFS for large model files. Make sure to install Git LFS first:
24
 
25
  ```bash
26
- # Ubuntu/Debian
27
- sudo apt-get install git-lfs
28
- git lfs install
29
 
30
- # macOS
31
- brew install git-lfs
32
- git lfs install
33
 
34
- # Then clone
35
- git lfs clone https://github.com/yourusername/chiluka.git
 
 
36
  ```
37
 
38
- ### Install espeak-ng (Required for phonemization)
39
 
40
- **Ubuntu/Debian:**
41
  ```bash
 
42
  sudo apt-get install espeak-ng
43
- ```
44
 
45
- **macOS:**
46
- ```bash
47
  brew install espeak-ng
48
  ```
49
 
50
  ## Quick Start
51
 
 
 
 
 
52
  ```python
53
  from chiluka import Chiluka
54
 
55
- # Initialize - uses bundled models automatically!
56
- tts = Chiluka()
57
 
58
  # Synthesize speech
59
  wav = tts.synthesize(
@@ -66,61 +72,123 @@ wav = tts.synthesize(
66
  tts.save_wav(wav, "output.wav")
67
  ```
68
 
69
- ### Telugu Example
70
 
71
  ```python
72
  from chiluka import Chiluka
73
 
74
- tts = Chiluka()
 
75
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
76
  wav = tts.synthesize(
77
- text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
78
- reference_audio="path/to/telugu_reference.wav",
79
- language="te" # Telugu
80
  )
 
81
 
82
- tts.save_wav(wav, "telugu_output.wav")
 
 
 
 
 
83
  ```
84
 
85
- ## Package Structure
 
 
86
 
 
 
 
 
 
 
 
 
 
 
 
87
  ```
88
- chiluka/
89
- ├── chiluka/
90
- │ ├── __init__.py
91
- │ ├── inference.py # Main Chiluka API
92
- │ ├── text_utils.py
93
- │ ├── utils.py
94
- │ ├── configs/
95
- │ │ └── config_ft.yml # Model configuration
96
- │ ├── checkpoints/
97
- │ │ └── *.pth # Trained model checkpoint
98
- │ ├── pretrained/
99
- │ │ ├── ASR/ # Text aligner model
100
- │ │ ├── JDC/ # Pitch extractor model
101
- │ │ └── PLBERT/ # PL-BERT model
102
- │ └── models/
103
- │ ├── core.py
104
- │ ├── hifigan.py
105
- │ └── diffusion/
106
- ├── examples/
107
- │ ├── basic_synthesis.py
108
- │ └── telugu_synthesis.py
109
- ├── setup.py
110
- ├── pyproject.toml
111
- └── README.md
 
 
 
 
 
 
 
 
 
 
 
112
  ```
113
 
114
  ## API Reference
115
 
116
- ### Chiluka Class
117
 
118
  ```python
119
  tts = Chiluka(
120
- config_path=None, # Optional: custom config file
121
- checkpoint_path=None, # Optional: custom checkpoint
122
- pretrained_dir=None, # Optional: custom pretrained models
123
- device=None # Optional: 'cuda' or 'cpu'
124
  )
125
  ```
126
 
@@ -130,11 +198,11 @@ tts = Chiluka(
130
  wav = tts.synthesize(
131
  text="Hello world", # Text to synthesize
132
  reference_audio="ref.wav", # Reference audio for style
133
- language="en", # Language code ('en', 'te', 'hi', etc.)
134
  alpha=0.3, # Acoustic style mixing (0-1)
135
  beta=0.7, # Prosodic style mixing (0-1)
136
- diffusion_steps=5, # Diffusion sampling steps
137
- embedding_scale=1.0, # Classifier-free guidance scale
138
  sr=24000 # Sample rate
139
  )
140
  ```
@@ -158,23 +226,51 @@ style = tts.compute_style("reference.wav", sr=24000)
158
  |-----------|---------|-------------|
159
  | `alpha` | 0.3 | Acoustic style mixing (0=reference only, 1=predicted only) |
160
  | `beta` | 0.7 | Prosodic style mixing (0=reference only, 1=predicted only) |
161
- | `diffusion_steps` | 5 | Number of diffusion sampling steps (more = better quality, slower) |
162
  | `embedding_scale` | 1.0 | Classifier-free guidance scale |
163
 
164
  ## Supported Languages
165
 
166
- Uses [phonemizer](https://github.com/bootphon/phonemizer) with espeak-ng. Common languages:
167
 
168
- | Language | Code |
169
- |----------|------|
170
- | English (US) | `en-us` |
171
- | English (UK) | `en-gb` |
172
- | Telugu | `te` |
173
- | Hindi | `hi` |
174
- | Tamil | `ta` |
175
- | Kannada | `kn` |
176
 
177
- See espeak-ng documentation for full list.
178
 
179
  ## Requirements
180
 
@@ -183,11 +279,46 @@ See espeak-ng documentation for full list.
183
  - CUDA (recommended for faster inference)
184
  - espeak-ng
185
 
186
  ## Training Your Own Model
187
 
188
  This package is for **inference only**. To train your own model, use the original [StyleTTS2](https://github.com/yl4579/StyleTTS2) repository.
189
 
190
- After training, copy your checkpoint to `chiluka/checkpoints/` and update the config if needed.
 
 
 
191
 
192
  ## Credits
193
 
 
1
+ # Chiluka
2
 
3
  **Chiluka** (చిలుక - Telugu for "parrot") is a self-contained TTS (Text-to-Speech) inference package based on StyleTTS2.
4
 
5
  ## Features
6
 
7
+ - Simple, clean API for TTS synthesis
8
+ - Style transfer from reference audio
9
+ - Multi-language support via phonemizer
10
+ - **Multiple models** - Hindi-English and Telugu
11
+ - **Multiple ways to load** - HuggingFace Hub, PyTorch Hub, pip install
12
+
13
+ ## Available Models
14
+
15
+ | Model | Name | Languages | Speakers | Description |
16
+ |-------|------|-----------|----------|-------------|
17
+ | Hindi-English (default) | `hindi_english` | Hindi, English | 5 | Multi-speaker Hindi + English TTS |
18
+ | Telugu | `telugu` | Telugu, English | 1 | Single-speaker Telugu + English TTS |
19
 
20
  ## Installation
21
 
22
+ ### Option 1: pip install
23
 
24
  ```bash
25
+ pip install chiluka
 
 
26
  ```
27
 
28
+ ### Option 2: Install from GitHub
29
 
30
  ```bash
31
+ pip install git+https://github.com/PurviewVoiceBot/chiluka.git
32
+ ```
 
33
 
34
+ ### Option 3: From Source
 
 
35
 
36
+ ```bash
37
+ git clone https://github.com/PurviewVoiceBot/chiluka.git
38
+ cd chiluka
39
+ pip install -e .
40
  ```
41
 
42
+ ### System Dependency: espeak-ng (Required)
43
 
 
44
  ```bash
45
+ # Ubuntu/Debian
46
  sudo apt-get install espeak-ng
 
47
 
48
+ # macOS
 
49
  brew install espeak-ng
50
  ```
51
 
52
  ## Quick Start
53
 
54
+ ### HuggingFace Hub (Recommended)
55
+
56
+ Model weights download automatically on first use. No cloning needed.
57
+
58
  ```python
59
  from chiluka import Chiluka
60
 
61
+ # Load Hindi-English model (default)
62
+ tts = Chiluka.from_pretrained()
63
 
64
  # Synthesize speech
65
  wav = tts.synthesize(
 
72
  tts.save_wav(wav, "output.wav")
73
  ```
74
 
75
+ ### Load a Specific Model
76
 
77
  ```python
78
  from chiluka import Chiluka
79
 
80
+ # Hindi-English (default)
81
+ tts = Chiluka.from_pretrained(model="hindi_english")
82
 
83
+ # Telugu
84
+ tts = Chiluka.from_pretrained(model="telugu")
85
+ ```
86
+
87
+ ### PyTorch Hub
88
+
89
+ ```python
90
+ import torch
91
+
92
+ # Hindi-English (default)
93
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
94
+
95
+ # Telugu
96
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
97
+
98
+ # Synthesize
99
  wav = tts.synthesize(
100
+ text="Hello from PyTorch Hub!",
101
+ reference_audio="reference.wav",
102
+ language="en"
103
  )
104
+ ```
105
 
106
+ ### Local Weights (if you cloned with Git LFS)
107
+
108
+ ```python
109
+ from chiluka import Chiluka
110
+
111
+ tts = Chiluka() # uses bundled weights from cloned repo
112
  ```
113
 
114
+ ## Examples
115
+
116
+ ### Hindi Synthesis
117
 
118
+ ```python
119
+ from chiluka import Chiluka
120
+
121
+ tts = Chiluka.from_pretrained(model="hindi_english")
122
+
123
+ wav = tts.synthesize(
124
+ text="नमस्ते, मैं चिलुका बोल रहा हूं",
125
+ reference_audio="hindi_reference.wav",
126
+ language="hi"
127
+ )
128
+ tts.save_wav(wav, "hindi_output.wav")
129
  ```
130
+
131
+ ### English Synthesis
132
+
133
+ ```python
134
+ wav = tts.synthesize(
135
+ text="Hello, I am Chiluka, a text to speech system.",
136
+ reference_audio="english_reference.wav",
137
+ language="en"
138
+ )
139
+ tts.save_wav(wav, "english_output.wav")
140
+ ```
141
+
142
+ ### Telugu Synthesis
143
+
144
+ ```python
145
+ from chiluka import Chiluka
146
+
147
+ tts = Chiluka.from_pretrained(model="telugu")
148
+
149
+ wav = tts.synthesize(
150
+ text="నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
151
+ reference_audio="telugu_reference.wav",
152
+ language="te"
153
+ )
154
+ tts.save_wav(wav, "telugu_output.wav")
155
+ ```
156
+
157
+ ### List Available Models
158
+
159
+ ```python
160
+ from chiluka import list_models
161
+
162
+ models = list_models()
163
+ for name, info in models.items():
164
+ print(f"{name}: {info['description']} ({', '.join(info['languages'])})")
165
  ```
166
 
167
  ## API Reference
168
 
169
+ ### Loading the Model
170
 
171
  ```python
172
+ # Auto-download from HuggingFace (recommended)
173
+ tts = Chiluka.from_pretrained() # Hindi-English (default)
174
+ tts = Chiluka.from_pretrained(model="telugu") # Telugu
175
+ tts = Chiluka.from_pretrained(model="hindi_english") # Hindi-English (explicit)
176
+
177
+ # With options
178
+ tts = Chiluka.from_pretrained(
179
+ model="hindi_english", # Model variant
180
+ repo_id="Seemanth/chiluka-tts", # HuggingFace repo
181
+ device="cuda", # or "cpu"
182
+ force_download=False, # Re-download even if cached
183
+ token="hf_xxx" # For private repos
184
+ )
185
+
186
+ # Local weights
187
  tts = Chiluka(
188
+ config_path="path/to/config.yml",
189
+ checkpoint_path="path/to/model.pth",
190
+ pretrained_dir="path/to/pretrained/",
191
+ device="cuda"
192
  )
193
  ```
194
 
 
198
  wav = tts.synthesize(
199
  text="Hello world", # Text to synthesize
200
  reference_audio="ref.wav", # Reference audio for style
201
+ language="en", # Language code
202
  alpha=0.3, # Acoustic style mixing (0-1)
203
  beta=0.7, # Prosodic style mixing (0-1)
204
+ diffusion_steps=5, # Quality vs speed tradeoff
205
+ embedding_scale=1.0, # Classifier-free guidance
206
  sr=24000 # Sample rate
207
  )
208
  ```
 
226
  |-----------|---------|-------------|
227
  | `alpha` | 0.3 | Acoustic style mixing (0=reference only, 1=predicted only) |
228
  | `beta` | 0.7 | Prosodic style mixing (0=reference only, 1=predicted only) |
229
+ | `diffusion_steps` | 5 | Diffusion sampling steps (more = better quality, slower) |
230
  | `embedding_scale` | 1.0 | Classifier-free guidance scale |
231
 
232
  ## Supported Languages
233
 
234
+ Uses [phonemizer](https://github.com/bootphon/phonemizer) with espeak-ng:
235
+
236
+ | Language | Code | Available In |
237
+ |----------|------|-------------|
238
+ | English (US) | `en-us` | All models |
239
+ | English (UK) | `en-gb` | All models |
240
+ | Hindi | `hi` | `hindi_english` |
241
+ | Telugu | `te` | `telugu` |
242
+ | Tamil | `ta` | With fine-tuning |
243
+ | Kannada | `kn` | With fine-tuning |
244
 
245
+ ## Hub Utilities
 
 
 
 
 
 
 
246
 
247
+ ```python
248
+ from chiluka import list_models, clear_cache, push_to_hub, get_cache_dir
249
+
250
+ # List available models
251
+ list_models()
252
+
253
+ # Clear cache
254
+ clear_cache() # Clear all
255
+ clear_cache("Seemanth/chiluka-tts") # Clear specific repo
256
+
257
+ # Push your own model to HuggingFace
258
+ push_to_hub(
259
+ local_dir="./my-model",
260
+ repo_id="myusername/my-chiluka-model",
261
+ token="hf_your_token"
262
+ )
263
+
264
+ # Check cache location
265
+ print(get_cache_dir()) # ~/.cache/chiluka
266
+ ```
267
+
268
+ ## Environment Variables
269
+
270
+ | Variable | Description |
271
+ |----------|-------------|
272
+ | `CHILUKA_CACHE` | Custom cache directory (default: `~/.cache/chiluka`) |
273
+ | `HF_TOKEN` | HuggingFace API token for private repos |
274
 
275
  ## Requirements
276
 
 
279
  - CUDA (recommended for faster inference)
280
  - espeak-ng
281
 
282
+ ## Package Structure
283
+
284
+ ```
285
+ chiluka/
286
+ ├── chiluka/
287
+ │ ├── __init__.py
288
+ │ ├── inference.py # Main Chiluka API
289
+ │ ├── hub.py # Hub download + model registry
290
+ │ ├── text_utils.py
291
+ │ ├── utils.py
292
+ │ ├── configs/
293
+ │ │ ├── config_ft.yml # Telugu model config
294
+ │ │ └── config_hindi_english.yml # Hindi-English model config
295
+ │ ├── checkpoints/
296
+ │ │ ├── epoch_2nd_00017.pth # Telugu checkpoint
297
+ │ │ └── epoch_2nd_00029.pth # Hindi-English checkpoint
298
+ │ ├── pretrained/ # Shared pretrained sub-models
299
+ │ │ ├── ASR/
300
+ │ │ ├── JDC/
301
+ │ │ └── PLBERT/
302
+ │ └── models/
303
+ ├── hubconf.py # PyTorch Hub config
304
+ ├── examples/
305
+ │ ├── basic_synthesis.py
306
+ │ ├── telugu_synthesis.py
307
+ │ ├── huggingface_example.py
308
+ │ ├── torchhub_example.py
309
+ │ └── pip_example.py
310
+ ├── setup.py
311
+ └── README.md
312
+ ```
313
+
314
  ## Training Your Own Model
315
 
316
  This package is for **inference only**. To train your own model, use the original [StyleTTS2](https://github.com/yl4579/StyleTTS2) repository.
317
 
318
+ After training:
319
+ 1. Copy your checkpoint and config to a directory
320
+ 2. Push to HuggingFace Hub using `push_to_hub()`
321
+ 3. Load with `Chiluka.from_pretrained(repo_id="your-repo")`
322
 
323
  ## Credits
324
 
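
The `CHILUKA_CACHE` variable listed in the README's Environment Variables table overrides the default cache location `~/.cache/chiluka`. The sketch below mirrors the lookup visible in `get_cache_dir()` in `hub.py`; `resolve_cache_dir` and its `env` parameter are hypothetical, added only so the behaviour can be shown without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default taken from CACHE_DIR in hub.py.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "chiluka"


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the cache directory, honouring CHILUKA_CACHE when it is set.

    `env` defaults to os.environ; passing a dict makes the lookup testable.
    (Hypothetical helper, not part of the chiluka package.)
    """
    env = os.environ if env is None else env
    return Path(env.get("CHILUKA_CACHE", DEFAULT_CACHE_DIR))


# Without the variable, the default location is used.
print(resolve_cache_dir({}))

# With the variable set, the custom directory wins.
print(resolve_cache_dir({"CHILUKA_CACHE": "/tmp/chiluka-cache"}))
```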
chiluka/__init__.py CHANGED
@@ -1,17 +1,17 @@
1
  """
2
  Chiluka - A lightweight TTS inference package based on StyleTTS2
3
 
4
- Usage:
5
- # Local weights (if you have them)
6
- from chiluka import Chiluka
7
- tts = Chiluka()
8
 
9
- # Auto-download from HuggingFace Hub (recommended)
 
10
  from chiluka import Chiluka
11
  tts = Chiluka.from_pretrained()
12
 
13
- # From specific HuggingFace repo
14
- tts = Chiluka.from_pretrained("username/model-name")
15
 
16
  # Generate speech
17
  wav = tts.synthesize(
@@ -31,7 +31,9 @@ from .hub import (
31
  clear_cache,
32
  get_cache_dir,
33
  create_model_card,
 
34
  DEFAULT_HF_REPO,
 
35
  )
36
 
37
  __all__ = [
@@ -41,5 +43,7 @@ __all__ = [
41
  "clear_cache",
42
  "get_cache_dir",
43
  "create_model_card",
 
44
  "DEFAULT_HF_REPO",
 
45
  ]
 
1
  """
2
  Chiluka - A lightweight TTS inference package based on StyleTTS2
3
 
4
+ Available models:
5
+ - 'hindi_english' (default) - Hindi + English multi-speaker TTS
6
+ - 'telugu' - Telugu + English single-speaker TTS
 
7
 
8
+ Usage:
9
+ # Hindi-English model (default, auto-downloads from HuggingFace)
10
  from chiluka import Chiluka
11
  tts = Chiluka.from_pretrained()
12
 
13
+ # Telugu model
14
+ tts = Chiluka.from_pretrained(model="telugu")
15
 
16
  # Generate speech
17
  wav = tts.synthesize(
 
31
  clear_cache,
32
  get_cache_dir,
33
  create_model_card,
34
+ list_models,
35
  DEFAULT_HF_REPO,
36
+ MODEL_REGISTRY,
37
  )
38
 
39
  __all__ = [
 
43
  "clear_cache",
44
  "get_cache_dir",
45
  "create_model_card",
46
+ "list_models",
47
  "DEFAULT_HF_REPO",
48
+ "MODEL_REGISTRY",
49
  ]
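
`MODEL_REGISTRY` and `list_models` are now exported from the package. As a rough illustration of how a model name might be turned into file paths, the sketch below copies the registry entries from `hub.py` in this commit and resolves them against a download directory. `resolve_model_files` is a hypothetical name; the real lookup lives in `get_model_paths()`, whose updated body is only partly visible in this diff.

```python
from pathlib import Path

# Registry entries copied from hub.py in this commit.
MODEL_REGISTRY = {
    "telugu": {
        "config": "configs/config_ft.yml",
        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
    },
    "hindi_english": {
        "config": "configs/config_hindi_english.yml",
        "checkpoint": "checkpoints/epoch_2nd_00029.pth",
    },
}
DEFAULT_MODEL = "hindi_english"


def resolve_model_files(model_dir, model=DEFAULT_MODEL):
    """Map a model name to config/checkpoint paths under model_dir.

    Hypothetical helper for illustration; raises ValueError on unknown names.
    """
    if model not in MODEL_REGISTRY:
        available = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model {model!r}; available: {available}")
    entry = MODEL_REGISTRY[model]
    root = Path(model_dir)
    return {
        "config_path": str(root / entry["config"]),
        "checkpoint_path": str(root / entry["checkpoint"]),
        "pretrained_dir": str(root / "pretrained"),
    }


paths = resolve_model_files("/models/chiluka", model="telugu")
print(paths["checkpoint_path"])
```

Keeping the shared `pretrained/` directory outside the registry reflects the note in `hub.py` that the ASR, JDC, and PL-BERT sub-models are the same across variants.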
chiluka/configs/config_hindi_english.yml ADDED
@@ -0,0 +1,110 @@
+ log_dir: "Models/hindi_english_multispeaker_finetuned"
+ first_stage_path: "first_stage.pth"
+ save_freq: 1
+ log_interval: 10
+ device: "cuda"
+
+ epochs_1st: 15
+ epochs_2nd: 15
+
+ batch_size: 2
+ max_len: 200
+
+ pretrained_model: ""
+ second_stage_load_pretrained: true
+ load_only_params: true
+
+ F0_path: "Utils/JDC/bst.t7"
+ ASR_config: "Utils/ASR/config.yml"
+ ASR_path: "Utils/ASR/epoch_00080.pth"
+ PLBERT_dir: "Utils/PLBERT/"
+
+ data_params:
+   train_data: ""
+   val_data: ""
+   root_path: ""
+   OOD_data: ""
+   min_length: 50
+
+ # Audio preprocessing (24kHz)
+ preprocess_params:
+   sr: 24000
+   spect_params:
+     n_fft: 2048
+     win_length: 1200
+     hop_length: 300
+
+ # Model architecture
+ model_params:
+   multispeaker: true
+   num_speakers: 5
+
+   dim_in: 64
+   hidden_dim: 512
+   max_conv_dim: 512
+   n_layer: 3
+   n_mels: 80
+   n_token: 178
+   max_dur: 50
+   style_dim: 128
+   dropout: 0.2
+
+   speaker_embed_dim: 256
+
+   decoder:
+     type: "hifigan"
+     resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
+     resblock_kernel_sizes: [3, 7, 11]
+     upsample_initial_channel: 512
+     upsample_rates: [10, 5, 3, 2]
+     upsample_kernel_sizes: [20, 10, 6, 4]
+
+   slm:
+     model: "microsoft/wavlm-base-plus"
+     sr: 16000
+     hidden: 768
+     nlayers: 13
+     initial_channel: 64
+
+   diffusion:
+     embedding_mask_proba: 0.1
+     transformer:
+       num_layers: 3
+       num_heads: 8
+       head_features: 64
+       multiplier: 2
+     dist:
+       sigma_data: 0.19926648961191362
+       estimate_sigma_data: true
+       mean: -3.0
+       std: 1.0
+
+ loss_params:
+   lambda_mel: 5.0
+   lambda_gen: 1.0
+   lambda_slm: 1.0
+   lambda_mono: 1.0
+   lambda_s2s: 1.0
+   lambda_F0: 1.0
+   lambda_norm: 1.0
+   lambda_dur: 1.0
+   lambda_ce: 20.0
+   lambda_sty: 1.0
+   lambda_diff: 1.0
+   TMA_epoch: 2
+   diff_epoch: 0
+   joint_epoch: 0
+
+ optimizer_params:
+   lr: 0.00005
+   bert_lr: 0.000005
+   ft_lr: 0.000005
+
+ slmadv_params:
+   min_len: 400
+   max_len: 500
+   batch_percentage: 0.5
+   iter: 20
+   thresh: 5
+   scale: 0.01
+   sig: 1.5
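
Two values in this config are tied together: the spectrogram `hop_length` and the HiFi-GAN `upsample_rates`. A short arithmetic check, using only numbers from the file above, shows that the decoder's total upsampling factor equals the hop length, so each mel frame expands to exactly one hop of audio samples. The variable names are illustrative.

```python
from math import prod

# Values copied from config_hindi_english.yml.
sample_rate = 24000
hop_length = 300
upsample_rates = [10, 5, 3, 2]

# Each mel frame is upsampled by the product of the decoder rates.
total_upsampling = prod(upsample_rates)

# Number of mel frames the model produces per second of audio.
frames_per_second = sample_rate / hop_length

print(total_upsampling, frames_per_second)
```

If `hop_length` or `upsample_rates` were changed independently, this product would no longer match and the generated waveform length would drift from the mel-frame timing.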
chiluka/hub.py CHANGED
@@ -5,6 +5,7 @@ Supports:
5
  - HuggingFace Hub integration
6
  - Automatic model downloading
7
  - Local caching
 
8
  """
9
 
10
  import os
@@ -13,15 +14,35 @@ from pathlib import Path
13
  from typing import Optional, Union
14
 
15
  # Default HuggingFace Hub repository
16
- DEFAULT_HF_REPO = "yourusername/chiluka-tts" # TODO: Update with your actual repo
17
 
18
  # Cache directory for downloaded models
19
  CACHE_DIR = Path.home() / ".cache" / "chiluka"
20
 
21
- # Required model files
22
- REQUIRED_FILES = {
23
- "checkpoint": "checkpoints/epoch_2nd_00017.pth",
24
- "config": "configs/config_ft.yml",
25
  "asr_config": "pretrained/ASR/config.yml",
26
  "asr_model": "pretrained/ASR/epoch_00080.pth",
27
  "f0_model": "pretrained/JDC/bst.t7",
@@ -30,6 +51,27 @@ REQUIRED_FILES = {
30
  }
31
 
32
 
33
  def get_cache_dir() -> Path:
34
  """Get the cache directory for Chiluka models."""
35
  cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
@@ -43,11 +85,19 @@ def is_model_cached(repo_id: str = DEFAULT_HF_REPO) -> bool:
43
  if not cache_path.exists():
44
  return False
45
 
46
- # Check if all required files exist
47
- for file_path in REQUIRED_FILES.values():
48
  if not (cache_path / file_path).exists():
49
  return False
50
- return True
 
 
 
 
 
 
 
 
51
 
52
 
53
  def download_from_hf(
@@ -60,21 +110,16 @@ def download_from_hf(
60
  Download model files from HuggingFace Hub.
61
 
62
  Args:
63
- repo_id: HuggingFace Hub repository ID (e.g., 'username/model-name')
64
  revision: Git revision to download (branch, tag, or commit hash)
65
  force_download: If True, re-download even if cached
66
  token: HuggingFace API token for private repos
67
 
68
  Returns:
69
  Path to the downloaded model directory
70
-
71
- Example:
72
- >>> model_path = download_from_hf("yourusername/chiluka-tts")
73
- >>> print(model_path)
74
- /home/user/.cache/chiluka/yourusername_chiluka-tts
75
  """
76
  try:
77
- from huggingface_hub import snapshot_download, hf_hub_download
78
  except ImportError:
79
  raise ImportError(
80
  "huggingface_hub is required for downloading models. "
@@ -89,7 +134,6 @@ def download_from_hf(
89
 
90
  print(f"Downloading model from HuggingFace Hub: {repo_id}...")
91
 
92
- # Download entire repository
93
  downloaded_path = snapshot_download(
94
  repo_id=repo_id,
95
  revision=revision,
@@ -103,60 +147,32 @@ def download_from_hf(
103
  return Path(downloaded_path)
104
 
105
 
106
- def download_from_url(
107
- url: str,
108
- filename: str,
109
- force_download: bool = False,
110
- ) -> Path:
111
- """
112
- Download a single file from a URL.
113
-
114
- Args:
115
- url: URL to download from
116
- filename: Local filename to save as
117
- force_download: If True, re-download even if exists
118
-
119
- Returns:
120
- Path to the downloaded file
121
- """
122
- import urllib.request
123
-
124
- cache_dir = get_cache_dir() / "downloads"
125
- cache_dir.mkdir(parents=True, exist_ok=True)
126
- local_path = cache_dir / filename
127
-
128
- if local_path.exists() and not force_download:
129
- print(f"Using cached file: {local_path}")
130
- return local_path
131
-
132
- print(f"Downloading {filename}...")
133
-
134
- # Download with progress
135
- def _progress_hook(count, block_size, total_size):
136
- percent = int(count * block_size * 100 / total_size)
137
- print(f"\rDownloading: {percent}%", end="", flush=True)
138
-
139
- urllib.request.urlretrieve(url, local_path, reporthook=_progress_hook)
140
- print() # New line after progress
141
-
142
- return local_path
143
-
144
-
145
- def get_model_paths(repo_id: str = DEFAULT_HF_REPO) -> dict:
146
  """
147
  Get paths to all model files after downloading.
148
 
149
  Args:
 
150
  repo_id: HuggingFace Hub repository ID
151
 
152
  Returns:
153
  Dictionary with paths to config, checkpoint, and pretrained directory
154
  """
 
 
 
 
 
 
155
  model_dir = download_from_hf(repo_id)
 
156
 
157
  return {
158
- "config_path": str(model_dir / "configs" / "config_ft.yml"),
159
- "checkpoint_path": str(model_dir / "checkpoints" / "epoch_2nd_00017.pth"),
160
  "pretrained_dir": str(model_dir / "pretrained"),
161
  }
162
 
@@ -202,7 +218,7 @@ def push_to_hub(
202
  Example:
203
  >>> push_to_hub(
204
  ... local_dir="./chiluka",
205
- ... repo_id="myusername/my-chiluka-model",
206
  ... private=False
207
  ... )
208
  """
@@ -245,6 +261,14 @@ def create_model_card(repo_id: str, save_path: Optional[str] = None) -> str:
245
  Returns:
246
  Model card content as string
247
  """
 
 
 
 
 
 
 
 
248
  model_card = f"""---
249
  language:
250
  - en
@@ -257,12 +281,19 @@ tags:
257
  - tts
258
  - styletts2
259
  - voice-cloning
 
260
  ---
261
 
262
  # Chiluka TTS
263
 
264
  Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.
265
 
 
 
 
 
 
 
266
  ## Installation
267
 
268
  ```bash
@@ -272,64 +303,47 @@ pip install chiluka
272
  Or install from source:
273
 
274
  ```bash
275
- pip install git+https://github.com/{repo_id.split('/')[0]}/chiluka.git
276
  ```
277
 
278
  ## Usage
279
 
280
- ### Quick Start (Auto-download)
281
 
282
  ```python
283
  from chiluka import Chiluka
284
 
285
- # Automatically downloads model weights
286
  tts = Chiluka.from_pretrained()
287
 
288
- # Generate speech
289
  wav = tts.synthesize(
290
  text="Hello, world!",
291
- reference_audio="path/to/reference.wav",
292
  language="en"
293
  )
294
-
295
- # Save output
296
  tts.save_wav(wav, "output.wav")
297
  ```
298
 
299
- ### PyTorch Hub
300
 
301
  ```python
302
- import torch
303
 
304
- tts = torch.hub.load('{repo_id.split('/')[0]}/chiluka', 'chiluka')
305
- wav = tts.synthesize("Hello!", "reference.wav", language="en")
 
 
 
306
  ```
307
 
308
- ### HuggingFace Hub
309
 
310
  ```python
311
- from chiluka import Chiluka
312
 
313
- tts = Chiluka.from_pretrained("{repo_id}")
 
314
  ```
315
 
316
- ## Parameters
317
-
318
- - `text`: Input text to synthesize
319
- - `reference_audio`: Path to reference audio for style transfer
320
- - `language`: Language code ('en', 'te', 'hi', etc.)
321
- - `alpha`: Acoustic style mixing (0-1, default 0.3)
322
- - `beta`: Prosodic style mixing (0-1, default 0.7)
323
- - `diffusion_steps`: Quality vs speed tradeoff (default 5)
324
-
325
- ## Supported Languages
326
-
327
- Uses espeak-ng phonemizer. Common languages:
328
- - English: `en-us`, `en-gb`
329
- - Telugu: `te`
330
- - Hindi: `hi`
331
- - Tamil: `ta`
332
-
333
  ## License
334
 
335
  MIT License
 
5
  - HuggingFace Hub integration
6
  - Automatic model downloading
7
  - Local caching
8
+ - Multiple model variants
9
  """
10
 
11
  import os
 
14
  from typing import Optional, Union
15
 
16
  # Default HuggingFace Hub repository
17
+ DEFAULT_HF_REPO = "Seemanth/chiluka-tts"
18
 
19
  # Cache directory for downloaded models
20
  CACHE_DIR = Path.home() / ".cache" / "chiluka"
21
 
22
+ # ============================================
23
+ # Model Registry
24
+ # ============================================
25
+ # Maps model names to their config + checkpoint paths
26
+ # relative to the repo root.
27
+ MODEL_REGISTRY = {
28
+ "telugu": {
29
+ "config": "configs/config_ft.yml",
30
+ "checkpoint": "checkpoints/epoch_2nd_00017.pth",
31
+ "languages": ["te", "en"],
32
+ "description": "Telugu + English single-speaker TTS",
33
+ },
34
+ "hindi_english": {
35
+ "config": "configs/config_hindi_english.yml",
36
+ "checkpoint": "checkpoints/epoch_2nd_00029.pth",
37
+ "languages": ["hi", "en"],
38
+ "description": "Hindi + English multi-speaker TTS (5 speakers)",
39
+ },
40
+ }
41
+
42
+ DEFAULT_MODEL = "hindi_english"
43
+
44
+ # Shared pretrained sub-models (same across all variants)
45
+ PRETRAINED_FILES = {
46
  "asr_config": "pretrained/ASR/config.yml",
47
  "asr_model": "pretrained/ASR/epoch_00080.pth",
48
  "f0_model": "pretrained/JDC/bst.t7",
 
51
  }
52
 
53
 
54
+ def list_models() -> dict:
55
+ """
56
+ List all available model variants.
57
+
58
+ Returns:
59
+ Dictionary of model names and their info.
60
+
61
+ Example:
62
+ >>> from chiluka import hub
63
+ >>> hub.list_models()
64
+ {'telugu': {...}, 'hindi_english': {...}}
65
+ """
66
+ return {
67
+ name: {
68
+ "languages": info["languages"],
69
+ "description": info["description"],
70
+ }
71
+ for name, info in MODEL_REGISTRY.items()
72
+ }
73
+
74
+
75
  def get_cache_dir() -> Path:
76
  """Get the cache directory for Chiluka models."""
77
  cache_dir = Path(os.environ.get("CHILUKA_CACHE", CACHE_DIR))
 
85
  if not cache_path.exists():
86
  return False
87
 
88
+ # Check if shared pretrained files exist
89
+ for file_path in PRETRAINED_FILES.values():
90
  if not (cache_path / file_path).exists():
91
  return False
92
+
93
+ # Check if at least one model variant exists
94
+ for model_info in MODEL_REGISTRY.values():
95
+ config_exists = (cache_path / model_info["config"]).exists()
96
+ checkpoint_exists = (cache_path / model_info["checkpoint"]).exists()
97
+ if config_exists and checkpoint_exists:
98
+ return True
99
+
100
+ return False
101
 
102
 
103
  def download_from_hf(
 
110
  Download model files from HuggingFace Hub.
111
 
112
  Args:
113
+ repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts')
114
  revision: Git revision to download (branch, tag, or commit hash)
115
  force_download: If True, re-download even if cached
116
  token: HuggingFace API token for private repos
117
 
118
  Returns:
119
  Path to the downloaded model directory
120
  """
121
  try:
122
+ from huggingface_hub import snapshot_download
123
  except ImportError:
124
  raise ImportError(
125
  "huggingface_hub is required for downloading models. "
 
134
 
135
  print(f"Downloading model from HuggingFace Hub: {repo_id}...")
136
 
 
137
  downloaded_path = snapshot_download(
138
  repo_id=repo_id,
139
  revision=revision,
 
147
  return Path(downloaded_path)
148
 
149
 
150
+ def get_model_paths(
151
+ model: str = DEFAULT_MODEL,
152
+ repo_id: str = DEFAULT_HF_REPO,
153
+ ) -> dict:
154
  """
155
  Get paths to all model files after downloading.
156
 
157
  Args:
158
+ model: Model variant name ('telugu', 'hindi_english')
159
  repo_id: HuggingFace Hub repository ID
160
 
161
  Returns:
162
  Dictionary with paths to config, checkpoint, and pretrained directory
163
  """
164
+ if model not in MODEL_REGISTRY:
165
+ available = ", ".join(MODEL_REGISTRY.keys())
166
+ raise ValueError(
167
+ f"Unknown model '{model}'. Available models: {available}"
168
+ )
169
+
170
  model_dir = download_from_hf(repo_id)
171
+ model_info = MODEL_REGISTRY[model]
172
 
173
  return {
174
+ "config_path": str(model_dir / model_info["config"]),
175
+ "checkpoint_path": str(model_dir / model_info["checkpoint"]),
176
  "pretrained_dir": str(model_dir / "pretrained"),
177
  }
178
 
 
218
  Example:
219
  >>> push_to_hub(
220
  ... local_dir="./chiluka",
221
+ ... repo_id="Seemanth/chiluka-tts",
222
  ... private=False
223
  ... )
224
  """
 
261
  Returns:
262
  Model card content as string
263
  """
264
+ owner = repo_id.split("/")[0]
265
+
266
+ # Build model table
267
+ model_rows = ""
268
+ for name, info in MODEL_REGISTRY.items():
269
+ langs = ", ".join(info["languages"])
270
+ model_rows += f"| `{name}` | {info['description']} | {langs} |\n"
271
+
272
  model_card = f"""---
273
  language:
274
  - en
 
281
  - tts
282
  - styletts2
283
  - voice-cloning
284
+ - multi-language
285
  ---
286
 
287
  # Chiluka TTS
288
 
289
  Chiluka (చిలుక - Telugu for "parrot") is a lightweight Text-to-Speech model based on StyleTTS2.
290
 
291
+ ## Available Models
292
+
293
+ | Model | Description | Languages |
294
+ |-------|-------------|-----------|
295
+ {model_rows}
296
+
297
  ## Installation
298
 
299
  ```bash
 
303
  Or install from source:
304
 
305
  ```bash
306
+ pip install git+https://github.com/{owner}/chiluka.git
307
  ```
308
 
309
  ## Usage
310
 
311
+ ### Hindi + English (default)
312
 
313
  ```python
314
  from chiluka import Chiluka
315
 
 
316
  tts = Chiluka.from_pretrained()
317
 
 
318
  wav = tts.synthesize(
319
  text="Hello, world!",
320
+ reference_audio="reference.wav",
321
  language="en"
322
  )
 
 
323
  tts.save_wav(wav, "output.wav")
324
  ```
325
 
326
+ ### Telugu
327
 
328
  ```python
329
+ tts = Chiluka.from_pretrained(model="telugu")
330
 
331
+ wav = tts.synthesize(
332
+ text="నమస్కారం",
333
+ reference_audio="reference.wav",
334
+ language="te"
335
+ )
336
  ```
337
 
338
+ ### PyTorch Hub
339
 
340
  ```python
341
+ import torch
342
 
343
+ tts = torch.hub.load('{owner}/chiluka', 'chiluka')
344
+ tts = torch.hub.load('{owner}/chiluka', 'chiluka_telugu')
345
  ```
346
 
347
  ## License
348
 
349
  MIT License
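The variant lookup that `get_model_paths` performs can be exercised without network access. The following is a minimal sketch, assuming the registry entries shown in the diff above; `resolve_variant_paths` is a hypothetical helper, not part of `chiluka.hub`, and it takes an already-downloaded `model_dir` instead of calling `download_from_hf`.

```python
from pathlib import Path

# Registry entries copied from the diff above (languages and descriptions omitted).
MODEL_REGISTRY = {
    "telugu": {
        "config": "configs/config_ft.yml",
        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
    },
    "hindi_english": {
        "config": "configs/config_hindi_english.yml",
        "checkpoint": "checkpoints/epoch_2nd_00029.pth",
    },
}


def resolve_variant_paths(model_dir: Path, model: str) -> dict:
    """Hypothetical helper: map a variant name to file paths under model_dir."""
    if model not in MODEL_REGISTRY:
        available = ", ".join(MODEL_REGISTRY)
        raise ValueError(f"Unknown model '{model}'. Available models: {available}")
    info = MODEL_REGISTRY[model]
    return {
        "config_path": str(model_dir / info["config"]),
        "checkpoint_path": str(model_dir / info["checkpoint"]),
        # The pretrained sub-models are shared by every variant.
        "pretrained_dir": str(model_dir / "pretrained"),
    }
```

Because the unknown-name check runs before any path is built, a typo such as `model="tamil"` fails with the list of valid names rather than with a missing-file error later on.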
chiluka/inference.py CHANGED
@@ -155,6 +155,7 @@ class Chiluka:
155
  @classmethod
156
  def from_pretrained(
157
  cls,
 
158
  repo_id: str = None,
159
  device: Optional[str] = None,
160
  force_download: bool = False,
@@ -168,7 +169,10 @@ class Chiluka:
168
  Weights are automatically downloaded and cached on first use.
169
 
170
  Args:
171
- repo_id: HuggingFace Hub repository ID (e.g., 'username/chiluka-tts').
 
 
 
172
  If None, uses the default repository.
173
  device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
174
  force_download: If True, re-download even if cached.
@@ -179,31 +183,32 @@ class Chiluka:
179
  Initialized Chiluka TTS model ready for inference.
180
 
181
  Examples:
182
- # Default repository (auto-download)
183
  >>> tts = Chiluka.from_pretrained()
184
 
185
- # Specific repository
186
- >>> tts = Chiluka.from_pretrained("myuser/my-chiluka-model")
 
 
 
187
 
188
  # Force re-download
189
  >>> tts = Chiluka.from_pretrained(force_download=True)
190
-
191
- # Private repository
192
- >>> tts = Chiluka.from_pretrained("myuser/private-model", token="hf_xxx")
193
  """
194
- from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO
195
 
 
196
  repo_id = repo_id or DEFAULT_HF_REPO
197
 
198
  # Download model files (or use cache)
199
- model_dir = download_from_hf(
200
  repo_id=repo_id,
201
  force_download=force_download,
202
  token=token,
203
  )
204
 
205
- # Get paths to model files
206
- paths = get_model_paths(repo_id)
207
 
208
  return cls(
209
  config_path=paths["config_path"],
 
155
  @classmethod
156
  def from_pretrained(
157
  cls,
158
+ model: Optional[str] = None,
159
  repo_id: str = None,
160
  device: Optional[str] = None,
161
  force_download: bool = False,
 
169
  Weights are automatically downloaded and cached on first use.
170
 
171
  Args:
172
+ model: Model variant to load. Options:
173
+ - 'hindi_english' (default) - Hindi + English multi-speaker TTS
174
+ - 'telugu' - Telugu + English single-speaker TTS
175
+ repo_id: HuggingFace Hub repository ID (e.g., 'Seemanth/chiluka-tts').
176
  If None, uses the default repository.
177
  device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
178
  force_download: If True, re-download even if cached.
 
183
  Initialized Chiluka TTS model ready for inference.
184
 
185
  Examples:
186
+ # Hindi-English model (default)
187
  >>> tts = Chiluka.from_pretrained()
188
 
189
+ # Telugu model
190
+ >>> tts = Chiluka.from_pretrained(model="telugu")
191
+
192
+ # Specific HuggingFace repository
193
+ >>> tts = Chiluka.from_pretrained(repo_id="myuser/my-model")
194
 
195
  # Force re-download
196
  >>> tts = Chiluka.from_pretrained(force_download=True)
 
 
 
197
  """
198
+ from .hub import download_from_hf, get_model_paths, DEFAULT_HF_REPO, DEFAULT_MODEL
199
 
200
+ model = model or DEFAULT_MODEL
201
  repo_id = repo_id or DEFAULT_HF_REPO
202
 
203
  # Download model files (or use cache)
204
+ download_from_hf(
205
  repo_id=repo_id,
206
  force_download=force_download,
207
  token=token,
208
  )
209
 
210
+ # Get paths to model files for the selected variant
211
+ paths = get_model_paths(model=model, repo_id=repo_id)
212
 
213
  return cls(
214
  config_path=paths["config_path"],
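In the new `from_pretrained`, the return value of `download_from_hf(...)` is discarded, and `get_model_paths(model=model, repo_id=repo_id)` then calls `download_from_hf(repo_id)` a second time without `force_download` or `token`. A minimal sketch of one way to build the paths from a single download follows. `load_paths_once`, the injected `download` callable, and `fake_download` are hypothetical names used only for illustration, and the registry is reduced to one entry.

```python
from pathlib import Path
from typing import Callable, Optional


def load_paths_once(
    download: Callable[..., Path],
    registry: dict,
    model: str,
    repo_id: str,
    force_download: bool = False,
    token: Optional[str] = None,
) -> dict:
    """Hypothetical helper: download once, then build paths from that result."""
    if model not in registry:
        raise ValueError(f"Unknown model '{model}'")
    # Single call, so force_download and token apply to the files actually used.
    model_dir = download(repo_id=repo_id, force_download=force_download, token=token)
    info = registry[model]
    return {
        "config_path": str(model_dir / info["config"]),
        "checkpoint_path": str(model_dir / info["checkpoint"]),
        "pretrained_dir": str(model_dir / "pretrained"),
    }


# Stand-in for download_from_hf that records how it was called.
calls = []


def fake_download(repo_id, force_download=False, token=None):
    calls.append((repo_id, force_download, token))
    return Path("/cache") / repo_id


registry = {
    "telugu": {
        "config": "configs/config_ft.yml",
        "checkpoint": "checkpoints/epoch_2nd_00017.pth",
    }
}
paths = load_paths_once(
    fake_download, registry, "telugu", "Seemanth/chiluka-tts", token="hf_xxx"
)
```

Passing the downloader in as a parameter is only a testing convenience here; the same effect could come from letting `get_model_paths` accept a directory that has already been downloaded.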
examples/huggingface_example.py ADDED
@@ -0,0 +1,70 @@
1
+ """
2
+ Chiluka TTS - HuggingFace Hub Example
3
+
4
+ Load model weights directly from HuggingFace Hub.
5
+ No need to clone the repository or download weights manually.
6
+
7
+ Requirements:
8
+ pip install chiluka
9
+ sudo apt-get install espeak-ng
10
+
11
+ Usage:
12
+ python huggingface_example.py --reference path/to/reference.wav
13
+ python huggingface_example.py --reference ref.wav --model telugu --language te --text "నమస్కారం"
14
+ """
15
+
16
+ import argparse
17
+ from chiluka import Chiluka, list_models
18
+
19
+
20
+ def main():
21
+ parser = argparse.ArgumentParser(description="Chiluka TTS - HuggingFace Hub Example")
22
+ parser.add_argument("--reference", type=str, required=True, help="Path to reference audio file")
23
+ parser.add_argument("--model", type=str, default="hindi_english", choices=["hindi_english", "telugu"],
24
+ help="Model variant to use (default: hindi_english)")
25
+ parser.add_argument("--text", type=str, default=None, help="Text to synthesize")
26
+ parser.add_argument("--language", type=str, default=None, help="Language code (en, hi, te)")
27
+ parser.add_argument("--output", type=str, default="output_hf.wav", help="Output wav file path")
28
+ parser.add_argument("--device", type=str, default=None, help="Device: cuda or cpu")
29
+ args = parser.parse_args()
30
+
31
+ # Show available models
32
+ print("Available models:")
33
+ for name, info in list_models().items():
34
+ marker = " <--" if name == args.model else ""
35
+ print(f" {name}: {info['description']}{marker}")
36
+ print()
37
+
38
+ # Set defaults based on model choice
39
+ if args.text is None:
40
+ if args.model == "telugu":
41
+ args.text = "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను"
42
+ else:
43
+ args.text = "Hello, I am Chiluka, a text to speech system."
44
+
45
+ if args.language is None:
46
+ if args.model == "telugu":
47
+ args.language = "te"
48
+ else:
49
+ args.language = "en"
50
+
51
+ # Load model from HuggingFace Hub (auto-downloads on first use)
52
+ print(f"Loading '{args.model}' model from HuggingFace Hub...")
53
+ tts = Chiluka.from_pretrained(model=args.model, device=args.device)
54
+
55
+ # Synthesize
56
+ print(f"Synthesizing: '{args.text}'")
57
+ print(f"Language: {args.language}")
58
+ wav = tts.synthesize(
59
+ text=args.text,
60
+ reference_audio=args.reference,
61
+ language=args.language,
62
+ )
63
+
64
+ # Save
65
+ tts.save_wav(wav, args.output)
66
+ print(f"Duration: {len(wav) / 24000:.2f} seconds")
67
+
68
+
69
+ if __name__ == "__main__":
70
+ main()
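The example scripts report duration by dividing `len(wav)` by a hard-coded `24000`. A small sketch of the same arithmetic with the sample rate as a parameter is shown below; `duration_seconds` is a hypothetical helper, and the 24 kHz default is taken from the script rather than from any attribute of the model.

```python
def duration_seconds(num_samples: int, sample_rate: int = 24000) -> float:
    """Hypothetical helper: length of a mono waveform in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


# Usage in the style of the example scripts, with a made-up sample count.
print(f"Duration: {duration_seconds(36000):.2f} seconds")
```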
examples/pip_example.py ADDED
@@ -0,0 +1,88 @@
1
+ """
2
+ Chiluka TTS - pip install Example
3
+
4
+ After installing via pip, model weights auto-download from HuggingFace
5
+ on first use and are cached locally.
6
+
7
+ Install:
8
+ pip install chiluka
9
+ sudo apt-get install espeak-ng
10
+
11
+ Usage:
12
+ python pip_example.py --reference path/to/reference.wav
13
+ python pip_example.py --reference ref.wav --model telugu --language te
14
+ """
15
+
16
+ import argparse
17
+
18
+
19
+ def main():
20
+ parser = argparse.ArgumentParser(description="Chiluka TTS - pip Example")
21
+ parser.add_argument("--reference", type=str, required=True, help="Path to reference audio file")
22
+ parser.add_argument("--model", type=str, default="hindi_english", choices=["hindi_english", "telugu"],
23
+ help="Model variant (default: hindi_english)")
24
+ parser.add_argument("--text", type=str, default=None, help="Text to synthesize")
25
+ parser.add_argument("--language", type=str, default=None, help="Language code (en, hi, te)")
26
+ parser.add_argument("--output", type=str, default="output_pip.wav", help="Output wav file path")
27
+ args = parser.parse_args()
28
+
29
+ # Import after argparse so --help is fast
30
+ from chiluka import Chiluka, list_models
31
+
32
+ # Set defaults
33
+ if args.text is None:
34
+ texts = {
35
+ "hindi_english": "Hello, I am Chiluka, a text to speech system.",
36
+ "telugu": "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను",
37
+ }
38
+ args.text = texts[args.model]
39
+
40
+ if args.language is None:
41
+ langs = {"hindi_english": "en", "telugu": "te"}
42
+ args.language = langs[args.model]
43
+
44
+ # List models
45
+ print("Available models:")
46
+ for name, info in list_models().items():
47
+ print(f" {name}: {info['description']}")
48
+ print()
49
+
50
+ # Load model (auto-downloads weights on first run)
51
+ print(f"Loading '{args.model}' model...")
52
+ tts = Chiluka.from_pretrained(model=args.model)
53
+
54
+ # Synthesize speech
55
+ print(f"Text: '{args.text}'")
56
+ print(f"Language: {args.language}")
57
+ print(f"Reference: {args.reference}")
58
+ print()
59
+
60
+ wav = tts.synthesize(
61
+ text=args.text,
62
+ reference_audio=args.reference,
63
+ language=args.language,
64
+ alpha=0.3,
65
+ beta=0.7,
66
+ diffusion_steps=5,
67
+ embedding_scale=1.0,
68
+ )
69
+
70
+ # Save output
71
+ tts.save_wav(wav, args.output)
72
+ print(f"Duration: {len(wav) / 24000:.2f} seconds")
73
+
74
+ # --- Bonus: synthesize in another language with same model ---
75
+ if args.model == "hindi_english":
76
+ print("\n--- Bonus: Hindi synthesis with same model ---")
77
+ hindi_wav = tts.synthesize(
78
+ text="नमस्ते, मैं चिलुका बोल रहा हूं",
79
+ reference_audio=args.reference,
80
+ language="hi",
81
+ )
82
+ hindi_output = args.output.replace(".wav", "_hindi.wav")
83
+ tts.save_wav(hindi_wav, hindi_output)
84
+ print(f"Duration: {len(hindi_wav) / 24000:.2f} seconds")
85
+
86
+
87
+ if __name__ == "__main__":
88
+ main()
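The bonus step derives the Hindi file name with `args.output.replace(".wav", "_hindi.wav")`. If `--output` does not contain `.wav`, the result equals the original name and the second `save_wav` call would overwrite the first file; if `.wav` appears more than once, every occurrence is rewritten. A hedged alternative using `pathlib` is sketched below; `tagged_output` is a hypothetical helper name.

```python
from pathlib import Path


def tagged_output(output: str, tag: str) -> str:
    """Hypothetical helper: insert tag between the file stem and its extension."""
    path = Path(output)
    return str(path.with_name(f"{path.stem}{tag}{path.suffix}"))


hindi_output = tagged_output("output_pip.wav", "_hindi")
```

This keeps the directory and extension untouched, so it also behaves sensibly for names such as `out.flac` or names without any extension.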
examples/torchhub_example.py ADDED
@@ -0,0 +1,70 @@
1
+ """
2
+ Chiluka TTS - PyTorch Hub Example
3
+
4
+ Load the model using torch.hub.load() - no "pip install chiluka" needed,
5
+ just PyTorch and a GitHub repo.
6
+
7
+ Requirements:
8
+ pip install torch torchaudio
9
+ sudo apt-get install espeak-ng
10
+
11
+ Usage:
12
+ python torchhub_example.py --reference path/to/reference.wav
13
+ python torchhub_example.py --reference ref.wav --variant telugu --language te
14
+ """
15
+
16
+ import argparse
17
+ import torch
18
+
19
+
20
+ def main():
21
+ parser = argparse.ArgumentParser(description="Chiluka TTS - PyTorch Hub Example")
22
+ parser.add_argument("--reference", type=str, required=True, help="Path to reference audio file")
23
+ parser.add_argument("--variant", type=str, default="default", choices=["default", "telugu", "hindi_english"],
24
+ help="Model variant (default, telugu, hindi_english)")
25
+ parser.add_argument("--text", type=str, default=None, help="Text to synthesize")
26
+ parser.add_argument("--language", type=str, default=None, help="Language code (en, hi, te)")
27
+ parser.add_argument("--output", type=str, default="output_torchhub.wav", help="Output wav file path")
28
+ args = parser.parse_args()
29
+
30
+ # Set defaults
31
+ if args.text is None:
32
+ if args.variant == "telugu":
33
+ args.text = "నమస్కారం, నేను చిలుక మాట్లాడుతున్నాను"
34
+ else:
35
+ args.text = "Hello, I am Chiluka, a text to speech system."
36
+
37
+ if args.language is None:
38
+ if args.variant == "telugu":
39
+ args.language = "te"
40
+ else:
41
+ args.language = "en"
42
+
43
+ # Load via torch.hub
44
+ # Available entry points:
45
+ # 'chiluka' - Hindi-English model (default)
46
+ # 'chiluka_telugu' - Telugu model
47
+ # 'chiluka_hindi_english' - Hindi-English model (explicit)
48
+ print(f"Loading model via torch.hub (variant: {args.variant})...")
49
+
50
+ if args.variant == "telugu":
51
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
52
+ else:
53
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
54
+
55
+ # Synthesize
56
+ print(f"Synthesizing: '{args.text}'")
57
+ print(f"Language: {args.language}")
58
+ wav = tts.synthesize(
59
+ text=args.text,
60
+ reference_audio=args.reference,
61
+ language=args.language,
62
+ )
63
+
64
+ # Save
65
+ tts.save_wav(wav, args.output)
66
+ print(f"Duration: {len(wav) / 24000:.2f} seconds")
67
+
68
+
69
+ if __name__ == "__main__":
70
+ main()
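The script maps `--variant` to a torch.hub entry point with an `if`/`else`, so `hindi_english` and `default` both load `chiluka`. A table-driven version is sketched below, assuming only the three entry-point names listed in the script's own comment; `entry_point_for` is a hypothetical helper and the sketch deliberately does not call `torch.hub.load`.

```python
# Variant name (as accepted by --variant) -> hubconf entry point.
ENTRY_POINTS = {
    "default": "chiluka",
    "telugu": "chiluka_telugu",
    "hindi_english": "chiluka_hindi_english",
}


def entry_point_for(variant: str) -> str:
    """Hypothetical helper: look up the hubconf entry point for a variant."""
    try:
        return ENTRY_POINTS[variant]
    except KeyError:
        available = ", ".join(ENTRY_POINTS)
        raise ValueError(
            f"Unknown variant '{variant}'. Available: {available}"
        ) from None
```

In the script this would be used as `torch.hub.load('Seemanth/chiluka', entry_point_for(args.variant))`, and the dictionary keys could also feed the `choices=` argument of the parser.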
hubconf.py CHANGED
@@ -4,11 +4,11 @@ PyTorch Hub configuration for Chiluka TTS.
4
  Usage:
5
  import torch
6
 
7
- # Load the model
8
- tts = torch.hub.load('yourusername/chiluka', 'chiluka')
9
 
10
- # Or with force reload
11
- tts = torch.hub.load('yourusername/chiluka', 'chiluka', force_reload=True)
12
 
13
  # Generate speech
14
  wav = tts.synthesize(
@@ -37,11 +37,10 @@ dependencies = [
37
 
38
  def chiluka(pretrained: bool = True, device: str = None, **kwargs):
39
  """
40
- Load Chiluka TTS model.
41
 
42
  Args:
43
  pretrained: If True, downloads pretrained weights from HuggingFace Hub.
44
- If False, returns uninitialized model (requires manual weight loading).
45
  device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
46
  **kwargs: Additional arguments passed to Chiluka constructor.
47
 
@@ -50,25 +49,23 @@ def chiluka(pretrained: bool = True, device: str = None, **kwargs):
50
 
51
  Example:
52
  >>> import torch
53
- >>> tts = torch.hub.load('yourusername/chiluka', 'chiluka')
54
  >>> wav = tts.synthesize("Hello!", "reference.wav", language="en")
55
  """
56
  from chiluka import Chiluka
57
 
58
  if pretrained:
59
- # Use from_pretrained to auto-download weights
60
- return Chiluka.from_pretrained(device=device, **kwargs)
61
  else:
62
- # Return model expecting local weights
63
  return Chiluka(device=device, **kwargs)
64
 
65
 
66
- def chiluka_from_hf(repo_id: str = "yourusername/chiluka-tts", device: str = None, **kwargs):
67
  """
68
- Load Chiluka TTS from a specific HuggingFace Hub repository.
69
 
70
  Args:
71
- repo_id: HuggingFace Hub repository ID (e.g., 'username/model-name')
72
  device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
73
  **kwargs: Additional arguments passed to Chiluka constructor.
74
 
@@ -77,8 +74,43 @@ def chiluka_from_hf(repo_id: str = "yourusername/chiluka-tts", device: str = Non
77
 
78
  Example:
79
  >>> import torch
80
- >>> tts = torch.hub.load('yourusername/chiluka', 'chiluka_from_hf',
81
  ... repo_id='myuser/my-custom-chiluka')
82
  """
83
  from chiluka import Chiluka
84
- return Chiluka.from_pretrained(repo_id=repo_id, device=device, **kwargs)
 
4
  Usage:
5
  import torch
6
 
7
+ # Load Hindi-English model (default)
8
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
9
 
10
+ # Load Telugu model
11
+ tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
12
 
13
  # Generate speech
14
  wav = tts.synthesize(
 
37
 
38
  def chiluka(pretrained: bool = True, device: str = None, **kwargs):
39
  """
40
+ Load Chiluka Hindi-English TTS model (default).
41
 
42
  Args:
43
  pretrained: If True, downloads pretrained weights from HuggingFace Hub.
 
44
  device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
45
  **kwargs: Additional arguments passed to Chiluka constructor.
46
 
 
49
 
50
  Example:
51
  >>> import torch
52
+ >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka')
53
  >>> wav = tts.synthesize("Hello!", "reference.wav", language="en")
54
  """
55
  from chiluka import Chiluka
56
 
57
  if pretrained:
58
+ return Chiluka.from_pretrained(model="hindi_english", device=device, **kwargs)
 
59
  else:
 
60
  return Chiluka(device=device, **kwargs)
61
 
62
 
63
+ def chiluka_telugu(pretrained: bool = True, device: str = None, **kwargs):
64
  """
65
+ Load Chiluka Telugu TTS model.
66
 
67
  Args:
68
+ pretrained: If True, downloads pretrained weights from HuggingFace Hub.
69
  device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
70
  **kwargs: Additional arguments passed to Chiluka constructor.
71
 
 
74
 
75
  Example:
76
  >>> import torch
77
+ >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka_telugu')
78
+ >>> wav = tts.synthesize("నమస్కారం", "reference.wav", language="te")
79
+ """
80
+ from chiluka import Chiluka
81
+
82
+ if pretrained:
83
+ return Chiluka.from_pretrained(model="telugu", device=device, **kwargs)
84
+ else:
85
+ return Chiluka(device=device, **kwargs)
86
+
87
+
88
+ def chiluka_hindi_english(pretrained: bool = True, device: str = None, **kwargs):
89
+ """
90
+ Load Chiluka Hindi-English TTS model (explicit name).
91
+
92
+ Alias for `chiluka()`; takes the same arguments.
93
+
94
+ Example:
95
+ >>> import torch
96
+ >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka_hindi_english')
97
+ """
98
+ return chiluka(pretrained=pretrained, device=device, **kwargs)
99
+
100
+
101
+ def chiluka_from_hf(repo_id: str = "Seemanth/chiluka-tts", model: str = "hindi_english", device: str = None, **kwargs):
102
+ """
103
+ Load Chiluka TTS from a specific HuggingFace Hub repository.
104
+
105
+ Args:
106
+ repo_id: HuggingFace Hub repository ID
107
+ model: Model variant ('hindi_english' or 'telugu')
108
+ device: Device to use ('cuda' or 'cpu'). Auto-detects if None.
109
+
110
+ Example:
111
+ >>> import torch
112
+ >>> tts = torch.hub.load('Seemanth/chiluka', 'chiluka_from_hf',
113
  ... repo_id='myuser/my-custom-chiluka')
114
  """
115
  from chiluka import Chiluka
116
+ return Chiluka.from_pretrained(repo_id=repo_id, model=model, device=device, **kwargs)
pyproject.toml CHANGED
@@ -47,10 +47,10 @@ playback = ["pyaudio>=0.2.11"]
47
  dev = ["pytest>=7.0.0", "black>=22.0.0", "isort>=5.10.0"]
48
 
49
  [project.urls]
50
- Homepage = "https://github.com/yourusername/chiluka"
51
- Documentation = "https://github.com/yourusername/chiluka#readme"
52
- Repository = "https://github.com/yourusername/chiluka"
53
- Issues = "https://github.com/yourusername/chiluka/issues"
54
 
55
  [tool.setuptools.packages.find]
56
  where = ["."]
 
47
  dev = ["pytest>=7.0.0", "black>=22.0.0", "isort>=5.10.0"]
48
 
49
  [project.urls]
50
+ Homepage = "https://github.com/Seemanth/chiluka"
51
+ Documentation = "https://github.com/Seemanth/chiluka#readme"
52
+ Repository = "https://github.com/Seemanth/chiluka"
53
+ Issues = "https://github.com/Seemanth/chiluka/issues"
54
 
55
  [tool.setuptools.packages.find]
56
  where = ["."]
setup.py CHANGED
@@ -8,13 +8,34 @@ with open("README.md", "r", encoding="utf-8") as fh:
8
  setup(
9
  name="chiluka",
10
  version="0.1.0",
11
- author="Your Name",
12
- author_email="your.email@example.com",
13
  description="Chiluka - A lightweight TTS inference package based on StyleTTS2",
14
  long_description=long_description,
15
  long_description_content_type="text/markdown",
16
- url="https://github.com/yourusername/chiluka",
17
  packages=find_packages(),
18
  classifiers=[
19
  "Development Status :: 3 - Alpha",
20
  "Intended Audience :: Developers",
 
8
  setup(
9
  name="chiluka",
10
  version="0.1.0",
11
+ author="Seemanth",
12
+ author_email="seemanth.k@purviewservices.com",
13
  description="Chiluka - A lightweight TTS inference package based on StyleTTS2",
14
  long_description=long_description,
15
  long_description_content_type="text/markdown",
16
+ url="https://github.com/PurviewVoiceBot/chiluka",
17
  packages=find_packages(),
18
+ include_package_data=False, # Don't include large model files
19
+ package_data={
20
+ "chiluka": [
21
+ "configs/*.yml",
22
+ "pretrained/ASR/config.yml",
23
+ "pretrained/ASR/*.py",
24
+ "pretrained/JDC/*.py",
25
+ "pretrained/PLBERT/config.yml",
26
+ "pretrained/PLBERT/*.py",
27
+ "models/*.py",
28
+ "models/diffusion/*.py",
29
+ ],
30
+ },
31
+ exclude_package_data={
32
+ "chiluka": [
33
+ "checkpoints/*.pth",
34
+ "pretrained/ASR/*.pth",
35
+ "pretrained/JDC/*.t7",
36
+ "pretrained/PLBERT/*.t7",
37
+ ],
38
+ },
39
  classifiers=[
40
  "Development Status :: 3 - Alpha",
41
  "Intended Audience :: Developers",