BosonLab / chatterbox-bangla

@@ -26,12 +26,24 @@ A fine-tuned version of [ResembleAI/chatterbox](https://huggingface.co/ResembleA
 - **Training steps**: 20,000
 - **Epochs**: ~10.6
 - **Architecture**: T3 (Text-to-Token Transformer) + HiFT-GAN vocoder
+- **Vocabulary**: Extended from 704 → 2,530 tokens to cover Bengali characters
+## Requirements
+```bash
+git clone https://github.com/gokhaneraslan/chatterbox-finetuning
+cd chatterbox-finetuning
+pip install -r requirements.txt
+```
 ## Usage
 ```python
+import sys
+sys.path.insert(0, "/path/to/chatterbox-finetuning")
 from huggingface_hub import snapshot_download
-from chatterbox.tts import ChatterboxTTS
+from src.chatterbox_.tts import ChatterboxTTS
 import torchaudio
 model_dir = snapshot_download("arijitx/chatterbox-bangla")
@@ -53,7 +65,7 @@ wav = model.generate(text, audio_prompt_path="reference.wav")
 | File | Description |
 |------|-------------|
-| `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (Bengali) |
+| `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (Bengali, vocab=2530) |
 | `s3gen.safetensors` | Speech codec decoder (unchanged from base) |
 | `ve.safetensors` | Voice encoder (unchanged from base) |
 | `conds.pt` | Conditioning embeddings (unchanged from base) |
@@ -69,3 +81,4 @@ Datasets sourced from AI4Bharat and SPRINGLab public datasets.
 - Optimized for Bengali; other languages may degrade
 - Best results with clear, well-punctuated Bengali text
 - Emotion control inherited from base ChatterBox multilingual model
+- Requires chatterbox-finetuning kit due to extended Bengali vocabulary
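The new kit requirement follows from the vocabulary change from 704 to 2,530 tokens. The sketch below is an illustration that rests on an assumption: that `t3_cfg.safetensors` stores a text-token embedding table with one row per vocabulary entry. Under that assumption, the table's first dimension grows from 704 to 2,530, so a model built with the base 704-token configuration cannot load it with a strict shape check. The NumPy code is a toy sketch only. The width `DIM`, the helper names `extend_embedding` and `shapes_match`, and the mean/std initialization of the new rows are all hypothetical. None of them is taken from chatterbox-finetuning.

```python
import numpy as np

BASE_VOCAB = 704       # base ChatterBox text vocabulary size (from the model card)
EXTENDED_VOCAB = 2530  # Bengali-extended vocabulary size (from the model card)
DIM = 8                # toy embedding width for illustration; the real T3 width differs


def extend_embedding(weight: np.ndarray, new_vocab: int, seed: int = 0) -> np.ndarray:
    """Grow a (vocab, dim) embedding table to (new_vocab, dim).

    Existing rows are copied unchanged so the original token IDs keep their
    vectors. Appended rows are sampled from a normal distribution with the
    old table's mean and std (one common heuristic, assumed here).
    """
    old_vocab, dim = weight.shape
    if new_vocab < old_vocab:
        raise ValueError("new_vocab must be >= the current vocabulary size")
    rng = np.random.default_rng(seed)
    extra = rng.normal(weight.mean(), weight.std(), size=(new_vocab - old_vocab, dim))
    return np.concatenate([weight, extra.astype(weight.dtype)], axis=0)


def shapes_match(model_table: np.ndarray, checkpoint_table: np.ndarray) -> bool:
    """Mimic a strict weight load: the two tables must have identical shapes."""
    return model_table.shape == checkpoint_table.shape


# Toy table standing in for a 704-token text embedding (hypothetical values).
base = np.random.default_rng(42).normal(size=(BASE_VOCAB, DIM)).astype(np.float32)
extended = extend_embedding(base, EXTENDED_VOCAB)

print(extended.shape)                               # (2530, 8)
print(shapes_match(base, extended))                 # False: 704 rows vs 2,530 rows
print(np.array_equal(extended[:BASE_VOCAB], base))  # True: original token vectors kept
```

Copying the original rows means the token IDs shared with the base vocabulary keep their pretrained vectors. Only the appended rows need to be learned during fine-tuning. Whether chatterbox-finetuning initializes the new rows this way is not stated in the model card.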