Update usage: correct loading instructions for extended vocab
README.md
CHANGED
@@ -26,12 +26,24 @@ A fine-tuned version of [ResembleAI/chatterbox](https://huggingface.co/ResembleA
 - **Training steps**: 20,000
 - **Epochs**: ~10.6
 - **Architecture**: T3 (Text-to-Token Transformer) + HiFT-GAN vocoder
+- **Vocabulary**: Extended from 704 → 2,530 tokens to cover Bengali characters
+
+## Requirements
+
+```bash
+git clone https://github.com/gokhaneraslan/chatterbox-finetuning
+cd chatterbox-finetuning
+pip install -r requirements.txt
+```
 
 ## Usage
 
 ```python
+import sys
+sys.path.insert(0, "/path/to/chatterbox-finetuning")
+
 from huggingface_hub import snapshot_download
-from
+from src.chatterbox_.tts import ChatterboxTTS
 import torchaudio
 
 model_dir = snapshot_download("arijitx/chatterbox-bangla")
@@ -53,7 +65,7 @@ wav = model.generate(text, audio_prompt_path="reference.wav")
 
 | File | Description |
 |------|-------------|
-| `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (Bengali) |
+| `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (Bengali, vocab=2530) |
 | `s3gen.safetensors` | Speech codec decoder (unchanged from base) |
 | `ve.safetensors` | Voice encoder (unchanged from base) |
 | `conds.pt` | Conditioning embeddings (unchanged from base) |
@@ -69,3 +81,4 @@ Datasets sourced from AI4Bharat and SPRINGLab public datasets.
 - Optimized for Bengali; other languages may degrade
 - Best results with clear, well-punctuated Bengali text
 - Emotion control inherited from base ChatterBox multilingual model
+- Requires chatterbox-finetuning kit due to extended Bengali vocabulary
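
The updated instructions rely on two local prerequisites: a clone of chatterbox-finetuning that `sys.path` can resolve `src.chatterbox_.tts` from, and a model snapshot containing the files in the README table. A minimal pre-flight sketch of those two checks follows. The helper name `preflight` and the constant `EXPECTED_MODEL_FILES` are hypothetical and belong to neither chatterbox-finetuning nor `huggingface_hub`. The file list covers only the four rows visible in this diff, so the full table may name more, and the layout `src/chatterbox_/tts.py` is inferred from the import line rather than confirmed.

```python
from pathlib import Path

# Files named in the README table rows visible in this diff (assumption:
# the full table may list additional files that are not checked here).
EXPECTED_MODEL_FILES = (
    "t3_cfg.safetensors",
    "s3gen.safetensors",
    "ve.safetensors",
    "conds.pt",
)


def preflight(kit_dir, model_dir):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []

    # `from src.chatterbox_.tts import ChatterboxTTS` needs either a tts.py
    # module or a tts/ package under <kit_dir>/src/chatterbox_/.
    base = Path(kit_dir) / "src" / "chatterbox_"
    has_module = (base / "tts.py").is_file()
    has_package = (base / "tts" / "__init__.py").is_file()
    if not (has_module or has_package):
        problems.append(f"src.chatterbox_.tts not found under {kit_dir}")

    # is_file() follows symlinks, so it also accepts cache entries that are
    # links to the real files.
    model = Path(model_dir)
    for name in EXPECTED_MODEL_FILES:
        if not (model / name).is_file():
            problems.append(f"missing model file: {name}")

    return problems
```

Calling `preflight("/path/to/chatterbox-finetuning", model_dir)` after `snapshot_download` and before the `ChatterboxTTS` import returns strings describing anything that is missing, which tends to be easier to act on than an `ImportError` or a failed weight load.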