Commit 588c5e3 by arijitx (verified) · 1 parent: 190661c

Update usage: correct loading instructions for extended vocab

  1. README.md +15 -2
README.md CHANGED
@@ -26,12 +26,24 @@ A fine-tuned version of [ResembleAI/chatterbox](https://huggingface.co/ResembleA
  - **Training steps**: 20,000
  - **Epochs**: ~10.6
  - **Architecture**: T3 (Text-to-Token Transformer) + HiFT-GAN vocoder
+ - **Vocabulary**: Extended from 704 → 2,530 tokens to cover Bengali characters
+
+ ## Requirements
+
+ ```bash
+ git clone https://github.com/gokhaneraslan/chatterbox-finetuning
+ cd chatterbox-finetuning
+ pip install -r requirements.txt
+ ```
 
  ## Usage
 
  ```python
+ import sys
+ sys.path.insert(0, "/path/to/chatterbox-finetuning")
+
  from huggingface_hub import snapshot_download
- from chatterbox.tts import ChatterboxTTS
+ from src.chatterbox_.tts import ChatterboxTTS
  import torchaudio
 
  model_dir = snapshot_download("arijitx/chatterbox-bangla")
@@ -53,7 +65,7 @@ wav = model.generate(text, audio_prompt_path="reference.wav")
 
  | File | Description |
  |------|-------------|
- | `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (Bengali) |
+ | `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (Bengali, vocab=2530) |
  | `s3gen.safetensors` | Speech codec decoder (unchanged from base) |
  | `ve.safetensors` | Voice encoder (unchanged from base) |
  | `conds.pt` | Conditioning embeddings (unchanged from base) |
@@ -69,3 +81,4 @@ Datasets sourced from AI4Bharat and SPRINGLab public datasets.
  - Optimized for Bengali; other languages may degrade
  - Best results with clear, well-punctuated Bengali text
  - Emotion control inherited from base ChatterBox multilingual model
+ - Requires chatterbox-finetuning kit due to extended Bengali vocabulary
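
Review note: the new usage snippet hard-codes `sys.path.insert(0, "/path/to/chatterbox-finetuning")`, which fails with an opaque `ModuleNotFoundError` if the path is wrong. A minimal sketch of a more defensive variant follows. It assumes only what the README shows, namely that the checkout contains a `src/chatterbox_` package. The helper name `add_kit_to_path` and the environment variable `CHATTERBOX_FINETUNING_DIR` are hypothetical and are not part of the kit.

```python
import os
import sys
from pathlib import Path

# Hypothetical environment variable name; neither the kit nor the README defines it.
KIT_ENV_VAR = "CHATTERBOX_FINETUNING_DIR"


def add_kit_to_path(kit_dir=None):
    """Put a chatterbox-finetuning checkout at the front of sys.path.

    The location comes from `kit_dir` or, if that is None, from the
    (hypothetical) CHATTERBOX_FINETUNING_DIR environment variable.
    Returns the resolved checkout directory as a Path.
    """
    raw = kit_dir if kit_dir is not None else os.environ.get(KIT_ENV_VAR)
    if not raw:
        raise FileNotFoundError(
            f"No checkout given: pass kit_dir or set {KIT_ENV_VAR}."
        )

    root = Path(raw).expanduser().resolve()

    # The README imports `src.chatterbox_.tts`, so the checkout is expected
    # to contain a src/chatterbox_ directory. Fail early if it does not.
    if not (root / "src" / "chatterbox_").is_dir():
        raise FileNotFoundError(
            f"{root} has no src/chatterbox_ directory; "
            "is it a chatterbox-finetuning checkout?"
        )

    # Avoid duplicate entries when the helper is called more than once.
    root_str = str(root)
    if root_str in sys.path:
        sys.path.remove(root_str)
    sys.path.insert(0, root_str)
    return root
```

With this in place, the README snippet could call `add_kit_to_path()` before `from src.chatterbox_.tts import ChatterboxTTS`, so a misconfigured path fails with a message that names the directory instead of a bare import error.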