NMikka committed on
Commit 8c3dac7 · verified · 1 Parent(s): e6671a6

Update README.md

Files changed (1)
  1. README.md +2 -9
README.md CHANGED
@@ -16,7 +16,7 @@ datasets:
 
  # CSM-1B Georgian
 
- A fine-tuned version of [sesame/csm-1b](https://huggingface.co/sesame/csm-1b) for **Georgian text-to-speech**. This is the first open-source Georgian TTS model based on the CSM architecture.
 
  ## Model Details
 
@@ -76,15 +76,12 @@ sf.write("output.wav", audio[0].cpu().numpy(), 24000)
 
  ### Speaker Selection
 
- The model supports 12 speakers (IDs 0–11). Speaker 7 produces the most intelligible output (0.42% CER in-domain). Per-speaker quality varies:
 
  | Speaker | CER | Recommended |
  |---------|-----|-------------|
  | 7 | 0.0042 | Best overall |
  | 3 | 0.0174 | Best speaker similarity |
- | 8 | 0.0142 | Good |
- | 11 | 0.0129 | Good |
- | 14 | 0.0117 | Good |
 
  Use the speaker ID in the text prefix: `[7]your text here`.
 
@@ -135,12 +132,8 @@ for i, audio in enumerate(audios):
  - Trained on 12 speakers from Common Voice Georgian — limited speaker diversity
  - Long sentences (>10s of audio) may produce hallucinations or truncations
  - 4.1% of FLEURS samples had CER > 50% (failure cases on complex text)
- - No emotion or prosody control
  - Georgian only
 
- ## Part of the Georgian TTS Benchmark
-
- This model was trained as part of the first Georgian TTS benchmark — a comparative study of 6 open-source TTS architectures. See the full project: [github.com/NMikaa/TTS_pipelines](https://github.com/NMikaa/TTS_pipelines)
 
  ## Citation
 
 
 
  # CSM-1B Georgian
 
+ A fine-tuned version of [sesame/csm-1b](https://huggingface.co/sesame/csm-1b) for **Georgian text-to-speech**. This is an open-source Georgian TTS model based on the CSM architecture.
 
  ## Model Details
 
 
 
  ### Speaker Selection
 
+ The model supports 12 speakers with non-contiguous IDs: 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, and 14 (this numbering will be fixed in future models). Speaker 7 produces the most intelligible output (0.42% CER in-domain). Per-speaker quality varies:
 
  | Speaker | CER | Recommended |
  |---------|-----|-------------|
  | 7 | 0.0042 | Best overall |
  | 3 | 0.0174 | Best speaker similarity |
 
  Use the speaker ID in the text prefix: `[7]your text here`.
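
The prefix convention above can be sketched as a small helper. This is only an illustration, assuming the prompt is a plain string handed to the model as in the README's usage example; `build_prompt` and `VALID_SPEAKER_IDS` are hypothetical names, not part of the model's API, and the ID set is copied from the speaker list above.

```python
# Speaker IDs listed in the README for this checkpoint (non-contiguous).
VALID_SPEAKER_IDS = {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14}


def build_prompt(text: str, speaker_id: int = 7) -> str:
    """Return `text` prefixed with a speaker tag, e.g. "[7]your text here".

    Hypothetical helper for illustration; it only formats the string.
    """
    if speaker_id not in VALID_SPEAKER_IDS:
        raise ValueError(
            f"Unknown speaker ID {speaker_id}; "
            f"expected one of {sorted(VALID_SPEAKER_IDS)}"
        )
    return f"[{speaker_id}]{text}"


prompt = build_prompt("your text here", speaker_id=7)
print(prompt)  # [7]your text here
```

Validating the ID before formatting catches values such as 0 or 9, which fall inside the 0 to 14 range but are not among the listed speakers.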
 
 
  - Trained on 12 speakers from Common Voice Georgian — limited speaker diversity
  - Long sentences (>10s of audio) may produce hallucinations or truncations
  - 4.1% of FLEURS samples had CER > 50% (failure cases on complex text)
  - Georgian only
 
 
  ## Citation