NMikka/CSM-1B-Georgian

speech-synthesis

NMikka committed on Mar 9

Commit 8c3dac7 · verified · 1 parent: e6671a6

Update README.md

Files changed (1)

README.md +2 -9

@@ -16,7 +16,7 @@ datasets:
 # CSM-1B Georgian
-A fine-tuned version of [sesame/csm-1b](https://huggingface.co/sesame/csm-1b) for **Georgian text-to-speech**. This is the first open-source Georgian TTS model based on the CSM architecture.
 ## Model Details
@@ -76,15 +76,12 @@ sf.write("output.wav", audio[0].cpu().numpy(), 24000)
 ### Speaker Selection
-The model supports 12 speakers (IDs 0–11). Speaker 7 produces the most intelligible output (0.42% CER in-domain). Per-speaker quality varies:
 | Speaker | CER | Recommended |
 |---------|-----|-------------|
 | 7 | 0.0042 | Best overall |
 | 3 | 0.0174 | Best speaker similarity |
-| 8 | 0.0142 | Good |
-| 11 | 0.0129 | Good |
-| 14 | 0.0117 | Good |
 Use the speaker ID in the text prefix: `[7]your text here`.
@@ -135,12 +132,8 @@ for i, audio in enumerate(audios):
 - Trained on 12 speakers from Common Voice Georgian — limited speaker diversity
 - Long sentences (>10s of audio) may produce hallucinations or truncations
 - 4.1% of FLEURS samples had CER > 50% (failure cases on complex text)
-- No emotion or prosody control
 - Georgian only
-## Part of the Georgian TTS Benchmark
-This model was trained as part of the first Georgian TTS benchmark — a comparative study of 6 open-source TTS architectures. See the full project: [github.com/NMikaa/TTS_pipelines](https://github.com/NMikaa/TTS_pipelines)
 ## Citation

 # CSM-1B Georgian
+A fine-tuned version of [sesame/csm-1b](https://huggingface.co/sesame/csm-1b) for **Georgian text-to-speech**. This is an open-source Georgian TTS model based on the CSM architecture.
 ## Model Details
 ### Speaker Selection
+The model supports 12 speakers with non-contiguous IDs: 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, and 14 (the numbering will be fixed in future models). Speaker 7 produces the most intelligible output (0.42% CER in-domain). Per-speaker quality varies:
 | Speaker | CER | Recommended |
 |---------|-----|-------------|
 | 7 | 0.0042 | Best overall |
 | 3 | 0.0174 | Best speaker similarity |
 Use the speaker ID in the text prefix: `[7]your text here`.
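
Because the speaker IDs are not contiguous, it is easy to pass an ID the model was never trained on (for example 0 or 9). The snippet below is a minimal sketch of a guard for that case. The helper name `format_prompt` and the constant `VALID_SPEAKER_IDS` are hypothetical and not part of the model's code; the ID set is copied from the list in this README, and the `[id]text` prefix format is the one shown above.

```python
# Speaker IDs as listed in the README (note the gaps at 0, 9 and 13).
VALID_SPEAKER_IDS = frozenset({1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14})


def format_prompt(text: str, speaker_id: int = 7) -> str:
    """Prefix `text` with a speaker tag, e.g. "[7]your text here".

    Hypothetical helper: raises ValueError for IDs outside the listed set.
    """
    if speaker_id not in VALID_SPEAKER_IDS:
        raise ValueError(
            f"Unknown speaker ID {speaker_id}; "
            f"expected one of {sorted(VALID_SPEAKER_IDS)}"
        )
    return f"[{speaker_id}]{text}"


if __name__ == "__main__":
    print(format_prompt("your text here"))     # default speaker 7
    print(format_prompt("your text here", 3))  # best speaker similarity
```

The default of 7 follows the "Best overall" row in the table; the resulting string is what would be passed to the processor as the text input.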
 - Trained on 12 speakers from Common Voice Georgian — limited speaker diversity
 - Long sentences (>10s of audio) may produce hallucinations or truncations
 - 4.1% of FLEURS samples had CER > 50% (failure cases on complex text)
 - Georgian only
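
The CER figures quoted in this card (0.0042 for speaker 7, and the "CER > 50%" failure threshold on FLEURS) are character error rates. The card does not say how they were computed, so the following is only a sketch under the conventional definition: character-level Levenshtein distance between the reference text and the transcript of the generated audio, divided by the reference length, with no text normalization. The function names `levenshtein` and `cer` are illustrative and not taken from the model's codebase.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "გამარჯობა"  # reference text
    hyp = "გამარჯობ"   # hypothetical ASR transcript with one character missing
    print(f"CER = {cer(ref, hyp):.4f}")
```

Under this definition a CER of 0.0042 corresponds to roughly one character error per 240 reference characters, and a sample counts toward the 4.1% failure cases when more than half of its characters are wrong.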
 ## Citation