update readme
README.md
CHANGED
@@ -423,17 +423,18 @@ with torch.no_grad():
-### Generation Hyperparameters
+### Generation Hyperparameters (MOSS-TTS-Local)
+
+MOSS-TTS-Local uses `DelayGenerationConfig` to manage hierarchical sampling. Because the model is trained with **Progressive Sequence Dropout**, it supports variable-bitrate inference by adjusting the RVQ depth.
+
+| Parameter | Type | Recommended (Audio Layers) | Description |
+| :--- | :--- | :---: | :--- |
+| `max_new_tokens` | `int` | - | Total number of generated audio tokens. **1s ≈ 12.5 tokens**. |
+| `n_vq_for_inference` | `int` | 32 | **RVQ Inference Depth**: number of codebook layers generated. Higher values (max 32) improve audio fidelity but slow down inference; lower values speed up inference but reduce audio quality. |
+| `audio_temperature` | `float` | 1.0 | Temperature for audio token layers (Layer 1+). Lower values give more stable and consistent acoustic reconstruction. |
+| `audio_top_p` | `float` | 0.95 | Nucleus sampling cutoff for audio layers. |
+| `audio_top_k` | `int` | 50 | Top-K sampling filter for audio layers. |
+| `audio_repetition_penalty` | `float` | 1.1 | Discourages repeating acoustic patterns. Values > 1.0 help prevent artifacts in long-form synthesis. |
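The token budget and recommended values in the table can be sketched in plain Python. This is a minimal illustration, not the project's API: `tokens_for_duration` and `recommended_settings` are hypothetical helpers, the lower bound of 1 for the RVQ depth is an assumption, and how these values are handed to `DelayGenerationConfig` is not shown in this diff and is deliberately left out.

```python
import math

# Rate quoted in the table: roughly 12.5 audio tokens per second of audio.
TOKENS_PER_SECOND = 12.5
# Maximum RVQ depth quoted in the table.
MAX_RVQ_DEPTH = 32


def tokens_for_duration(seconds: float) -> int:
    """Hypothetical helper: estimate max_new_tokens for a target duration."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Round up so the budget does not cut the audio short.
    return math.ceil(seconds * TOKENS_PER_SECOND)


def recommended_settings(seconds: float, n_vq: int = MAX_RVQ_DEPTH) -> dict:
    """Hypothetical helper: collect the table's recommended values in a dict."""
    # Assumption: at least one codebook layer is required; the table only
    # states the maximum of 32.
    if not 1 <= n_vq <= MAX_RVQ_DEPTH:
        raise ValueError(f"n_vq must be between 1 and {MAX_RVQ_DEPTH}")
    return {
        "max_new_tokens": tokens_for_duration(seconds),
        "n_vq_for_inference": n_vq,
        "audio_temperature": 1.0,
        "audio_top_p": 0.95,
        "audio_top_k": 50,
        "audio_repetition_penalty": 1.1,
    }


# Example: about 10 seconds of audio at half the maximum RVQ depth.
settings = recommended_settings(seconds=10, n_vq=16)
print(settings["max_new_tokens"])  # 125
```

Rounding up is a design choice: an estimate that is one token too large costs little, while one that is too small may truncate the audio.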