fdugyt committed on
Commit a9555b0 · verified · 1 Parent(s): 94a9ba0

update readme

Files changed (1)
  1. README.md +12 -11
README.md CHANGED
@@ -423,17 +423,18 @@ with torch.no_grad():
- ### Generation Hyperparameters
-
- | Parameter | Type | Default | Description |
- |---|---|---:|---|
- | `max_new_tokens` | `int` | | Controls total generated audio tokens. Use duration rule: **1s ≈ 12.5 tokens**. |
- | `audio_temperature` | `float` | 1.7 | Higher values increase variation; lower values stabilize prosody. |
- | `audio_top_p` | `float` | 0.8 | Nucleus sampling cutoff. Lower values are more conservative. |
- | `audio_top_k` | `int` | 25 | Top-K sampling. Lower values tighten sampling space. |
- | `audio_repetition_penalty` | `float` | 1.0 | >1.0 discourages repeating patterns. |
-
- > Note: MOSS-TTS is a pretrained base model and is **sensitive to decoding hyperparameters**. See **Released Models** for recommended defaults.
+ ### Generation Hyperparameters (MOSS-TTS-Local)
+
+ MOSS-TTS-Local uses `DelayGenerationConfig` to manage hierarchical sampling. Because the model is trained with **Progressive Sequence Dropout**, it supports variable-bitrate inference by adjusting the RVQ depth.
+
+ | Parameter | Type | Recommended (Audio Layers) | Description |
+ | :--- | :--- | :---: | :--- |
+ | `max_new_tokens` | `int` | | Controls total generated audio tokens. Duration rule: **1s ≈ 12.5 tokens**. |
+ | `n_vq_for_inference` | `int` | 32 | **RVQ Inference Depth**: Controls the number of codebook layers generated. Higher values (max 32) improve audio fidelity but slow down inference; lower values speed up inference but reduce audio quality. |
+ | `audio_temperature` | `float` | 1.0 | Temperature for audio token layers (Layer 1+). Lower values give more stable and consistent acoustic reconstruction. |
+ | `audio_top_p` | `float` | 0.95 | Nucleus sampling cutoff for audio layers. Lower values are more conservative. |
+ | `audio_top_k` | `int` | 50 | Top-K sampling filter for audio layers. Lower values tighten the sampling space. |
+ | `audio_repetition_penalty` | `float` | 1.1 | Discourages repeating acoustic patterns. Values > 1.0 help prevent artifacts in long-form synthesis. |
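The duration rule in the table (1 s ≈ 12.5 audio tokens) can be turned into a token budget with a little arithmetic. The sketch below is illustrative only: `max_new_tokens_for` and its `margin` argument are hypothetical names, not part of the MOSS-TTS code, and the dictionary simply restates the recommended values from the table. How these values are passed to `DelayGenerationConfig` is not shown in this diff, so no constructor call is assumed here.

```python
import math

# From the table: 1 second of audio is roughly 12.5 generated tokens.
TOKENS_PER_SECOND = 12.5


def max_new_tokens_for(seconds: float, margin: float = 0.0) -> int:
    """Hypothetical helper: convert a target duration into a token budget.

    `margin` adds fractional headroom (0.25 means 25% extra) so that
    generation is not cut off slightly before the utterance ends.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * (1.0 + margin) * TOKENS_PER_SECOND)


# Recommended values copied from the table, kept as a plain dict because
# the exact configuration API is an unknown here.
recommended = {
    "n_vq_for_inference": 32,
    "audio_temperature": 1.0,
    "audio_top_p": 0.95,
    "audio_top_k": 50,
    "audio_repetition_penalty": 1.1,
}

# Budget for roughly 10 seconds of audio.
recommended["max_new_tokens"] = max_new_tokens_for(10)
```

Rounding up with `math.ceil` errs on the side of a slightly longer budget, which is usually safer than truncating the last fraction of a second.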
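To make the interaction between `audio_temperature`, `audio_top_k`, `audio_top_p`, and `audio_repetition_penalty` concrete, here is a minimal, generic sketch of how such controls are commonly applied to one vector of logits. It is not the MOSS-TTS implementation: the function name `filtered_probs`, the order of operations, and the repetition-penalty convention (divide positive logits, multiply negative ones) are assumptions based on common practice in sampling code.

```python
import math


def filtered_probs(logits, prev_tokens=(), temperature=1.0, top_k=50,
                   top_p=0.95, repetition_penalty=1.1):
    """Return {token_id: probability} after applying the four controls."""
    logits = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for t in set(prev_tokens):
        if logits[t] > 0:
            logits[t] /= repetition_penalty
        else:
            logits[t] *= repetition_penalty

    # Temperature: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring tokens.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    order = order[:top_k]

    # Softmax over the surviving tokens (shifted for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in zip(order, probs):
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}


example = filtered_probs([2.0, 1.0, 0.0, -1.0], top_k=2, top_p=1.0)
```

With these toy logits, `top_k=2` leaves only tokens 0 and 1 as candidates. Lowering `top_p` or `temperature` narrows the choice further, which matches the table's description of lower values as more conservative.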