ecker committed
Commit 6a76916 · 1 parent: be6c4d5

Update README.md

Files changed (1): README.md (+2 −0)
README.md CHANGED
@@ -96,6 +96,8 @@ This repo contains the following configurations under `./models/`:
  * Speedups are immense compared to the `ar+nar-llama-8`, as the entire audio output is decoded in parallel rather than causally.
  * Throughput and memory usage should be constant between inferencing steps.
  * The model only needs to be invoked about 5+25+7 times (duration inferencing + RVQ level 0 inferencing + remaining RVQ levels) instead.
+ * Seems to absolutely require classifier-free guidance to keep the output stable.
+ * The "confidence" issue on voices it hasn't seen, or hasn't seen much of, is much more noticeable, as RVQ level 0 is much more susceptible to it.
  * Unlike the base model, this is trained on the current dataset without iteratively dripfeeding additional sources (like tacking on Emilia afterwards).
  * ...except for STT: this received no STT training out of fear of botching the model.
  * Weights will be added as the model is trained.
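The classifier-free guidance the diff mentions is, in its usual formulation, a blend of conditional and unconditional logits at each sampling step. The following is a minimal sketch of that standard formulation only; the function name and the guidance scale are illustrative assumptions, not this repo's actual API or settings:

```python
import numpy as np

def cfg_logits(cond_logits: np.ndarray, uncond_logits: np.ndarray, scale: float = 3.0) -> np.ndarray:
    """Classifier-free guidance (standard formulation, illustrative only).

    Pushes the sampling distribution toward the conditioned prediction by
    extrapolating away from the unconditional one:
        guided = uncond + scale * (cond - uncond)
    scale = 1.0 recovers the plain conditional logits; larger values
    trade diversity for stability, which matches the note above that the
    parallel model needs CFG to keep its output stable.
    """
    return uncond_logits + scale * (cond_logits - uncond_logits)

# Toy example: two-token vocabulary, guidance sharpens the gap
# between what the conditioned and unconditioned passes prefer.
cond = np.array([1.0, 2.0])
uncond = np.array([0.0, 1.0])
guided = cfg_logits(cond, uncond, scale=2.0)  # -> [2.0, 3.0]
```

In practice this costs a second forward pass per step (one with the conditioning prompt, one without), which is part of why the fixed ~5+25+7 invocation budget above still comes out far cheaper than fully causal decoding.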