Update README.md
README.md
@@ -96,6 +96,8 @@ This repo contains the following configurations under `./models/`:
* Speedups are immense compared to the `ar+nar-llama-8`, as the entire audio output is decoded in parallel rather than causally.
* Throughput and memory usage should be constant between inferencing steps.
* The model only needs to be invoked about 5+25+7 times (duration inferencing + RVQ level 0 inferencing + remaining RVQ levels) instead.
* Seems to absolutely require classifier-free guidance to keep the output stable.
* The "confidence" issue on voices it hasn't seen (or hasn't seen much of) is much more noticeable, as RVQ level 0 is much more susceptible to it.
* Unlike the base model, this is trained on the current dataset without iteratively dripfeeding additional sources (like tacking on Emilia afterwards).
* ...except for STT: this received no STT training out of fear of botching the model.
* Weights will be added as the model is trained.
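For reference, the per-utterance invocation count mentioned above can be tallied directly from the quoted step counts (the phase names are as stated in the list; the exact split is the README's, not measured here):

```python
# Rough tally of forward passes per utterance for this NAR model,
# using the step counts quoted above (5 + 25 + 7).
duration_steps = 5     # duration inferencing
rvq0_steps = 25        # RVQ level 0 inferencing (iterative steps)
rvq_rest_steps = 7     # remaining RVQ levels, one pass each
total_steps = duration_steps + rvq0_steps + rvq_rest_steps
print(total_steps)  # 37 forward passes, independent of audio length
```

This constant invocation count is what makes throughput and memory usage flat across inferencing steps, unlike a causal model whose pass count grows with output length.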
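On the classifier-free-guidance point: a minimal sketch of how CFG is typically applied at sampling time, assuming the model can be run both with and without conditioning. The names `cfg_logits`, `cond_logits`, `uncond_logits`, and `scale` are illustrative, not this repo's API:

```python
import numpy as np

def cfg_logits(cond_logits: np.ndarray,
               uncond_logits: np.ndarray,
               scale: float = 3.0) -> np.ndarray:
    """Blend conditional and unconditional logits: scale > 1 pushes the
    distribution further toward the conditional prediction."""
    return uncond_logits + scale * (cond_logits - uncond_logits)

# scale = 1.0 recovers the plain conditional logits unchanged
cond = np.array([2.0, 0.5, -1.0])
uncond = np.array([1.0, 1.0, 1.0])
print(cfg_logits(cond, uncond, scale=1.0))
```

Each sampling step then costs two forward passes (conditional and unconditional), a common trade for the output stability noted above.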