Pendrokar/xvapitch_nvidia

Pendrokar committed on Feb 20, 2024

Commit 774cb85 · verified · 1 parent: 61b10e6

languages; origins; papers

Files changed (1)

README.md +48 -3

@@ -2,11 +2,56 @@
 license: cc-by-4.0
 language:
 - en
 pipeline_tag: text-to-speech
 ---
-xVASynth's xVAPitch (v3) type of voice model.
-Legal note: While model is trained on a CC dataset, xVATrainer pretrained models used to train this model include non-CC datasets.
-NVIDIA HIFI 6670 M

 license: cc-by-4.0
 language:
 - en
+- de
+- es
+- it
+- nl
+- pt
+- pl
+- ro
+- sv
+- da
+- fi
+- hu
+- el
+- fr
+- ru
+- uk
+- tr
+- ar
+- hi
+- ja
+- ko
+- zh
+- vi
+- la
+- ha
+- sw
+- yo
+- wo
+thumbnail: >-
+  https://raw.githubusercontent.com/DanRuta/xVA-Synth/master/assets/x-icon.png
+library: xvasynth
+tags:
+  - emotion
+  - audio
+  - text-to-speech
+  - tts
 pipeline_tag: text-to-speech
 ---
+xVASynth xVAPitch (v3) voice models, based on NVIDIA HIFI NeMo datasets.
+Models created by Dan Ruta, origin link:
+- https://www.nexusmods.com/skyrimspecialedition/mods/65022?tab=files
+Presumed origin of the dataset:
+- https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/core.html
+Papers referenced by the xVAPitch model:
+- Multi-head attention with Relative Positional embedding - https://arxiv.org/pdf/1809.04281.pdf
+- Transformer with Relative Positional Encoding - https://arxiv.org/abs/1803.02155
+- SDP - https://arxiv.org/pdf/2106.06103.pdf
+- Spline Flow - https://arxiv.org/abs/1906.04032
+Legal note: Although these datasets are licensed under CC BY 4.0, the base v3 model that these models are fine-tuned from was pre-trained on non-permissively licensed data.
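
The `language:` list in the front matter grows considerably in this commit, so a quick sanity check of its entries may be useful. The following is a minimal sketch, not part of the repository: the helper names `front_matter_languages` and `unknown_codes` are hypothetical, the parser handles only the simple list layout shown in the diff (it is not a general YAML parser), and `KNOWN_ISO_639_1` is a hand-picked subset of ISO 639-1 codes rather than the full standard.

```python
import re

# Assumption: a hand-picked subset of two-letter ISO 639-1 codes,
# not the complete standard. Extend as needed.
KNOWN_ISO_639_1 = {
    "en", "de", "es", "it", "nl", "pt", "pl", "ro", "sv", "da",
    "fi", "hu", "el", "fr", "ru", "uk", "tr", "ar", "hi", "ja",
    "ko", "zh", "vi", "la", "ha", "sw", "yo", "wo",
}


def front_matter_languages(readme_text):
    """Return the entries of the `language:` list in the leading `---` block."""
    match = re.match(r"---\n(.*?)\n---", readme_text, flags=re.DOTALL)
    if match is None:
        return []  # no front matter at the start of the text
    codes = []
    in_language = False
    for line in match.group(1).splitlines():
        if line.startswith("language:"):
            in_language = True
        elif in_language and line.strip().startswith("- "):
            codes.append(line.strip()[2:].strip())
        elif in_language:
            break  # the first non-list line ends the language block
    return codes


def unknown_codes(codes):
    """Return the codes that are not in the known subset, in input order."""
    return [code for code in codes if code not in KNOWN_ISO_639_1]


# Hypothetical sample text, shaped like the front matter in the diff.
sample = """---
license: cc-by-4.0
language:
- en
- ja
- xx
pipeline_tag: text-to-speech
---
"""

print(unknown_codes(front_matter_languages(sample)))
```

Any code the sketch reports is only "not in the subset above"; it would still need to be checked against the actual ISO 639-1 list before changing the model card.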