he-shuwei/M2SE-VTTS
Text-to-Speech
English
visual-tts
speech-synthesis
diffusion
spatial-audio
arxiv: 2412.11409
License: mit
10 GB
1 contributor
History: 29 commits
Latest commit: he-shuwei, "Add architecture figure" (8960bc5, verified), about 1 month ago
Name            Size      Last commit                                         Updated
assets          -         Add architecture figure                             about 1 month ago
bigvgan         -         Upload bigvgan/g_00076000 with huggingface_hub      about 1 month ago
data            -         Add MFA alignment results (tar.gz)                  about 1 month ago
m2se_vtts       -         Fix diff_decoder_type: remove F5 naming             about 1 month ago
.gitattributes  1.71 kB   Add architecture figure                             about 1 month ago
README.md       4.12 kB   Add model card                                      about 1 month ago