Improve model card: add project page, tags, and detailed description (#1)

- Improve model card: add project page, tags, and detailed description (595e6d93c84186057661095c541d4ec107070fc6)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)

README.md +34 -22

@@ -1,5 +1,7 @@
 ---
-license: cc-by-nc-4.0
 language:
 - it
 - pt
@@ -11,37 +13,47 @@ language:
 - ru
 - en
 - zh
-task_categories:
-- text-to-speech
-datasets:
-- LEMAS-Project/LEMAS-Dataset-train
-- LEMAS-Project/LEMAS-Dataset-eval
 pipeline_tag: text-to-speech
 ---
-## Overview
-LEMAS‑TTS is a multilingual zero‑shot text‑to‑speech system, supporting 10 languages:
-- Chinese
-- English
-- Spanish
-- Russian
-- French
-- German
-- Italian
-- Portuguese
-- Indonesian
-- Vietnamese
-You can try the model via our Hugging Face demo: [https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS](https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS)
-For more details, please visit our GitHub repository: ([https://github.com/LEMAS-Project/LEMAS-TTS](https://github.com/LEMAS-Project/LEMAS-TTS))
 ## Citation
-[https://arxiv.org/abs/2601.04233](https://arxiv.org/abs/2601.04233)
-```
 @article{zhao2026lemas,
   title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
   author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},

 ---
+datasets:
+- LEMAS-Project/LEMAS-Dataset-train
+- LEMAS-Project/LEMAS-Dataset-eval
 language:
 - it
 - pt
 - ru
 - en
 - zh
+license: cc-by-nc-4.0
 pipeline_tag: text-to-speech
+tags:
+- zero-shot
+- multilingual
 ---
+# LEMAS-TTS
+LEMAS-TTS is a multilingual zero-shot text-to-speech system, presented in the paper [LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models](https://huggingface.co/papers/2601.04233).
+- **Project Page:** [https://lemas-project.github.io/LEMAS-Project](https://lemas-project.github.io/LEMAS-Project)
+- **Paper:** [https://arxiv.org/abs/2601.04233](https://arxiv.org/abs/2601.04233)
+- **GitHub Repository:** [https://github.com/LEMAS-Project/LEMAS-TTS](https://github.com/LEMAS-Project/LEMAS-TTS)
+- **Hugging Face Demo:** [https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS](https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS)
+## Model Description
+LEMAS-TTS is built upon a non-autoregressive flow-matching framework. It leverages the massive scale and linguistic diversity of the LEMAS-Dataset to achieve robust zero-shot multilingual synthesis. The model incorporates accent-adversarial training and CTC loss to mitigate cross-lingual accent issues, enhancing synthesis stability and quality across diverse languages.
+## Supported Languages
+The model supports 10 major languages for zero-shot synthesis:
+- Chinese (zh)
+- English (en)
+- Spanish (es)
+- Russian (ru)
+- French (fr)
+- German (de)
+- Italian (it)
+- Portuguese (pt)
+- Indonesian (id)
+- Vietnamese (vi)
+## Training Data
+LEMAS-TTS was trained on the [LEMAS-Dataset](https://huggingface.co/datasets/LEMAS-Project/LEMAS-Dataset-train), which is, to our knowledge, currently the largest open-source multilingual speech corpus with word-level timestamps. It covers over 150,000 hours across 10 major languages.
 ## Citation
+```bibtex
 @article{zhao2026lemas,
   title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
   author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},