LEMAS-Project/LEMAS-Edit

Approximetal committed on Jan 10

Commit 21ef83d · verified · 1 Parent(s): 7b632eb

Create README.md

Files changed (1)

README.md +53 -0

	@@ -0,0 +1,53 @@

+---
+datasets:
+- LEMAS-Project/LEMAS-Dataset-train
+- LEMAS-Project/LEMAS-Dataset-eval
+language:
+- it
+- pt
+- es
+- fr
+- de
+- en
+- zh
+license: cc-by-nc-4.0
+pipeline_tag: text-to-speech
+tags:
+- zero-shot
+- multilingual
+---
+# LEMAS-Edit
+LEMAS-Edit is a multilingual zero-shot speech editing system, presented in the paper [LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models](https://huggingface.co/papers/2601.04233).
+- **Project Page:** [https://lemas-project.github.io/LEMAS-Project](https://lemas-project.github.io/LEMAS-Project)
+- **Paper:** [https://arxiv.org/abs/2601.04233](https://arxiv.org/abs/2601.04233)
+- **GitHub Repository:** [https://github.com/LEMAS-Project/LEMAS-Edit](https://github.com/LEMAS-Project/LEMAS-Edit)
+- **Hugging Face Demo:** [https://huggingface.co/spaces/LEMAS-Project/LEMAS-Edit](https://huggingface.co/spaces/LEMAS-Project/LEMAS-Edit)
+## Supported Languages
+The model supports 7 major languages for zero-shot synthesis:
+- Chinese (zh)
+- English (en)
+- Spanish (es)
+- French (fr)
+- German (de)
+- Italian (it)
+- Portuguese (pt)
+## Training Data
+LEMAS-Edit was trained on a subset of [LEMAS-Dataset](https://huggingface.co/datasets/LEMAS-Project/LEMAS-Dataset-train), which is, to our knowledge, the largest open-source multilingual speech corpus with word-level timestamps. The full dataset covers over 150,000 hours across 10 major languages.
+## Citation
+```bibtex
+@article{zhao2026lemas,
+  title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
+  author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},
+  journal={arXiv preprint arXiv:2601.04233},
+  year={2026}
+}
+```