Update README.md
README.md
CHANGED
@@ -7,6 +7,16 @@ sdk: static
 pinned: false
 ---
 
-LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models
+# LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models
 
-
+[LEMAS](https://lemas-project.github.io/LEMAS-Project/) is a large-scale extensible multilingual audio suite, providing the largest open-source multilingual speech corpus with word-level timestamps to our knowledge, covering over 150,000 hours across 10 major languages. Built with a rigorous alignment and confidence-based filtering pipeline, LEMAS supports diverse generative paradigms including zero-shot multilingual synthesis (LEMAS-TTS) and seamless speech editing (LEMAS-Edit).
+
+
+<details><summary>Citation</summary>
+
+@article{zhao2026lemas,
+title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
+author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},
+journal={arXiv preprint arXiv:2601.04233},
+year={2026}
+}