Update README.md
README.md
CHANGED
@@ -18,10 +18,10 @@ base_model:
 <p align="center">
 <img src="https://voila.maitrix.org/static/images/logo.png" width="400"/><br/>
 <b>Voila: <span style="color:#ca00f9">Voi</span>ce-<span style="color:#ca00f9">La</span>nguage Foundation Models</b><br/><br/>
-💜 <a href="https://voila.maitrix.org"><b>
+💜 <a href="https://voila.maitrix.org"><b>Project Page</b></a>    |    🖥️ <a href="https://github.com/maitrix-org/Voila">GitHub</a>    |   🤗 <a href="https://huggingface.co/collections/maitrix-org/voila-67e0d96962c19f221fc73fa5">Hugging Face</a>   |    📑 <a href="http://arxiv.org/abs/2505.02707">Paper</a>    |    🌐 <a href="https://huggingface.co/spaces/maitrix-org/Voila-demo">Online Demo</a>   |    🏠<a href="https://maitrix.org">Maitrix.org</a>
 </p>
 
-Voila is a
+Voila is a new family of large voice-language foundation models aiming to lift human-AI interaction experiences to the next level. Breaking away from the constraints of traditional voice AI systems—high latency, loss of vocal nuances, and mechanical responses—Voila employs an innovative end-to-end model design and a novel hierarchical Transformer architecture. This approach enables real-time, autonomous, and rich voice interactions, with latency as low as 195 ms, surpassing average human response times. Combining advanced voice and language modeling, Voila offers customizable, persona-driven engagements and excels in a range of audio tasks from ASR and TTS to speech translation across six languages. With the online [web demo](https://huggingface.co/spaces/maitrix-org/Voila-demo), Voila invites you to explore a transformative, natural dialogue experience between human and AI.
 
 # ✨ Highlights
 - ⭐ High-fidelity, low-latency, real-time streaming audio processing
@@ -135,7 +135,7 @@ If you find our work helpful, please cite us.
 @article{voila2025,
 author = {Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu},
 title = {Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Roleplay},
-eprint={},
+eprint={2505.02707},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 year = {2025}