zhitinghu commited on
Commit
98dc1a8
·
verified ·
1 Parent(s): 4861243

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -3
README.md CHANGED
@@ -18,10 +18,10 @@ base_model:
18
  <p align="center">
19
  <img src="https://voila.maitrix.org/static/images/logo.png" width="400"/><br/>
20
  <b>Voila: <span style="color:#ca00f9">Voi</span>ce-<span style="color:#ca00f9">La</span>nguage Foundation Models</b><br/><br/>
21
- 💜 <a href="https://voila.maitrix.org"><b>Voila</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/maitrix-org/Voila">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/collections/maitrix-org/voila-67e0d96962c19f221fc73fa5">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper (Coming soon)</a> &nbsp&nbsp | &nbsp&nbsp 🌐 <a href="https://huggingface.co/spaces/maitrix-org/Voila-demo">Demo</a>
22
  </p>
23
 
24
- Voila is a groundbreaking family of large audio-language foundation models that revolutionizes human-AI interactions. Breaking away from the constraints of traditional voice AI systems—high latency, loss of vocal nuances, and mechanical responses, Voila employs an innovative end-to-end model design and a novel hierarchical Transformer architecture. This approach enables real-time, autonomous, and rich voice interactions, with latency as low as 195 ms, surpassing average human response times. Combining advanced voice and language modeling, Voila offers customizable, persona-driven engagements and excels in a range of audio tasks from ASR and TTS to speech translation across six languages. With the online [web demo](https://huggingface.co/spaces/maitrix-org/Voila-demo), Voila invites you to explore a transformative, natural dialogue experience between human and AI.
25
 
26
  # ✨ Highlights
27
  - ⭐ High-fidelity, low-latency, real-time streaming audio processing
@@ -135,7 +135,7 @@ If you find our work helpful, please cite us.
135
  @article{voila2025,
136
  author = {Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu},
137
  title = {Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Roleplay},
138
- eprint={},
139
  archivePrefix={arXiv},
140
  primaryClass={cs.CL},
141
  year = {2025}
 
18
  <p align="center">
19
  <img src="https://voila.maitrix.org/static/images/logo.png" width="400"/><br/>
20
  <b>Voila: <span style="color:#ca00f9">Voi</span>ce-<span style="color:#ca00f9">La</span>nguage Foundation Models</b><br/><br/>
21
+ 💜 <a href="https://voila.maitrix.org"><b>Project Page</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/maitrix-org/Voila">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/collections/maitrix-org/voila-67e0d96962c19f221fc73fa5">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="http://arxiv.org/abs/2505.02707">Paper</a> &nbsp&nbsp | &nbsp&nbsp 🌐 <a href="https://huggingface.co/spaces/maitrix-org/Voila-demo">Online Demo</a> &nbsp&nbsp| &nbsp&nbsp 🏠<a href="https://maitrix.org">Maitrix.org</a>
22
  </p>
23
 
24
+ Voila is a new family of large voice-language foundation models aiming to lift human-AI interaction experiences to the next level. Breaking away from the constraints of traditional voice AI systems—high latency, loss of vocal nuances, and mechanical responsesVoila employs an innovative end-to-end model design and a novel hierarchical Transformer architecture. This approach enables real-time, autonomous, and rich voice interactions, with latency as low as 195 ms, surpassing average human response times. Combining advanced voice and language modeling, Voila offers customizable, persona-driven engagements and excels in a range of audio tasks from ASR and TTS to speech translation across six languages. With the online [web demo](https://huggingface.co/spaces/maitrix-org/Voila-demo), Voila invites you to explore a transformative, natural dialogue experience between human and AI.
25
 
26
  # ✨ Highlights
27
  - ⭐ High-fidelity, low-latency, real-time streaming audio processing
 
135
  @article{voila2025,
136
  author = {Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu},
137
  title = {Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Roleplay},
138
+ eprint={2505.02707},
139
  archivePrefix={arXiv},
140
  primaryClass={cs.CL},
141
  year = {2025}