Update README.md
README.md
CHANGED
@@ -24,6 +24,6 @@ pipeline_tag: video-text-to-text

 ROMA introduces a "Speak Head" mechanism to decouple response timing from content generation, allowing it to autonomously decide *when* to speak based on the continuous audio-visual stream.

-- **Paper:** [ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding](https://arxiv.org/abs/
+- **Paper:** [ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding](https://arxiv.org/abs/2601.10323)
 - **Project Page:** [Link](https://eureka-maggie.github.io/ROMA_show/)
 - **Repository:** [[Github (Coming Soon)](https://github.com/Eureka-Maggie/ROMA)]