EurekaTian/ROMA

Video-Text-to-Text

video-understanding

audio-understanding

EurekaTian committed on Jan 16

Commit 9756644 · verified · 1 parent: 6cdb469

Update README.md

Files changed (1)

README.md +1 -1

@@ -24,6 +24,6 @@ pipeline_tag: video-text-to-text
 ROMA introduces a "Speak Head" mechanism to decouple response timing from content generation, allowing it to autonomously decide *when* to speak based on the continuous audio-visual stream.
-- **Paper:** [ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding](https://arxiv.org/abs/250x.xxxxx)
+- **Paper:** [ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding](https://arxiv.org/abs/2601.10323)
 - **Project Page:** [Link](https://eureka-maggie.github.io/ROMA_show/)
 - **Repository:** [[Github (Coming Soon)](https://github.com/Eureka-Maggie/ROMA)]