EurekaTian/ROMA
Video-Text-to-Text
Transformers
Safetensors
qwen2_5_omni
multimodal
video-understanding
audio-understanding
streaming
real-time
omni-modal
arxiv: 2601.10323
License: apache-2.0
Branch: main
Repository size: 22.4 GB
1 contributor
History: 7 commits
Latest commit: "Update README.md" by EurekaTian (9756644, verified), 1 day ago
File                              Size       Xet  Last commit                          Updated
.gitattributes                    1.62 kB    -    Upload architecture.png              2 days ago
README.md                         1.29 kB    -    Update README.md                     1 day ago
added_tokens.json                 579 Bytes  -    Upload folder using huggingface_hub  2 days ago
architecture.png                  497 kB     xet  Upload architecture.png              2 days ago
chat_template.json                1.31 kB    -    Upload folder using huggingface_hub  2 days ago
config.json                       15.2 kB    -    Upload folder using huggingface_hub  2 days ago
generation_config.json            74 Bytes   -    Upload folder using huggingface_hub  2 days ago
merges.txt                        1.67 MB    -    Upload folder using huggingface_hub  2 days ago
model-00001-of-00003.safetensors  9.98 GB    xet  Upload folder using huggingface_hub  2 days ago
model-00002-of-00003.safetensors  9.97 GB    xet  Upload folder using huggingface_hub  2 days ago
model-00003-of-00003.safetensors  2.43 GB    xet  Upload folder using huggingface_hub  2 days ago
model.safetensors.index.json      234 kB     -    Upload folder using huggingface_hub  2 days ago
preprocessor_config.json          667 Bytes  -    Upload folder using huggingface_hub  2 days ago
special_tokens_map.json           833 Bytes  -    Upload folder using huggingface_hub  2 days ago
spk_dict.pt                       260 kB     xet  Upload folder using huggingface_hub  2 days ago
tokenizer.json                    11.4 MB    xet  Upload folder using huggingface_hub  2 days ago
tokenizer_config.json             6.47 kB    -    Upload folder using huggingface_hub  2 days ago
vocab.json                        2.78 MB    -    Upload folder using huggingface_hub  2 days ago