Improve model card metadata
This PR improves the model card by correcting the `pipeline_tag` to `audio-text-to-text` and adding `library_name: transformers`, so the model is correctly categorized on the Hugging Face Hub and discoverable through relevant searches.
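With both changes applied, the model card's YAML front matter would read as follows (a sketch assembled from the diff; the `license` and `tags` entries are unchanged):

```yaml
---
license: gpl-3.0
library_name: transformers
pipeline_tag: audio-text-to-text
tags:
- omni
---
```

The `pipeline_tag` drives which task filter the model appears under on the Hub, and `library_name` tells the Hub which library's loading snippet to surface on the model page.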
README.md (changed)

```diff
@@ -1,13 +1,15 @@
 ---
 license: gpl-3.0
-
+library_name: transformers
+pipeline_tag: audio-text-to-text
 tags:
 - omni
 ---
+
 # Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
 
 [](https://arxiv.org/abs/2506.13642)
-
+[](https://github.com/ictnlp/Stream-Omni)
 [](https://huggingface.co/ICTNLP/stream-omni-8b)
 [](https://huggingface.co/datasets/ICTNLP/InstructOmni)
 [](https://github.com/ictnlp/Stream-Omni)
@@ -33,4 +35,4 @@ Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously
 
 > [!NOTE]
 >
-> **Stream-Omni can produce intermediate textual results (ASR transcription and text response) during speech interaction, offering users a seamless "see-while-hear" experience.**
+> **Stream-Omni can produce intermediate textual results (ASR transcription and text response) during speech interaction, offering users a seamless "see-while-hear" experience.**
```