Improve model card: update `pipeline_tag` and add `library_name`

This PR improves the model card for the StreamingChat model by:
- Updating the `pipeline_tag` from `visual-question-answering` to `video-text-to-text` for a more accurate categorization of the model's functionality in streaming video understanding and multi-turn dialogues.
- Adding `library_name: transformers`. The GitHub README (`pip install transformers`) and `config.json` both indicate compatibility with the Hugging Face Transformers library, and this tag enables the automated "how to use" widget.
- Correcting the introductory sentence from "This dataset card" to "This model card" for improved accuracy.
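
The metadata changes listed above can be checked locally before merging. The following is a minimal sketch, not part of the PR itself: `read_front_matter` is a hypothetical helper that handles only the simple `key: value` and `- item` subset of YAML used in this card's header, and `CARD` is an inline copy of the updated header rather than the file fetched from the Hub.

```python
def read_front_matter(text: str) -> dict:
    """Parse the leading `---` block of a model card.

    Hypothetical helper: supports only scalar `key: value` pairs and
    flat `- item` lists, which is all this card's header uses.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the header
            break
        if stripped.startswith("- "):
            # List item: attach it to the most recent key if that key holds a list.
            if current_key is not None and isinstance(meta[current_key], list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list follows on the next lines.
            meta[current_key] = value if value else []
    return meta


# Inline copy of the updated header from this PR (assumed, not fetched).
CARD = """---
base_model:
- OpenGVLab/InternVideo2_5_Chat_8B
datasets:
- yzy666/SVBench
language:
- en
license: apache-2.0
metrics:
- code_eval
pipeline_tag: video-text-to-text
library_name: transformers
---
# Model Card for StreamingChat
"""

meta = read_front_matter(CARD)
print(meta["pipeline_tag"], meta["library_name"])
```

For real cards, a full YAML parser is a safer choice than this hand-rolled subset, since model card headers can contain nested structures.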

Files changed (1)

README.md +6 -5

@@ -1,21 +1,22 @@
 ---
-license: apache-2.0
 datasets:
 - yzy666/SVBench
 language:
 - en
 metrics:
 - code_eval
-base_model:
-- OpenGVLab/InternVideo2_5_Chat_8B
-pipeline_tag: visual-question-answering
 ---
 # Model Card for StreamingChat
 <!-- Provide a quick summary of what the model is/does. -->
-This dataset card aims to provide a comprehensive overview of the StreamingChat model. For details, see our [Project](https://yzy-bupt.github.io/SVBench/), [Paper](https://arxiv.org/abs/2502.10810), [Dataset](https://huggingface.co/datasets/yzy666/SVBench) and [GitHub repository](https://github.com/yzy-bupt/SVBench).
 ## **Dataset Description**
 **StreamingChat** is a streaming video understanding model built upon [InternVideo2.5](https://huggingface.co/OpenGVLab/InternVideo2_5_Chat_8B). It utilizes Streaming video dialogue data, including temporal dialogue paths from the [SVBench](https://huggingface.co/datasets/yzy666/SVBench) training set. The model is fine-tuned using a static resolution strategy, enabling it to process several minutes of video at a rate of 1 FPS. Images are interleaved with language tokens, with each image comprising 16 tokens. This model aims to catalyze progress in streaming video understanding.

 ---
+base_model:
+- OpenGVLab/InternVideo2_5_Chat_8B
 datasets:
 - yzy666/SVBench
 language:
 - en
+license: apache-2.0
 metrics:
 - code_eval
+pipeline_tag: video-text-to-text
+library_name: transformers
 ---
 # Model Card for StreamingChat
 <!-- Provide a quick summary of what the model is/does. -->
+This model card aims to provide a comprehensive overview of the StreamingChat model. For details, see our [Project](https://yzy-bupt.github.io/SVBench/), [Paper](https://arxiv.org/abs/2502.10810), [Dataset](https://huggingface.co/datasets/yzy666/SVBench) and [GitHub repository](https://github.com/yzy-bupt/SVBench).
 ## **Dataset Description**
 **StreamingChat** is a streaming video understanding model built upon [InternVideo2.5](https://huggingface.co/OpenGVLab/InternVideo2_5_Chat_8B). It utilizes Streaming video dialogue data, including temporal dialogue paths from the [SVBench](https://huggingface.co/datasets/yzy666/SVBench) training set. The model is fine-tuned using a static resolution strategy, enabling it to process several minutes of video at a rate of 1 FPS. Images are interleaved with language tokens, with each image comprising 16 tokens. This model aims to catalyze progress in streaming video understanding.
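
The figures quoted in the card description (1 FPS sampling, 16 tokens per image) give a rough sense of the visual token budget for a clip. The sketch below is only a back-of-the-envelope estimate under stated assumptions: exactly one frame is sampled per second, every frame costs 16 tokens, and text or special tokens are ignored. The function name `visual_token_count` is hypothetical and does not come from the model's code.

```python
FPS = 1                 # frames sampled per second, as stated in the card
TOKENS_PER_FRAME = 16   # tokens per image, as stated in the card


def visual_token_count(duration_seconds: float,
                       fps: float = FPS,
                       tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Estimate visual tokens for a clip (assumption: uniform frame sampling)."""
    frames = int(duration_seconds * fps)
    return frames * tokens_per_frame


# A five-minute clip: 300 frames at 16 tokens each.
print(visual_token_count(5 * 60))  # 4800
```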