How to use OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448 with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448", trust_remote_code=True, dtype="auto")
Describing short clips
#3
by fishcakeday - opened
When I try to ask the model to describe a short clip with very few frames, it always fails to identify any actions or movements, only talking about the overall description. Trying it with and without do_sample makes no difference. Any way I can use this setup to describe 2-5 second clips?
- Have you tried the 7B model?
- What's your prompt?
- Is this true for all short videos, or just this one case? You can send the videos in question to my email address: xinhaoli00@outlook.com
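When debugging a 2-5 second clip, it may help to check how many frames are actually sampled and how far apart they are, since motion is hard to pick up from a handful of widely spaced frames. The sketch below is not the model's own video loader: it assumes plain uniform sampling over the clip, and `uniform_frame_indices` and `describe_sampling` are hypothetical helper names introduced only for illustration.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick up to num_samples frame indices spread evenly across a clip.

    Assumption: uniform sampling. The real loader may sample differently.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # A clip cannot yield more distinct frames than it contains.
    num_samples = min(num_samples, total_frames)
    # Split the clip into equal-width segments and take each segment's midpoint.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def describe_sampling(duration_s: float, native_fps: float, num_samples: int) -> dict:
    """Summarize what a uniform sampler would hand to the model for one clip."""
    total_frames = int(round(duration_s * native_fps))
    indices = uniform_frame_indices(total_frames, num_samples)
    effective_fps = len(indices) / duration_s if duration_s > 0 else 0.0
    return {
        "total_frames": total_frames,
        "sampled": len(indices),
        "effective_fps": effective_fps,
        "indices": indices,
    }


if __name__ == "__main__":
    # Hypothetical 3-second clip at 30 fps, sampled at 8 frames.
    print(describe_sampling(duration_s=3.0, native_fps=30.0, num_samples=8))
```

If the effective frame rate comes out very low for a given clip, comparing runs with a higher frame budget is one way to tell whether the missing action descriptions come from sampling or from the model itself.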