---
title: FoundationMotion
emoji: π
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
arxiv: '2512.10927'
---
# Video → Q&A (Qwen2.5-VL-7B WolfV2)

This Space lets you drag-and-drop a video and ask questions about it using `Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned` (a fine-tuned Qwen2.5-VL-7B-Instruct).
## Deploy

- Create a new Hugging Face Space (Python + Gradio).
- Add the three files from this repo: `app.py`, `requirements.txt`, `README.md`.
- (Optional) In the Space Settings → Variables, set `MODEL_ID=Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned` (this is already the default).
- (Optional) If your GPU's VRAM is tight, set the env var `USE_INT4=1` to enable 4-bit weight-only quantization.

GPU recommended. An A10/A100 or ZeroGPU works for short videos; longer or high-resolution videos may OOM on CPU.
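The two environment variables above can be wired into the model load roughly as follows. This is a minimal sketch, not the Space's actual `app.py`: the helper name `build_load_config` and the exact keyword arguments it returns are assumptions; in practice the 4-bit path would typically go through `BitsAndBytesConfig(load_in_4bit=True)` from Transformers.

```python
import os

def build_load_config() -> dict:
    """Sketch: turn the Space's env vars into from_pretrained() kwargs.

    MODEL_ID  -> which checkpoint to load (defaults to the WolfV2 tune)
    USE_INT4  -> "1" enables 4-bit weight-only quantization
    """
    model_id = os.environ.get(
        "MODEL_ID", "Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned"
    )
    cfg = {
        "model_id": model_id,
        "torch_dtype": "auto",   # let Transformers pick bf16/fp16
        "device_map": "auto",    # place layers on the available GPU
    }
    if os.environ.get("USE_INT4") == "1":
        # In real code this flag would become a BitsAndBytesConfig
        # passed as quantization_config to from_pretrained().
        cfg["load_in_4bit"] = True
    return cfg
```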
## How it works

- We construct a chat-style prompt with a video item and your question, then call `processor.apply_chat_template(..., fps=1)` and `model.generate(...)`.
- You can increase `fps` for more temporal detail. Higher fps → more tokens/VRAM.
- Resolution bounds are controlled via `min_pixels`/`max_pixels` on `AutoProcessor`.
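The chat-style prompt above can be sketched as plain Python data. The message structure below follows the standard Qwen2.5-VL content format; the helper name `build_messages` and the generation snippet in the comments are illustrative, not the Space's exact code.

```python
def build_messages(video_path: str, question: str) -> list:
    """One user turn containing a video item plus the text question,
    in the shape processor.apply_chat_template() expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},
                {"type": "text", "text": question},
            ],
        }
    ]

# Rough shape of the inference call (assumed wiring):
#   inputs = processor.apply_chat_template(
#       messages, add_generation_prompt=True, tokenize=True,
#       return_dict=True, return_tensors="pt", fps=1,
#   )
#   out = model.generate(**inputs, max_new_tokens=512)
# Raising fps samples more frames, so token count and VRAM grow with it.
```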
## Tips

- If you see `KeyError: 'qwen2_5_vl'`, your Transformers is too old → upgrade to `>=4.50.0`.
- If decoding fails for certain containers, try converting to `.mp4` (H.264 + AAC).
- To return 5 QA pairs automatically, leave the question blank → the app uses a default instruction to summarize the video and produce 5 QA pairs.
## Acknowledgments

- Model: `Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned`
- Base architecture & usage patterns: Qwen/Qwen2.5-VL-7B-Instruct via 🤗 Transformers.