---
title: FoundationMotion
emoji: π
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
arxiv: "2512.10927"
---
# Video → Q&A (Qwen2.5-VL-7B WolfV2)

This Space lets you drag and drop a video and ask questions about it using **Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned** (a fine-tuned Qwen2.5-VL-7B-Instruct).
## Deploy

1. Create a new Hugging Face Space (Python + Gradio).
2. Add the three files from this repo: `app.py`, `requirements.txt`, `README.md`.
3. (Optional) In the Space **Settings → Variables**, set `MODEL_ID=Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned` (this is already the default).
4. (Optional) If GPU VRAM is tight, set the env var `USE_INT4=1` to enable 4-bit weight-only quantization.

> **GPU recommended.** An A10/A100 or ZeroGPU works for short videos; longer or high-resolution videos may run out of memory on CPU.
## How it works

- We construct a chat-style prompt with a video item and your question, then call `processor.apply_chat_template(..., fps=1)` and `model.generate(...)`.
- You can increase `fps` for more temporal detail. Higher fps → more tokens/VRAM.
- Resolution bounds are controlled via `min_pixels`/`max_pixels` on `AutoProcessor`.
## Tips

- If you see `KeyError: 'qwen2_5_vl'`, your Transformers is too old → upgrade to `>=4.50.0`.
- If decoding fails for certain containers, try converting to `.mp4` (H.264 + AAC).
- To get **5 QA pairs automatically**, leave the question blank → the app uses a default instruction to summarize the video and produce 5 QA pairs.
## Acknowledgments

- Model: Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned
- Base architecture & usage patterns: Qwen/Qwen2.5-VL-7B-Instruct via 🤗 Transformers.