---
title: FoundationMotion
emoji: 🌍
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
arxiv: '2512.10927'
---

# Video → Q&A (Qwen2.5-VL-7B WolfV2)

This Space lets you drag and drop a video and ask questions about it using `Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned` (a fine-tune of `Qwen2.5-VL-7B-Instruct`).

## Deploy

  1. Create a new Hugging Face Space (Python + Gradio).
  2. Add the three files from this repo: app.py, requirements.txt, README.md.
  3. (Optional) In the Space Settings → Variables, set `MODEL_ID=Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned` (this is already the default).
  4. (Optional) If GPU VRAM is tight, set the env var `USE_INT4=1` to enable 4-bit weight-only quantization.
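The two optional variables above can be read at startup along these lines (a minimal sketch, not the Space's exact `app.py`; the helper names are illustrative, and the 4-bit path assumes `bitsandbytes` is installed alongside Transformers):

```python
import os

# Default matches the Space setting described above.
DEFAULT_MODEL_ID = "Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned"

def read_config() -> dict:
    """Read the optional Space variables, falling back to the defaults."""
    return {
        "model_id": os.environ.get("MODEL_ID", DEFAULT_MODEL_ID),
        # Any non-empty value other than "0" enables 4-bit loading.
        "use_int4": os.environ.get("USE_INT4", "0") not in ("", "0"),
    }

def load_model(cfg: dict):
    """Load the model, optionally 4-bit quantized (requires bitsandbytes + GPU)."""
    from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration
    kwargs = {"device_map": "auto", "torch_dtype": "auto"}
    if cfg["use_int4"]:
        kwargs["quantization_config"] = BitsAndBytesConfig(load_in_4bit=True)
    return Qwen2_5_VLForConditionalGeneration.from_pretrained(cfg["model_id"], **kwargs)
```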

GPU recommended. A10/A100 or ZeroGPU works for short videos; longer/high-res videos may OOM on CPU.

## How it works

  • We construct a chat-style prompt with a video item and your question, then call `processor.apply_chat_template(..., fps=1)` and `model.generate(...)`.
  • You can increase `fps` for more temporal detail; higher fps means more tokens and more VRAM.
  • Resolution bounds are controlled via `min_pixels`/`max_pixels` on the `AutoProcessor`.
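In code, the steps above correspond roughly to the following (a sketch under the standard Qwen2.5-VL usage pattern, not the Space's exact `app.py`; the `answer` helper, its `max_new_tokens` value, and the prompt-stripping step are assumptions):

```python
def build_messages(video_path: str, question: str) -> list:
    """Chat-style prompt with one video item plus the user's question."""
    return [{
        "role": "user",
        "content": [
            {"type": "video", "video": video_path},
            {"type": "text", "text": question},
        ],
    }]

def answer(processor, model, video_path: str, question: str, fps: int = 1) -> str:
    """One generation pass; higher fps samples more frames (more tokens/VRAM)."""
    messages = build_messages(video_path, question)
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
        fps=fps,  # temporal sampling rate, as described above
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Drop the prompt tokens before decoding so only the answer remains.
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```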

## Tips

  • If you see `KeyError: 'qwen2_5_vl'`, your installed Transformers is too old; upgrade to >=4.50.0.
  • If decoding fails for certain containers, try converting the video to `.mp4` (H.264 + AAC).
  • To get 5 QA pairs automatically, leave the question blank; the app then falls back to a default instruction that summarizes the video and produces 5 QAs.

## Acknowledgments

  • Model: `Efficient-Large-Model/qwen2_5vl-7b-wolfv2-tuned`
  • Base architecture & usage patterns: `Qwen/Qwen2.5-VL-7B-Instruct` via 🤗 Transformers.