Video-Text-to-Text
Transformers
English
video
video-question-answering
multimodal
vision-language
qwen3-vl
inference-time
frame-selection
clip
How to use commandeaw/DW-KhotTaeVL-2B-QueryFrames with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("commandeaw/DW-KhotTaeVL-2B-QueryFrames", dtype="auto")
```
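The tags on this page mention inference-time frame selection with CLIP, and the NOTICE credits CLIP-ViT-Large-Patch14 as a query-aware frame scorer. How this repository actually scores and selects frames is not shown here, so the following is only a minimal sketch of the general idea: rank frames by cosine similarity between a frame embedding and a query embedding, then keep the top `k` in temporal order. The function names `cosine` and `select_frames` and the toy 2-D vectors are hypothetical. In practice the embeddings would come from a CLIP image and text encoder rather than being written by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_frames(frame_embeddings, query_embedding, k):
    """Return the indices of the k frames most similar to the query, in temporal order.

    Assumption: frame_embeddings[i] and query_embedding live in the same
    embedding space (for example, CLIP image and text features).
    """
    scores = [cosine(f, query_embedding) for f in frame_embeddings]
    # Rank frame indices from most to least similar to the query.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top k, then restore temporal order for the downstream video model.
    return sorted(ranked[:k])


# Toy stand-ins for real embeddings (hypothetical values, not model output).
frames = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.1]

selected = select_frames(frames, query, k=2)
print(selected)  # [0, 2]
```

Returning the indices in temporal order rather than score order is a design choice made for this sketch, on the assumption that the video-language model expects frames in their original sequence.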
DW-KhotTaeVL-2B-QueryFrames
============================

Copyright 2026 Deaw (HF: @commandeaw)

This product is released by Deaw under the Apache License,
Version 2.0. Personal research project, not affiliated with any
commercial entity.

----

This product builds on the following third-party components:

1. Qwen3-VL-2B-Instruct
   Copyright Alibaba Cloud / Qwen Team
   Licensed under the Apache License, Version 2.0
   https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
   Per the Apache 2.0 license, the base model weights are reused
   without modification by this derivative. Always credit the base
   model when using DW-KhotTaeVL-2B-QueryFrames.

2. CLIP-ViT-Large-Patch14
   Copyright OpenAI
   Licensed under the MIT License
   https://huggingface.co/openai/clip-vit-large-patch14
   Used as a query-aware frame scorer.

3. Video-MME (evaluation only, not redistributed)
   Copyright the original authors (Fu et al. 2024)
   See: https://huggingface.co/datasets/lmms-lab/Video-MME

----

NO WARRANTY

This software is provided "AS IS" without warranty of any kind.
See LICENSE for full terms.