DW-KhotTaeVL-2B-QueryFrames
============================
Copyright 2026 Deaw (HF: @commandeaw)
This product is released by Deaw under the Apache License,
Version 2.0. Personal research project, not affiliated with any
commercial entity.
----
This product builds on the following third-party components:
1. Qwen3-VL-2B-Instruct
Copyright Alibaba Cloud / Qwen Team
Licensed under the Apache License, Version 2.0
https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
This derivative reuses the base model weights without
modification, as the Apache 2.0 license permits. Always credit
the base model when using DW-KhotTaeVL-2B-QueryFrames.
2. CLIP-ViT-Large-Patch14
Copyright OpenAI
Licensed under the MIT License
https://huggingface.co/openai/clip-vit-large-patch14
Used as a query-aware frame scorer.
3. Video-MME (evaluation only — not redistributed)
Copyright the original authors (Fu et al. 2024)
See: https://huggingface.co/datasets/lmms-lab/Video-MME
----
NO WARRANTY
This software is provided "AS IS" without warranty of any kind.
See LICENSE for full terms.