Marlin 2B
Copyright (c) 2026 NemoStation
This product includes weights derived from Qwen3.5-2B
(https://huggingface.co/Qwen/Qwen3.5-2B), Copyright (c) 2025 Alibaba Cloud,
used under the Apache License, Version 2.0 (see LICENSE-QWEN-BASE).
Modifications by NemoStation include: integration of a video-capable
visual tower, custom training data curation (~400K clip-level annotations
with Gemini-3-Flash teacher distillation), two-stage SFT + SimPO
post-training, and custom modeling code (modeling_marlin.py) exposing
the .caption() and .find() inference modes.
Marlin 2B and the modifications listed above are licensed under the
Business Source License 1.1 (see LICENSE).