Video-Text-to-Text
Transformers
Safetensors
English
qwen3_5
text-generation
video
multimodal
video-captioning
temporal-grounding
qwen
VLM
custom_code
- Libraries
- Transformers
How to use cudabenchmarktest/video-scan with Transformers:
    # Load model directly
    from transformers import AutoProcessor, AutoModelForCausalLM

    processor = AutoProcessor.from_pretrained("cudabenchmarktest/video-scan", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("cudabenchmarktest/video-scan", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
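Because loading with `trust_remote_code=True` executes modeling code shipped in the repository, it can be worth pinning the load to a specific revision so that the code cannot change between runs. The following is a minimal sketch, not the authors' recommended procedure: the helper names `build_load_kwargs` and `load_marlin` are hypothetical, and the revision value must be supplied by the caller (no real commit hash is assumed here). `revision` is a standard `from_pretrained` argument in Transformers.

```python
MODEL_ID = "cudabenchmarktest/video-scan"


def build_load_kwargs(revision=None):
    """Build keyword arguments shared by the processor and model loaders.

    `revision` is an optional branch, tag, or commit hash chosen by the
    caller; nothing is pinned when it is None.
    """
    kwargs = {"trust_remote_code": True}
    if revision is not None:
        kwargs["revision"] = revision
    return kwargs


def load_marlin(revision=None):
    """Hypothetical helper: load the processor and model with the same kwargs."""
    # Imported lazily so that build_load_kwargs can be used without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    kwargs = build_load_kwargs(revision)
    processor = AutoProcessor.from_pretrained(MODEL_ID, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)
    return processor, model
```

Calling `load_marlin()` with no argument behaves like the snippet above; passing a commit hash taken from the repository's history pins both the weights and the custom code to that commit.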
Marlin 2B
Copyright (c) 2026 NemoStation

This product includes weights derived from Qwen3.5-2B (https://huggingface.co/Qwen/Qwen3.5-2B), Copyright (c) 2025 Alibaba Cloud, used under the Apache License, Version 2.0 (see LICENSE-QWEN-BASE).

Modifications by NemoStation include: integration of a video-capable visual tower, custom training data curation (~400K clip-level annotations with Gemini-3-Flash teacher distillation), two-stage SFT + SimPO post-training, and custom modeling code (modeling_marlin.py) exposing the .caption() and .find() inference modes.

Marlin 2B and the modifications listed above are licensed under the Business Source License 1.1 (see LICENSE).