Video-Text-to-Text
Transformers
Safetensors
English
moss_vl
feature-extraction
Base
Video-Understanding
Image-Understanding
MOSS-VL
OpenMOSS
multimodal
video
vision-language
custom_code
Instructions to use OpenMOSS-Team/MOSS-VL-Base-0408 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use OpenMOSS-Team/MOSS-VL-Base-0408 with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("OpenMOSS-Team/MOSS-VL-Base-0408", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Update code for transformers 5.5.4
video_preprocessor_config.json CHANGED
@@ -7,7 +7,7 @@
     "longest_edge": 16777216,
     "shortest_edge": 4096
   },
-  "video_max_pixels":
+  "video_max_pixels": 65536000,
   "patch_size": 16,
   "temporal_patch_size": 1,
   "merge_size": 2,
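To get a feel for the scale of the new `video_max_pixels` value, the arithmetic below may help. It is only a sketch under assumptions the diff does not confirm: that `video_max_pixels` is a total pixel budget for a video, that each patch covers `patch_size × patch_size` pixels, and that `merge_size` merges `merge_size × merge_size` patches into one token, as in some other vision-language preprocessors. The helper `video_token_budget` is hypothetical and is not part of the MOSS-VL code.

```python
import json

# Values copied from the post-change hunk of video_preprocessor_config.json.
config_fragment = json.loads("""
{
  "video_max_pixels": 65536000,
  "patch_size": 16,
  "temporal_patch_size": 1,
  "merge_size": 2
}
""")


def video_token_budget(cfg):
    """Hypothetical helper: rough upper bounds implied by the pixel budget.

    ASSUMPTION: patches are patch_size x patch_size pixels and merge_size
    merges merge_size x merge_size patches into one token. The actual
    MOSS-VL preprocessor may interpret these fields differently.
    """
    pixels_per_patch = cfg["patch_size"] ** 2
    max_patches = cfg["video_max_pixels"] // pixels_per_patch
    max_merged_tokens = max_patches // (cfg["merge_size"] ** 2)
    return max_patches, max_merged_tokens


patches, tokens = video_token_budget(config_fragment)
print(f"max patches: {patches}, max merged tokens: {tokens}")
```

Under those assumptions, the budget works out to 256,000 patches, or 64,000 tokens after merging. Treat these as order-of-magnitude figures rather than limits the model enforces.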