How to use OpenMOSS-Team/MOSS-VL-Instruct-0408 with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("OpenMOSS-Team/MOSS-VL-Instruct-0408", trust_remote_code=True, dtype="auto")
Update processing_moss_vl.py
Files changed: processing_moss_vl.py (+1 -1)

processing_moss_vl.py CHANGED
@@ -116,7 +116,7 @@ class MossVLImageProcessor(Qwen2VLImageProcessor):
         stacked_images = self.resize(
             image=stacked_images,
             size=SizeDict(height=resized_height, width=resized_width),
-
+            resample=resample,
         )
     resized_images_grouped[shape] = stacked_images
 resized_images = reorder_images(resized_images_grouped, grouped_images_index)
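The one-line change in this commit passes `resample=resample` into the `self.resize(...)` call. As a rough illustration of why forwarding such an argument matters, here is a toy, pure-Python sketch. It does not use the Transformers API: `resize`, `preprocess_without_forwarding`, `preprocess_with_forwarding`, and the `"nearest"` fallback are all hypothetical names and assumptions made up for this example, not the behavior of `MossVLImageProcessor`.

```python
from typing import List, Optional

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def resize(image: Image, height: int, width: int,
           resample: Optional[str] = None) -> Image:
    """Toy resize. Falls back to 'nearest' when no filter is given
    (an assumed default, chosen only for this illustration)."""
    mode = resample or "nearest"
    if mode not in ("nearest", "bilinear"):
        raise ValueError(f"unsupported resample mode: {mode}")
    src_h, src_w = len(image), len(image[0])
    out: Image = []
    for y in range(height):
        # Map each output coordinate back onto the source grid.
        fy = y * (src_h - 1) / (height - 1) if height > 1 else 0.0
        row = []
        for x in range(width):
            fx = x * (src_w - 1) / (width - 1) if width > 1 else 0.0
            if mode == "nearest":
                # Round half up to the closest source pixel.
                row.append(image[int(fy + 0.5)][int(fx + 0.5)])
            else:
                # Bilinear: blend the four surrounding source pixels.
                y0, x0 = int(fy), int(fx)
                y1, x1 = min(y0 + 1, src_h - 1), min(x0 + 1, src_w - 1)
                wy, wx = fy - y0, fx - x0
                top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
                bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
                row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def preprocess_without_forwarding(image: Image, height: int, width: int,
                                  resample: str) -> Image:
    # `resample` is accepted but never passed on, so the caller's
    # choice is silently replaced by the fallback inside `resize`.
    return resize(image, height, width)


def preprocess_with_forwarding(image: Image, height: int, width: int,
                               resample: str) -> Image:
    # Same call, with the argument forwarded explicitly.
    return resize(image, height, width, resample=resample)


img = [[0.0, 10.0]]
print(preprocess_without_forwarding(img, 1, 3, "bilinear"))  # [[0.0, 10.0, 10.0]]
print(preprocess_with_forwarding(img, 1, 3, "bilinear"))     # [[0.0, 5.0, 10.0]]
```

In this sketch both wrappers are asked for `"bilinear"`, but only the forwarding one produces the interpolated middle value. Whether the real processor had a comparable fallback before the change is not stated on this page.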