How to use DAMO-NLP-SG/VideoLLaMA3-2B with Transformers:
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-SG/VideoLLaMA3-2B", trust_remote_code=True, dtype="auto")
Fix VideoInput import error (#3)
opened by rhassana
image_processing_videollama3.py CHANGED
@@ -39,7 +39,6 @@ from transformers.image_utils import (
     ChannelDimension,
     ImageInput,
     PILImageResampling,
-    VideoInput,
     get_image_size,
     infer_channel_dimension_format,
     is_scaled_image,
@@ -47,6 +46,7 @@ from transformers.image_utils import (
     make_list_of_images,
     to_numpy_array,
 )
+from transformers.video_utils import VideoInput
 from transformers.utils import TensorType, is_vision_available, logging
 
 
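The change above moves the `VideoInput` import from `transformers.image_utils` to `transformers.video_utils`. If the file also had to keep loading on installations where `VideoInput` still lives in the old module, one possible alternative is a fallback import. The sketch below is not part of this PR: the helper `import_first_available` is a hypothetical name introduced only for illustration, and the sketch assumes the name is exposed by one of the two modules shown in the diff.

```python
import importlib


def import_first_available(attr, module_names):
    """Return `attr` from the first importable module that defines it.

    Hypothetical helper, not part of transformers or of this PR.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or its parent package) is missing; try the next one.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of: {', '.join(module_names)}")


# Prefer the new location used by this PR, then fall back to the old one.
try:
    VideoInput = import_first_available(
        "VideoInput",
        ["transformers.video_utils", "transformers.image_utils"],
    )
except ImportError:
    # Assumption: reached only when transformers is absent or exposes
    # VideoInput in neither module.
    VideoInput = None
```

The PR itself takes the simpler route of importing from the new location only, which is sufficient when the installed transformers release provides `transformers.video_utils`.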