Video-Text-to-Text
Transformers
Safetensors
English
internvl_chat
feature-extraction
multimodal
custom_code
Eval Results (legacy)
Update demo.py
demo.py
CHANGED
@@ -10,8 +10,10 @@ from transformers import AutoModel, AutoTokenizer
 model_path = 'OpenGVLab/InternVideo2_5_Chat_8B'
 
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
-model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
+model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda().to(torch.bfloat16)
 
+IMAGENET_MEAN = (0.485, 0.456, 0.406)
+IMAGENET_STD = (0.229, 0.224, 0.225)
 
 def build_transform(input_size):
     MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
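The hunk defines IMAGENET_MEAN and IMAGENET_STD at module level so that build_transform can read them, but the body of build_transform is not shown here. As a rough illustration of what such constants are usually for, the sketch below standardizes a single RGB pixel per channel. It assumes the transform first rescales 0-255 values to [0, 1] and then applies (x - mean) / std. The helper name normalize_pixel is hypothetical and does not appear in demo.py.

```python
# Per-channel ImageNet statistics, copied from the diff above.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Hypothetical helper (not part of demo.py).

    Assumes the usual convention: rescale 0-255 values to [0, 1],
    then standardize each channel as (x - mean) / std.
    """
    if len(rgb) != 3:
        raise ValueError("expected an (R, G, B) triple")
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # A black and a white pixel, to show the range the normalized values span.
    print(normalize_pixel((0, 0, 0)))
    print(normalize_pixel((255, 255, 255)))
```

A pixel whose channels equal the mean maps to roughly zero in every channel, which is the point of the standardization. Without the two constants defined before build_transform is called, the line `MEAN, STD = IMAGENET_MEAN, IMAGENET_STD` would raise a NameError, which is presumably what this commit addresses.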