How to use cbipok/VideoLLaMA3-2B-fork with Transformers:
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("cbipok/VideoLLaMA3-2B-fork", trust_remote_code=True, dtype="auto")
Update modeling_videollama3_encoder.py
modeling_videollama3_encoder.py
CHANGED
@@ -343,7 +343,7 @@ class VisionSdpaAttention(VisionAttention):
         attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attention_mask, dropout_p=0.0)
         attn_output = attn_output.transpose(0, 1)
         attn_output = attn_output.reshape(seq_length, -1)
-        attn_output = self.
+        attn_output = self.out_proj(attn_output)
         return attn_output
 
 
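The hunk above routes the reshaped SDPA output through `self.out_proj` before returning it. To show where that projection sits in the data flow, here is a minimal sketch of a packed-sequence attention layer. It is not the repository's actual module: the class name `TinySdpaAttention`, the fused `qkv` layer, the omitted attention mask, and the dimensions are all assumptions made for illustration. Only the `out_proj` name and the transpose/reshape/project sequence are taken from the diff.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySdpaAttention(nn.Module):
    """Hypothetical stand-in for a vision attention layer (illustration only)."""

    def __init__(self, embed_dim: int, num_heads: int) -> None:
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Assumed fused q/k/v projection; the real module may build these differently.
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)
        # Output projection, named as in the diff.
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (seq_length, embed_dim), a packed sequence with no batch dim.
        seq_length = hidden_states.shape[0]
        qkv = self.qkv(hidden_states).reshape(seq_length, 3, self.num_heads, self.head_dim)
        # (seq, 3, heads, head_dim) -> three tensors of shape (heads, seq, head_dim)
        query_states, key_states, value_states = qkv.permute(1, 2, 0, 3).unbind(0)

        # No attention mask here (an assumption); the diff passes one positionally.
        attn_output = F.scaled_dot_product_attention(
            query_states, key_states, value_states, dropout_p=0.0
        )
        attn_output = attn_output.transpose(0, 1)          # (seq, heads, head_dim)
        attn_output = attn_output.reshape(seq_length, -1)  # (seq, embed_dim)
        attn_output = self.out_proj(attn_output)           # mix information across heads
        return attn_output


torch.manual_seed(0)
layer = TinySdpaAttention(embed_dim=8, num_heads=2)
x = torch.randn(5, 8)
out = layer(x)
print(out.shape)  # torch.Size([5, 8])
```

Without the final projection, the layer would return the per-head outputs merely concatenated, and the `out_proj` weights would never take part in the forward pass.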