Instructions to use Neleac/SpaceTimeGPT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Neleac/SpaceTimeGPT with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText

tokenizer = AutoTokenizer.from_pretrained("Neleac/SpaceTimeGPT")
model = AutoModelForImageTextToText.from_pretrained("Neleac/SpaceTimeGPT")
- Notebooks
- Google Colab
- Kaggle
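The loading snippet above stops before any video is passed to the model. Video encoders of the TimeSformer family take a fixed number of frames per clip, so a clip usually has to be subsampled first. The sketch below shows one simple way to pick evenly spaced frame indices. The helper name `sample_frame_indices` is hypothetical, and the default of 8 frames is an assumption about the encoder's configuration, not something stated on this page; check the model config before relying on it.

```python
def sample_frame_indices(total_frames: int, num_samples: int = 8) -> list[int]:
    """Return `num_samples` evenly spaced frame indices in [0, total_frames - 1].

    Hypothetical helper for illustration; the default of 8 frames is an
    assumption about the encoder and should be checked against its config.
    """
    if total_frames < 1 or num_samples < 1:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    # Integer arithmetic pins the first index to 0 and the last to total_frames - 1.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


# Example: choose 8 frames from an 80-frame clip.
indices = sample_frame_indices(total_frames=80)
print(indices)
```

When a clip has fewer frames than `num_samples`, some indices repeat, which keeps the output length fixed at the cost of duplicated frames.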
Update README.md
README.md
CHANGED
@@ -12,7 +12,9 @@ inference: false
 tags:
 - video-captioning
 ---
-#
+# SpaceTimeGPT - A Spatiotemporal Video Captioning Model
+
+<img src="https://raw.githubusercontent.com/Neleac/SpaceTimeGPT/main/model.JPG" width="35%">
 
 Vision Encoder Model: [timesformer-base-finetuned-k600](https://huggingface.co/facebook/timesformer-base-finetuned-k600) \
 Text Decoder Model: [gpt2](https://huggingface.co/gpt2)