---
license: mit
pipeline_tag: video-text-to-text
---

This repository contains the model described in [VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation](https://huggingface.co/papers/2412.00927).