---
license: mit
datasets:
- liuhaotian/LLaVA-Instruct-150K
- LanguageBind/Video-LLaVA
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---
# LSTP-Chat: Language-guided Spatial-Temporal Prompt Learning for Video Chat

Available Models:
- LSTP-FlanT5xl
- LSTP-Chat-7B (Vicuna-7b)

For more details, please refer to our [official repository](https://github.com/bigai-nlco/LSTP-Chat).