Add model card for PPLLaVA
#1 by nielsr HF Staff - opened

README.md CHANGED
---
license: apache-2.0
pipeline_tag: video-text-to-text
---

# PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

[PPLLaVA](https://huggingface.co/papers/2411.02327) (Prompt-guided Pooling LLaVA) is an efficient Video Large Language Model designed to process long and varied video sequences by adaptively compressing visual tokens based on user instructions.

- **Paper:** [PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance](https://huggingface.co/papers/2411.02327)
- **Repository:** [GitHub - farewellthree/ppllava](https://github.com/farewellthree/ppllava)

## Introduction

PPLLaVA addresses the computational overhead of Video LLMs caused by high redundancy in video content. It introduces three key components:

1. **CLIP-based visual-prompt alignment module**: identifies regions of interest based on user instructions.
2. **Prompt-guided pooling mechanism**: adaptively compresses the visual sequence using convolution-style pooling, achieving up to 18x token reduction.
3. **Clip context extension module**: tailored for processing long and complex prompts in visual dialogues.

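To make the pooling idea concrete, here is a minimal, illustrative sketch (not the official implementation) of prompt-guided pooling: visual tokens are weighted by their similarity to the prompt's text embedding, then averaged over conv-style windows. The function name, the `(T, H, W, D)` token layout, and the stride values are assumptions for the example, not details from the paper.

```python
import numpy as np

def prompt_guided_pool(visual, text, stride=(2, 3, 3)):
    """Weight visual tokens by prompt relevance, then average-pool the grid.

    visual: (T, H, W, D) grid of visual token features.
    text:   (D,) text embedding of the user prompt.
    """
    T, H, W, D = visual.shape
    # Relevance of every visual token to the prompt (scaled dot product),
    # normalized with a softmax.
    scores = visual.reshape(-1, D) @ text / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    weights = weights.reshape(T, H, W, 1)

    st, sh, sw = stride
    out = []
    # Convolution-style pooling: each stride-sized window collapses to one
    # token, as a prompt-relevance-weighted average of its members.
    for t in range(0, T - st + 1, st):
        for h in range(0, H - sh + 1, sh):
            for w in range(0, W - sw + 1, sw):
                v = visual[t:t + st, h:h + sh, w:w + sw]
                wgt = weights[t:t + st, h:h + sh, w:w + sw]
                out.append((v * wgt).sum(axis=(0, 1, 2)) / (wgt.sum() + 1e-8))
    return np.stack(out)

# 8 frames of 24x24 tokens -> 4x8x8 pooled tokens: an 18x reduction.
tokens = prompt_guided_pool(np.random.rand(8, 24, 24, 64), np.random.rand(64))
print(tokens.shape)  # (256, 64)
```

With a `(2, 3, 3)` stride the 8x24x24 = 4608 input tokens collapse to 4x8x8 = 256, matching the up-to-18x compression figure above; the actual model chooses the pooling adaptively rather than with a fixed stride.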
The model achieves state-of-the-art results on diverse video understanding benchmarks, including VideoMME, MVBench, and ActivityNetQA, while significantly improving inference throughput (up to 8x faster).

## Usage

Please refer to the [official GitHub repository](https://github.com/farewellthree/ppllava) for detailed instructions on installation, environment setup, and running the Gradio demo.

## Citation

If you find the code and paper useful for your research, please consider citing:

```bibtex
@article{liu2024ppllava,
  title={PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance},
  author={Liu, Ruyang and Tang, Haoran and Liu, Haibo and Ge, Yixiao and Shan, Ying and Li, Chen and Yang, Jiankun},
  journal={arXiv preprint arXiv:2411.02327},
  year={2024}
}
```