Different from Video-LLaVA
#1
by Yhyu13 - opened
Hi,
It seems there is another work, also from PKU, called Video-LLaVA: https://huggingface.co/LanguageBind/Video-LLaVA-V1.5/tree/main
Although the weights have not been released yet, it seems to surpass Chat-UniVi on all benchmarks:
https://github.com/PKU-YuanGroup/Video-LLaVA#video-understanding
Both methods add video understanding to an LLM, and both can process videos and images simultaneously. Is it just that Chat-UniVi takes a different approach from LLaVA, which Video-LLaVA is derived from? Is there anything that makes these two methods fundamentally different?
Thanks!