Tarsier Model Card
Model details
Model type: Tarsier-34b is an open-source, large-scale video-language model designed to generate high-quality video descriptions, with strong general video understanding capability (SOTA results on 6 open benchmarks).
Model date: Tarsier-34b was trained in June 2024.
Paper or resources for more information:
- GitHub repo: https://github.com/bytedance/tarsier
- Paper: https://arxiv.org/abs/2407.00634
License
NousResearch/Nous-Hermes-2-Yi-34B license.
Where to send questions or comments about the model: https://github.com/bytedance/tarsier/issues
Intended use
Primary intended uses: The primary use of Tarsier is research on large multimodal models, especially video description.
Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
Training dataset
Tarsier takes a two-stage training strategy.
- Stage-1: Multi-task Pre-training on 13M data
- Stage-2: Multi-grained Instruction Tuning on 500K data
In both stages, we freeze the ViT and train all parameters of the projection layer and the LLM.
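As a rough illustration of the freezing scheme described above, the sketch below splits parameter names into a frozen group (the vision encoder) and a trainable group (projection layer and LLM). The parameter names and the `vision_tower.` prefix are hypothetical placeholders, not the actual module names in the Tarsier code base; check the GitHub repo for the real ones.

```python
def split_trainable(param_names, frozen_prefixes):
    """Split parameter names into (frozen, trainable) lists by name prefix."""
    prefixes = tuple(frozen_prefixes)
    frozen, trainable = [], []
    for name in param_names:
        # A parameter is frozen if it belongs to any of the frozen submodules.
        if name.startswith(prefixes):
            frozen.append(name)
        else:
            trainable.append(name)
    return frozen, trainable


# Hypothetical parameter names, for illustration only.
PARAM_NAMES = [
    "vision_tower.blocks.0.attn.qkv.weight",
    "vision_tower.blocks.0.mlp.fc1.weight",
    "projector.linear_1.weight",
    "language_model.layers.0.self_attn.q_proj.weight",
    "language_model.lm_head.weight",
]

# Freeze the ViT; leave the projection layer and the LLM trainable.
frozen, trainable = split_trainable(PARAM_NAMES, frozen_prefixes=["vision_tower."])
print("frozen:", frozen)
print("trainable:", trainable)
```

With a PyTorch model, the same split could be applied by iterating over `model.named_parameters()` and setting `param.requires_grad = False` for every name in the frozen group.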
Evaluation dataset
- A challenging video description dataset: DREAM-1K
- Multi-choice VQA: MVBench, NExT-QA and EgoSchema
- Open-ended VQA: MSVD-QA, MSR-VTT-QA, ActivityNet-QA and TGIF-QA
- Video Caption: MSVD-Caption, MSRVTT-Caption, VATEX
How to Use
See the usage section of the GitHub README: https://github.com/bytedance/tarsier?tab=readme-ov-file#usage