Instructions to use Salesforce/blip2-opt-2.7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Salesforce/blip2-opt-2.7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Salesforce/blip2-opt-2.7b")

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = AutoModelForMultimodalLM.from_pretrained("Salesforce/blip2-opt-2.7b")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Salesforce/blip2-opt-2.7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Salesforce/blip2-opt-2.7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/blip2-opt-2.7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
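The same request can be sketched from Python with only the standard library. This is a minimal sketch, assuming the vLLM server above is listening on `localhost:8000`; the helper names `build_completion_request` and `send_request` are hypothetical and not part of vLLM.

```python
import json
from urllib import request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # assumption: default vLLM port
    model: str = "Salesforce/blip2-opt-2.7b",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> request.Request:
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(req: request.Request) -> dict:
    """Send the request and decode the JSON body; needs a running server."""
    with request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before a server is available, for example `send_request(build_completion_request("Once upon a time,"))` once the server is up.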

Use Docker

docker model run hf.co/Salesforce/blip2-opt-2.7b

SGLang

How to use Salesforce/blip2-opt-2.7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Salesforce/blip2-opt-2.7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/blip2-opt-2.7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Salesforce/blip2-opt-2.7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/blip2-opt-2.7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Salesforce/blip2-opt-2.7b with Docker Model Runner:
```
docker model run hf.co/Salesforce/blip2-opt-2.7b
```

How to use BLIP 2.0

by matheusdias - opened Feb 24, 2023

Discussion

matheusdias

Feb 24, 2023

Hi! I am unable to use BLIP 2.0 through the transformers library. I think the library still needs to be updated to include the model. Is there any way to get around this?

nielsr

Feb 24, 2023

Yes, as the model is brand new, you need to install Transformers from source for the moment:

pip install git+https://github.com/huggingface/transformers.git
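To check whether an installed Transformers build is recent enough, a small version comparison can help. This is a minimal sketch, assuming BLIP-2 first shipped in the 4.27.0 release (an assumption not stated in this thread); the helper names are hypothetical.

```python
import re
from importlib import metadata

# Assumption: BLIP-2 first appeared in Transformers 4.27.0.
MIN_BLIP2_VERSION = (4, 27, 0)


def parse_version(version: str) -> tuple:
    """Turn '4.27.0.dev0' into (4, 27, 0), ignoring non-numeric suffixes."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    while len(numbers) < 3:
        numbers.append(0)  # pad short versions such as '5.0'
    return tuple(numbers)


def supports_blip2(version: str) -> bool:
    """True if the version is at or above the assumed minimum."""
    return parse_version(version) >= MIN_BLIP2_VERSION


def installed_transformers_version():
    """Return the installed Transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "supports BLIP-2:", supports_blip2(found))
```

A source install typically reports a development version such as `4.27.0.dev0`, which this sketch treats as new enough.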

timcedric

Feb 26, 2023

Unfortunately, I need it through the library. How long does it usually take for a new model to be included in a release?

nielsr

Feb 27, 2023

A new Transformers version will be released in March (typically a release is done every month).

matheusdias

Feb 28, 2023 (edited Feb 28, 2023)

@nielsr I am having a hard time downloading the model to Colab because it's so large that it consumes all my RAM. Is this expected (so I should use a cluster with more RAM), or is there some way around that? I had no problems with the other caption generation models.
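A rough back-of-the-envelope estimate shows why the weights alone can exhaust a small runtime. This is a sketch under stated assumptions: the parameter count below is an approximate, assumed figure (not taken from this thread), and the estimate ignores activations and loading overhead.

```python
# Assumption: roughly 3.8 billion parameters in total (approximate figure).
ASSUMED_PARAM_COUNT = 3.8e9

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_gib(param_count: float, dtype: str) -> float:
    """Estimate the size of the weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weights_size_gib(ASSUMED_PARAM_COUNT, dtype)
        print(f"{dtype}: about {size:.1f} GiB")
```

Under these assumptions, full precision needs about twice the memory of half precision, so loading in a lower precision (for example by passing `torch_dtype=torch.float16` to `from_pretrained`) is one commonly used way to reduce the footprint.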

Neumann21

Apr 11, 2023

I got the following message when running the model on GPU in full precision:
(base) wangson: CUDA_VISIBLE_DEVICES=0 python test.blip.py
Loading checkpoint shards: 100%|██████████| 2/2 [00:06<00:00, 3.25s/it]
/home/anaconda/lib/python3.10/site-packages/transformers/generation/utils.py:1288: UserWarning: Using max_length's default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using max_new_tokens to control the maximum length of the generation.

How did this happen? Is it caused by a version mismatch between packages?
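The message in the log is a `UserWarning` rather than an error, and it recommends `max_new_tokens` for controlling generation length. A minimal sketch of passing that argument explicitly follows; the helper names `build_generate_kwargs` and `caption_image` are hypothetical, and the limit of 30 tokens is an arbitrary illustrative choice. The heavy imports sit inside the function, so `torch`, `Pillow`, and `transformers` are only needed when a caption is actually generated.

```python
def build_generate_kwargs(max_new_tokens: int = 30) -> dict:
    """Return explicit generation settings instead of relying on max_length."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {"max_new_tokens": max_new_tokens}


def caption_image(image_path: str, max_new_tokens: int = 30) -> str:
    """Caption one image; downloads the model on first use."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    model_id = "Salesforce/blip2-opt-2.7b"
    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            **inputs, **build_generate_kwargs(max_new_tokens)
        )
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return text.strip()
```

Calling `caption_image("photo.jpg")` (with a hypothetical local file) should then generate without relying on the default `max_length`.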
