Instructions to use Salesforce/blip2-opt-2.7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Salesforce/blip2-opt-2.7b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Salesforce/blip2-opt-2.7b")

# Load model directly
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = AutoModelForVisualQuestionAnswering.from_pretrained("Salesforce/blip2-opt-2.7b")
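Loading the processor and model as above, a caption can then be generated roughly like this (a minimal sketch; the image URL is only an example and max_new_tokens is an arbitrary choice):

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = AutoModelForVisualQuestionAnswering.from_pretrained("Salesforce/blip2-opt-2.7b")

# Any RGB image works; this URL is only an example
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Image-only input performs plain captioning; add a text prompt for visual question answering
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())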
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Salesforce/blip2-opt-2.7b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Salesforce/blip2-opt-2.7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/blip2-opt-2.7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
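The same OpenAI-compatible endpoint can also be called from Python with the openai client (a minimal sketch; it assumes the server started above is reachable on localhost:8000 and that no --api-key was set when serving, so the api_key value is just a placeholder):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Salesforce/blip2-opt-2.7b",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)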
- SGLang
How to use Salesforce/blip2-opt-2.7b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Salesforce/blip2-opt-2.7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/blip2-opt-2.7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Salesforce/blip2-opt-2.7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/blip2-opt-2.7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
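Whether started from pip or Docker, the server can also be queried from Python instead of curl; a minimal sketch with requests, assuming the defaults used above (port 30000):

import requests

# SGLang exposes the same OpenAI-compatible /v1/completions route used by the curl calls above
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "Salesforce/blip2-opt-2.7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])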
- Docker Model Runner
How to use Salesforce/blip2-opt-2.7b with Docker Model Runner:
docker model run hf.co/Salesforce/blip2-opt-2.7b
How to pass CLIP image embeddings to BLIP2 for captioning?
Hi, I want to pass CLIP image embeddings (1x768 or 257x768) to BLIP-2 to generate captions and I’m wondering if this can be done through diffusers or other means.
Any help would be greatly appreciated.
Hi,
Note that the BLIP-2 models (like the one in this repository) assume a very specific CLIP model, namely an EVA-CLIP one with 39 layers in its vision encoder, as seen here. If you pass embeddings from a different CLIP model, the output will be random. You could pass your custom embeddings by replacing this line (which would require forking the library). Alternatively, we could add an image_embeds argument to the forward method of Blip2ForConditionalGeneration so that you can pass them easily. Could you open an issue on the Transformers library for that?
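Roughly, that amounts to doing outside the library what the model's forward/generate do internally. A sketch, assuming the custom embeddings already have the shape this checkpoint's EVA-CLIP vision tower produces (batch x 257 x 1408, shown here with random values as a stand-in) and a transformers version that supports generate(inputs_embeds=...):

import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
model.eval()

# Stand-in for externally computed image embeddings; the shape must match
# what this checkpoint's EVA-CLIP vision encoder outputs (batch, 257, 1408),
# otherwise the caption will be meaningless
image_embeds = torch.randn(1, 257, 1408)

with torch.no_grad():
    # Feed the embeddings to the Q-Former in place of the vision tower output
    attention_mask = torch.ones(image_embeds.shape[:-1], dtype=torch.long)
    query_tokens = model.query_tokens.expand(image_embeds.shape[0], -1, -1)
    query_outputs = model.qformer(
        query_embeds=query_tokens,
        encoder_hidden_states=image_embeds,
        encoder_attention_mask=attention_mask,
    )
    # Project the query outputs into the OPT embedding space and decode a caption
    inputs_embeds = model.language_projection(query_outputs.last_hidden_state)
    generated_ids = model.language_model.generate(inputs_embeds=inputs_embeds, max_new_tokens=30)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])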
Hello!
Have you solved this problem? I also want to pass CLIP image embeddings to BLIP-2 for image captioning, so I hope to learn from your solution and look forward to your reply.
Good luck to you!
Hi @shams123321,
I opted to use the lavis library instead of transformers and essentially replaced this line in the generate method of the Blip2OPT class with the embeddings that I passed to the method. I also used the pre-trained BLIP-2's associated preprocessor and image encoder to get my image embeddings.
I hope this helps.
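For later readers, a rough sketch of that setup (the LAVIS loading calls are standard, but the edited Blip2OPT.generate that actually consumes the precomputed embeddings is not shown and requires a fork of the library; "example.jpg" is a placeholder path):

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load BLIP-2 (OPT-2.7b) and its image preprocessor through LAVIS
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Precompute the image embeddings with the model's own EVA-CLIP encoder;
# this mirrors what the replaced line inside Blip2OPT.generate normally computes
with torch.no_grad(), model.maybe_autocast():
    image_embeds = model.ln_vision(model.visual_encoder(image))

# A forked Blip2OPT.generate would then use `image_embeds` directly
# instead of recomputing them from samples["image"].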