Confidence scores for image captioning?

#13

by acmidev - opened Aug 17, 2023

Discussion

acmidev

Aug 17, 2023

Hi there,

I was wondering how to generate confidence scores when generating image captions with the sample code.

Best, Simon.

from PIL import Image
import requests
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
)
model.to(device)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)

generated_ids = model.generate(**inputs)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)
# Output: two cats laying on a couch

nielsr

Aug 17, 2023

Hi,

You can obtain confidence scores by passing output_scores=True and return_dict_in_generate=True to the generate() method.

outputs = model.generate(**inputs, output_scores=True, return_dict_in_generate=True)
scores = outputs.scores

According to the docs:

In case of greedy decoding; this contains the processed prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) at each generation step. Tuple of torch.FloatTensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size, config.vocab_size).

To calculate a probability for the entire sequence, you could do the following:

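# Probability of the most likely token at each generation step
# (under greedy decoding this is the generated token)
topks = [s.softmax(-1).topk(1) for s in outputs.scores]

# Keep the value for the first sequence in the batch
probs = []
for tk in topks:
    probs.append(tk.values.view(-1)[0].item())

# Multiply the per-token probabilities to get the sequence probability
sequence_prob = torch.tensor(probs).prod()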
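
A note on the snippet above: the top-1 probability is the probability of the generated token under greedy decoding, but with sampling or beam search the chosen token may not be the top-1 one. Multiplying many small numbers can also underflow for long captions. A minimal, framework-free sketch of the same idea sums the log-probabilities of the tokens that were actually chosen. The helper names, logits, and token ids below are hypothetical and only illustrate the arithmetic; in a real run the per-step scores would come from outputs.scores and the chosen ids from the generated sequence.

```python
import math


def log_softmax(logits):
    """Return log-probabilities for one step's raw vocabulary scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_prob(step_logits, token_ids):
    """Sum the log-probability of the token chosen at each generation step."""
    if len(step_logits) != len(token_ids):
        raise ValueError("need one chosen token id per generation step")
    return sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids))


# Hypothetical scores for a 3-step generation over a 4-token vocabulary.
step_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 3.0, 0.2, 0.1],
    [1.0, 1.0, 1.0, 1.0],
]
# Hypothetical ids of the tokens that were generated at each step.
token_ids = [0, 1, 2]

log_p = sequence_log_prob(step_logits, token_ids)
sequence_prob = math.exp(log_p)
print(f"log-probability: {log_p:.4f}, probability: {sequence_prob:.4f}")
```

Working in log space gives the same sequence probability as the product, and the log value itself can be compared across captions without converting back.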

acmidev

Aug 23, 2023

@nielsr Thanks so much for the code - so adding it back to the sample code via this Google Colab the output is:

Prediction: two cats laying on a couch
Confidence: 0.012353635393083096

So does that suggest the confidence level is 1.2%?
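
One possible reading, which nobody in the thread confirms: the number is the joint probability of the whole token sequence, so it shrinks with every additional token and is not directly a per-caption accuracy. A length-normalized view takes the geometric mean of the per-token probabilities. The sketch below assumes a token count, which the thread does not state, and the function name is hypothetical.

```python
import math


def per_token_confidence(sequence_prob, n_tokens):
    """Geometric mean of the per-token probabilities behind a sequence probability."""
    if not 0.0 < sequence_prob <= 1.0:
        raise ValueError("sequence_prob must be in (0, 1]")
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    return math.exp(math.log(sequence_prob) / n_tokens)


# Sequence probability reported in the thread.
sequence_prob = 0.012353635393083096
# ASSUMPTION: the thread does not say how many tokens were generated.
# 8 is a placeholder; substitute len(outputs.scores) from a real run.
n_tokens = 8

print(f"per-token confidence: {per_token_confidence(sequence_prob, n_tokens):.3f}")
```

Whatever the true token count is, the per-token value will be noticeably higher than the raw product, which is why the two numbers should not be read on the same scale.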

shams123321

Mar 15, 2024

@nielsr Thanks so much for the code - so adding it back to the sample code via this Google Colab the output is:
Prediction: two cats laying on a couch
Confidence: 0.012353635393083096
So does that suggest the confidence level is 1.2%?

Have you solved this problem? I obtained the same result using the code provided above. If you have solved this problem, I hope you can share your correct code with me. Thank you very much! Good luck to you!

acmidev

Mar 18, 2024

@shams123321 I haven't heard back from @nielsr about it yet, and we haven't resolved it unfortunately.
