Instructions to use Salesforce/blip2-opt-2.7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Salesforce/blip2-opt-2.7b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Salesforce/blip2-opt-2.7b")

# Load model directly
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = AutoModelForVisualQuestionAnswering.from_pretrained("Salesforce/blip2-opt-2.7b")
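Loading the processor and model as above, a caption can then be generated roughly like this (a minimal sketch; the image URL is only an example and max_new_tokens is an arbitrary choice):

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = AutoModelForVisualQuestionAnswering.from_pretrained("Salesforce/blip2-opt-2.7b")

# Any RGB image works; this URL is only an example
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Image-only input performs plain captioning; add a text prompt for visual question answering
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())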
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Salesforce/blip2-opt-2.7b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Salesforce/blip2-opt-2.7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/blip2-opt-2.7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
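The same OpenAI-compatible endpoint can also be called from Python with the openai client (a minimal sketch; it assumes the server started above is reachable on localhost:8000 and that no --api-key was set when serving, so the api_key value is just a placeholder):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Salesforce/blip2-opt-2.7b",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)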
- SGLang
How to use Salesforce/blip2-opt-2.7b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Salesforce/blip2-opt-2.7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/blip2-opt-2.7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Salesforce/blip2-opt-2.7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Salesforce/blip2-opt-2.7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
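Whether started from pip or Docker, the server can also be queried from Python instead of curl; a minimal sketch with requests, assuming the defaults used above (port 30000):

import requests

# SGLang exposes the same OpenAI-compatible /v1/completions route used by the curl calls above
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "Salesforce/blip2-opt-2.7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])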
- Docker Model Runner
How to use Salesforce/blip2-opt-2.7b with Docker Model Runner:
docker model run hf.co/Salesforce/blip2-opt-2.7b
How to pass CLIP image embeddings to BLIP2 for captioning?
Hi, I want to pass CLIP image embeddings (1x768 or 257x768) to BLIP-2 to generate captions and I’m wondering if this can be done through diffusers or other means.
Any help would be greatly appreciated.
Hi,
Note that the BLIP-2 models (like the one in this repository) assume a very specific CLIP model, namely an EVA-CLIP one with 39 layers in its vision encoder, as seen here. If you pass embeddings from a different CLIP model, the output will be random. You could pass your custom embeddings by replacing this line (which would require forking the library). Alternatively, we could add an image_embeds argument to the forward method of Blip2ForConditionalGeneration so that you can pass them easily. Could you open an issue on the Transformers library for that?
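Roughly, that amounts to doing outside the library what the model's forward/generate do internally. A sketch, assuming the custom embeddings already have the shape this checkpoint's EVA-CLIP vision tower produces (batch x 257 x 1408, shown here with random values as a stand-in) and a transformers version that supports generate(inputs_embeds=...):

import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
model.eval()

# Stand-in for externally computed image embeddings; the shape must match
# what this checkpoint's EVA-CLIP vision encoder outputs (batch, 257, 1408),
# otherwise the caption will be meaningless
image_embeds = torch.randn(1, 257, 1408)

with torch.no_grad():
    # Feed the embeddings to the Q-Former in place of the vision tower output
    attention_mask = torch.ones(image_embeds.shape[:-1], dtype=torch.long)
    query_tokens = model.query_tokens.expand(image_embeds.shape[0], -1, -1)
    query_outputs = model.qformer(
        query_embeds=query_tokens,
        encoder_hidden_states=image_embeds,
        encoder_attention_mask=attention_mask,
    )
    # Project the query outputs into the OPT embedding space and decode a caption
    inputs_embeds = model.language_projection(query_outputs.last_hidden_state)
    generated_ids = model.language_model.generate(inputs_embeds=inputs_embeds, max_new_tokens=30)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])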
Hello!
Have you solved this problem? I also want to pass CLIP image embeddings to BLIP-2 for image captioning, so I hope to learn from your solution and look forward to your reply.
Good luck to you!
Hi @shams123321,
I opted to use the lavis library instead of transformers and essentially replaced this line in the generate method of the Blip2OPT class with the embeddings that I passed to the method. I also used the pre-trained BLIP-2's associated preprocessor and image encoder to get my image embeddings.
I hope this helps.
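For later readers, a rough sketch of that setup (the LAVIS loading calls are standard, but the edited Blip2OPT.generate that actually consumes the precomputed embeddings is not shown and requires a fork of the library; "example.jpg" is a placeholder path):

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load BLIP-2 (OPT-2.7b) and its image preprocessor through LAVIS
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Precompute the image embeddings with the model's own EVA-CLIP encoder;
# this mirrors what the replaced line inside Blip2OPT.generate normally computes
with torch.no_grad(), model.maybe_autocast():
    image_embeds = model.ln_vision(model.visual_encoder(image))

# A forked Blip2OPT.generate would then use `image_embeds` directly
# instead of recomputing them from samples["image"].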