---
library_name: transformers
license: mit
pipeline_tag: image-to-text
---

# BLIP Image Captioning Base BF16

This model is a bfloat16 quantized version of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base), an image-to-text model. Casting the weights from float32 to bfloat16 reduces the memory footprint from 989 MB to 494 MB, a 50 percent reduction.
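
If you want to reproduce this kind of cast yourself, here is a minimal sketch (assuming PyTorch and the original float32 checkpoint; the size estimate counts parameters only):

```python
import torch
from transformers import BlipForConditionalGeneration

# Load the original float32 checkpoint and cast its weights to bfloat16.
# This mirrors the 989 MB -> 494 MB reduction described above.
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
model = model.to(torch.bfloat16)

# Rough estimate of the parameter memory footprint in MB
# (parameters only, excluding buffers and activations).
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2
print(f"Approximate parameter memory: {size_mb:.0f} MB")
```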

## Example

| <img src="https://huggingface.co/gospacedev/blip-image-captioning-base-bf16/resolve/main/cat%20in%20currents.png" width="316" height="316"> |
|---|
| a cat sitting on top of a purple and red striped carpet |

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import BlipForConditionalGeneration, BlipProcessor
import requests
from PIL import Image

model = BlipForConditionalGeneration.from_pretrained("gospacedev/blip-image-captioning-base-bf16")
processor = BlipProcessor.from_pretrained("gospacedev/blip-image-captioning-base-bf16")

# Load the sample image shown above
img_url = "https://huggingface.co/gospacedev/blip-image-captioning-base-bf16/resolve/main/cat%20in%20currents.png"
image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

# Generate a caption for the image
inputs = processor(image, return_tensors="pt")
output = model.generate(**inputs)
result = processor.decode(output[0], skip_special_tokens=True)

print(result)
```
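
Note that `from_pretrained` may load the weights in float32 by default. To keep them in bfloat16 at load time, you can pass `torch_dtype` (a minimal sketch, assuming a transformers version that supports this argument):

```python
import torch
from transformers import BlipForConditionalGeneration

# Load the checkpoint directly in bfloat16 so the weights are not
# upcast to float32 at load time.
model = BlipForConditionalGeneration.from_pretrained(
    "gospacedev/blip-image-captioning-base-bf16",
    torch_dtype=torch.bfloat16,
)
```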

## Model Details

- **Developed by:** Grantley Cullar
- **Model type:** Image-to-Text
- **Language(s) (NLP):** English
- **License:** MIT License