Question about BLIP-2 on Spaces
Hey @hysts
Sorry, this is only slightly related to your space.
I'm trying to build a Gradio app for VQA using BLIP-2, and I'm getting errors trying to build it on an A10G. Your Space also loads Salesforce/blip2-opt-2.7b via Transformers on an A10G.
Any guidance you have would be much appreciated! Thanks!
I'm getting a runtime error code 137, which seems to be an OOM error. Were you able to make this work?
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/main/app.py
Thanks!
Trying an A10G-Large now
Hi
@iamrobotbear
IIRC, the error code 137 means CPU OOM, so I guess using A10G-large will fix your problem as you mentioned.
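As a quick sanity check (a minimal sketch, assuming a Linux host): exit codes above 128 encode 128 plus the number of the signal that killed the process, so 137 corresponds to SIGKILL, which is what the kernel's OOM killer sends when a container runs out of RAM.

```python
import signal

# Exit code 137 = 128 + signal number.
EXIT_CODE = 137
sig = EXIT_CODE - 128
# signal.Signals(9) is SIGKILL on Linux -- the OOM killer's signal.
print(signal.Signals(sig).name)
```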
Whoa, didn't expect a response so quickly (or at all), so thank you so much! You might have looked while I'd changed my file, as I'm trying to use Salesforce/blip2-opt-6.7b.
I'm returning my file to the state that I hope to make work, I'm just stuck and a bit over my head rn. Appreciate any help ya can give. Thanks!
(I may also have a gradio error, trying to figure that out as well). I should know momentarily, the build is just about ready to fail.
@iamrobotbear
I think you need to remove these lines if you are using load_in_8bit=True:
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L15
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L19
Also, you are using the same component multiple times here, but I don't think you can reuse components that way:
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L57-L58
Also, a minor point: gr.inputs and gr.outputs are deprecated, and you can just use gr.Image etc. directly.
@hysts -- OK, I think I'm still stuck and, frankly, don't know where to go next. As I said, I'm a bit over my head.
The origin of my app.py file (seen here: https://gist.github.com/brianjking/e67bb7473d29e968aa23a6f791484298) is based on this Jupyter notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BLIP-2/Chat_with_BLIP_2_%5Bint8_bitsandbytes%5D.ipynb.
Here's my current error when I try to build the above-linked Gist on an A10G-Large:
Space failed to start. Exit code: 1. Reason:

ython3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//iamrobotbear-blip-vqa-gradio.hf.space'), PosixPath('https')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lib/pyenv/hooks'), PosixPath('/etc/pyenv.d'), PosixPath('/usr/local/etc/pyenv.d')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('tcp'), PosixPath('443'), PosixPath('//172.20.0.1')}
  warn(msg)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:05<00:05, 5.32s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.73s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.97s/it]
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/outputs.py:22: UserWarning: Usage of gradio.outputs is deprecated, and will not be supported in the future, please import your components from gradio.components
  warnings.warn(
Traceback (most recent call last):
  File "app.py", line 47, in <module>
    iface = gr.Interface(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 444, in __init__
    ) = self.render_input_column()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 511, in render_input_column
    component.render()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 85, in render
    raise DuplicateBlockError(
gradio.exceptions.DuplicateBlockError: A block with id: 1 has already been rendered in the current Blocks.
Essentially, I want to be able to do the following:
• User can load an image, or ideally multiple images from a directory or a series of images.
• User can use image captioning, prompted image captioning, VQA, or chat-based prompting.
• Ideally, I'll be able to take that and generate image-text matching scores using BLIP-2, as I do here: https://huggingface.co/spaces/iamrobotbear/test.
The end result should be some way to test images against a series of statements to see whether they match, with a percentage confidence score for each match.
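As a sketch of that last scoring step (the logit values below are made-up stand-ins for whatever the ITM head actually returns; BLIP-style image-text-matching heads typically emit a two-way no-match/match logit pair): softmax over the pair gives the percentage confidence.

```python
import math

def itm_confidence(no_match_logit: float, match_logit: float) -> float:
    """Softmax over a (no-match, match) logit pair -> match probability in %."""
    e_no = math.exp(no_match_logit)
    e_yes = math.exp(match_logit)
    return 100.0 * e_yes / (e_no + e_yes)

# Hypothetical logits for one image/statement pair:
score = itm_confidence(-1.2, 2.3)
print(f"match confidence: {score:.1f}%")
```

Each statement in the series would get its own score this way, and you could then rank or threshold them.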
Thanks for your help, I really appreciate it. I think I'm soooo close.
@iamrobotbear
I think you can fix the current error by applying the following patch to your latest code.
diff --git a/app.py b/app.py
index d34ec0a..92edbcb 100644
--- a/app.py
+++ b/app.py
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
"Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
# Prepare image input
image_input = Image.fromarray(image).convert('RGB')
inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
-
+
# Image Captioning
generated_ids = model.generate(**inputs, max_new_tokens=20)
image_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
@@ -23,13 +24,13 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
inputs = processor(image_input, text=prompted_caption_text, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=20)
prompted_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
# Visual Question Answering (VQA)
prompt = f"Question: {vqa_question} Answer:"
inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=10)
vqa_answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
# Chat-based Prompting
prompt = chat_context + " Answer:"
inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
@@ -40,14 +41,19 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
# Define Gradio input and output components
image_input = gr.Image(type="numpy")
-text_input = gr.Text()
-output_text = gr.outputs.Textbox()
+prompted_caption_input = gr.Textbox()
+vqa_question_input = gr.Textbox()
+chat_context = gr.Textbox()
+image_caption_result = gr.Textbox()
+prompted_caption_result = gr.Textbox()
+vqa_answer = gr.Textbox()
+chat_response = gr.Textbox()
# Create Gradio interface
iface = gr.Interface(
fn=blip2_interface,
- inputs=[image_input, text_input, text_input, text_input],
- outputs=[output_text, output_text, output_text, output_text],
+ inputs=[image_input, prompted_caption_input, vqa_question_input, chat_context],
+ outputs=[image_caption_result, prompted_caption_result, vqa_answer, chat_response],
title="BLIP-2 Image Captioning and VQA",
description="Interact with the BLIP-2 model for image captioning, prompted image captioning, visual question answering, and chat-based prompting.",
)
Ooh, thank you @hysts! This ALMOST works; for the first time, it actually builds!
I'm getting 3 input boxes and then 4 outputs for some reason; when I submit questions/prompts, I get errors in the Gradio output boxes but nothing in the logs. I imagine it's likely due to somehow having only 3 inputs and 4 outputs...?
In what you're currently running on your space (in this repo):
• Is this A10G small or Large?
• Is it currently using BLIP2 / Salesforce/blip2-opt-2.7b or FLANt5?
Thank you so much, really appreciate you!
Looks like I might have another error too, possibly related?
2023-04-02T14:09:12.901Z
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "app.py", line 16, in blip2_interface
    inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
NameError: name 'device' is not defined
(The same traceback repeats at 2023-04-02T14:14:39.)
@iamrobotbear
It seems you forgot to add this line in my comment:
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
"Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Your function returns 4 values, so I just suggested 4 output components to make your code work. If that's not what you want to do, you can change that part.
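To make the count rule concrete (this stub stands in for the real handler; the names are illustrative): Gradio maps each function parameter to one input component and each returned value to one output component, so the counts have to line up.

```python
def blip2_stub(image, prompted_caption_text, vqa_question, chat_context):
    # 4 parameters -> 4 input components; 4 returned values -> 4 output components.
    return "caption", "prompted caption", "vqa answer", "chat response"

results = blip2_stub(None, "a prompt", "a question", "some context")
print(len(results))  # one value per output component
```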
@iamrobotbear
Sorry, forgot to answer these:
• Is this A10G small or Large?
• Is it currently using BLIP2 / Salesforce/blip2-opt-2.7b or FLANt5?
I've tested your code in my GCP environment, which has a T4 and 30 GB of CPU RAM, and it doesn't seem to use much CPU RAM, so I guess you can run your code on a T4 small. The model I tested was Salesforce/blip2-opt-2.7b.
Regarding the second question: I'm asking about YOUR Space, in this repo.
Is it using FLAN-T5 or blip2-opt-2.7b?
@iamrobotbear Ah, sorry, I misunderstood. This Space is using FLAN-T5 XXL and running on an A10G small.

