Question about BLIP-2 on Spaces
Hey @hysts
Sorry, this is only slightly related to your space.
I'm trying to build a Gradio app for VQA using BLIP-2, and I'm getting errors trying to build it on an A10G. Your Space also loads Salesforce/blip2-opt-2.7b via Transformers on an A10G.
Any guidance you have would be much appreciated! Thanks!
I'm getting a runtime error code 137, which seems to be an OOM error. Were you able to make this work?
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/main/app.py
Thanks!
Trying an A10G-Large now
Hi
@iamrobotbear
IIRC, the error code 137 means CPU OOM, so I guess using A10G-large will fix your problem as you mentioned.
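As a quick sanity check (a minimal sketch, assuming a Linux host): exit codes above 128 encode 128 plus the number of the signal that killed the process, so 137 corresponds to SIGKILL, which is what the kernel's OOM killer sends when a container runs out of RAM.

```python
import signal

# Exit code 137 = 128 + signal number.
EXIT_CODE = 137
sig = EXIT_CODE - 128
# signal.Signals(9) is SIGKILL on Linux -- the OOM killer's signal.
print(signal.Signals(sig).name)
```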
Whoa, didn't expect a response so quickly (or at all), so thank you so much! You might have looked while I'd changed my file, as I'm trying to use Salesforce/blip2-opt-6.7b.
I'm returning my file to the state that I hope to make work, I'm just stuck and a bit over my head rn. Appreciate any help ya can give. Thanks!
(I may also have a gradio error, trying to figure that out as well). I should know momentarily, the build is just about ready to fail.
@iamrobotbear
I think you need to remove these lines if you are using load_in_8bit=True:
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L15
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L19
Also, you are using the same component multiple times here, but I don't think you can reuse components that way:
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L57-L58
Also, a minor point: gr.inputs and gr.outputs are deprecated, and you can just use gr.Image etc. directly.
@hysts -- OK, I think I'm still stuck and, frankly, don't know where to go next. As I said, I'm a bit over my head.
The origin of my app.py file (seen here: https://gist.github.com/brianjking/e67bb7473d29e968aa23a6f791484298) is based on this Jupyter notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BLIP-2/Chat_with_BLIP_2_%5Bint8_bitsandbytes%5D.ipynb.
Here's my current error when I try to build the above-linked Gist on an A10G-Large:
Space failed to start. Exit code: 1. Reason:

ython3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//iamrobotbear-blip-vqa-gradio.hf.space'), PosixPath('https')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lib/pyenv/hooks'), PosixPath('/etc/pyenv.d'), PosixPath('/usr/local/etc/pyenv.d')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('tcp'), PosixPath('443'), PosixPath('//172.20.0.1')}
  warn(msg)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:05<00:05, 5.32s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.73s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.97s/it]
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/outputs.py:22: UserWarning: Usage of gradio.outputs is deprecated, and will not be supported in the future, please import your components from gradio.components
  warnings.warn(
Traceback (most recent call last):
  File "app.py", line 47, in <module>
    iface = gr.Interface(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 444, in __init__
    ) = self.render_input_column()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 511, in render_input_column
    component.render()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 85, in render
    raise DuplicateBlockError(
gradio.exceptions.DuplicateBlockError: A block with id: 1 has already been rendered in the current Blocks.
Essentially, I want to be able to do the following:
• User can load an image, or ideally multiple images from a directory or a series of images.
• User can use image captioning, prompted image captioning, VQA, or chat-based prompting.
• Ideally, I'll be able to take that and generate image-text matching scores using BLIP-2, as I do here: https://huggingface.co/spaces/iamrobotbear/test.
The end result should be some way to test images against a series of statements to see whether they match, with a percentage confidence score for each match.
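As a sketch of that last scoring step (the logit values below are made-up stand-ins for whatever the ITM head actually returns; BLIP-style image-text-matching heads typically emit a two-way no-match/match logit pair): softmax over the pair gives the percentage confidence.

```python
import math

def itm_confidence(no_match_logit: float, match_logit: float) -> float:
    """Softmax over a (no-match, match) logit pair -> match probability in %."""
    e_no = math.exp(no_match_logit)
    e_yes = math.exp(match_logit)
    return 100.0 * e_yes / (e_no + e_yes)

# Hypothetical logits for one image/statement pair:
score = itm_confidence(-1.2, 2.3)
print(f"match confidence: {score:.1f}%")
```

Each statement in the series would get its own score this way, and you could then rank or threshold them.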
Thanks for your help, I really appreciate it. I think I'm soooo close.
@iamrobotbear
I think you can fix the current error by applying the following patch to your latest code.
diff --git a/app.py b/app.py
index d34ec0a..92edbcb 100644
--- a/app.py
+++ b/app.py
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
"Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
# Prepare image input
image_input = Image.fromarray(image).convert('RGB')
inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
-
+
# Image Captioning
generated_ids = model.generate(**inputs, max_new_tokens=20)
image_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
@@ -23,13 +24,13 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
inputs = processor(image_input, text=prompted_caption_text, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=20)
prompted_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
# Visual Question Answering (VQA)
prompt = f"Question: {vqa_question} Answer:"
inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=10)
vqa_answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
# Chat-based Prompting
prompt = chat_context + " Answer:"
inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
@@ -40,14 +41,19 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
# Define Gradio input and output components
image_input = gr.Image(type="numpy")
-text_input = gr.Text()
-output_text = gr.outputs.Textbox()
+prompted_caption_input = gr.Textbox()
+vqa_question_input = gr.Textbox()
+chat_context = gr.Textbox()
+image_caption_result = gr.Textbox()
+prompted_caption_result = gr.Textbox()
+vqa_answer = gr.Textbox()
+chat_response = gr.Textbox()
# Create Gradio interface
iface = gr.Interface(
fn=blip2_interface,
- inputs=[image_input, text_input, text_input, text_input],
- outputs=[output_text, output_text, output_text, output_text],
+ inputs=[image_input, prompted_caption_input, vqa_question_input, chat_context],
+ outputs=[image_caption_result, prompted_caption_result, vqa_answer, chat_response],
title="BLIP-2 Image Captioning and VQA",
description="Interact with the BLIP-2 model for image captioning, prompted image captioning, visual question answering, and chat-based prompting.",
)
Ooh, thank you @hysts! This ALMOST works; for the first time, it actually builds!
I'm getting 3 input boxes and then 4 outputs for some reason; when I submit questions/prompts, I get errors in the Gradio output boxes but nothing in the logs. I imagine it's likely due to somehow having only 3 inputs and 4 outputs...?
In what you're currently running on your space (in this repo):
• Is this A10G small or Large?
• Is it currently using BLIP2 / Salesforce/blip2-opt-2.7b or FLANt5?
Thank you so much, really appreciate you!
Looks like I might have another error too, possibly related?
2023-04-02T14:09:12.901Z
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "app.py", line 16, in blip2_interface
    inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
NameError: name 'device' is not defined
(The same traceback repeats at 2023-04-02T14:14:39.)
@iamrobotbear
It seems you forgot to add this line in my comment:
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
"Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Your function returns 4 values, so I just suggested 4 output components to make your code work. If that's not what you want to do, you can change that part.
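To make the count rule concrete (this stub stands in for the real handler; the names are illustrative): Gradio maps each function parameter to one input component and each returned value to one output component, so the counts have to line up.

```python
def blip2_stub(image, prompted_caption_text, vqa_question, chat_context):
    # 4 parameters -> 4 input components; 4 returned values -> 4 output components.
    return "caption", "prompted caption", "vqa answer", "chat response"

results = blip2_stub(None, "a prompt", "a question", "some context")
print(len(results))  # one value per output component
```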
@iamrobotbear
Sorry, forgot to answer these:
• Is this A10G small or Large?
• Is it currently using BLIP2 / Salesforce/blip2-opt-2.7b or FLANt5?
I've tested your code in my GCP environment, which has a T4 and 30 GB of CPU RAM, and it doesn't seem to use much CPU RAM, so I guess you can run your code on a T4 small. The model I tested was Salesforce/blip2-opt-2.7b.
Regarding the second question: I'm asking about YOUR Space, in this repo.
Is it using FLAN-T5 or blip2-opt-2.7b?
@iamrobotbear Ah, sorry, I misunderstood. This Space is using FLAN-T5 XXL and running on an A10G small.

