Chyd19 committed 22dfbba (verified) · 1 parent: b3e8e19

Create app.py

Multimodal AI Image Studio Guide

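Welcome to the Multimodal AI Image Studio! This Gradio app brings generating, comparing, and analyzing AI-generated images together in one interface. It captions a reference image, generates new images from that caption with two diffusion models, scores how similar the three images and their captions are, runs NLP analysis on the captions, and answers questions about the reference image. It is meant for artists, researchers, and anyone exploring how image and text models behave together.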


How to Use
Step 1: Upload Reference Image

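Drag and drop an image into the uploader, then click "Upload Image & Generate Caption".

The app shows the image and generates a descriptive caption with the BLIP captioning model (Salesforce/blip-image-captioning-large). A new upload resets any previously generated images and captions.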
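The app reads the caption from the transformers image-to-text pipeline, which returns a list of dicts with a "generated_text" key. Below is a minimal, framework-free sketch of that extraction step with a fallback. `extract_caption` is a hypothetical helper, not a function defined in app.py, and the example input is hand-written in the pipeline's output shape rather than real model output.

```python
from typing import Any


def extract_caption(output: Any, fallback: str = "Caption failed.") -> str:
    """Return the caption text from an image-to-text pipeline result.

    Assumes the result looks like [{"generated_text": "..."}]; anything else
    (empty list, missing key, None, blank text) yields the fallback string.
    """
    if isinstance(output, list) and output:
        first = output[0]
        if isinstance(first, dict):
            text = first.get("generated_text")
            if isinstance(text, str) and text.strip():
                return text.strip()
    return fallback


# Hand-written input shaped like the pipeline's output (not a real result).
print(extract_caption([{"generated_text": "a dog sitting on a beach"}]))
print(extract_caption([]))
```

Returning a fixed fallback string instead of raising keeps the Gradio handler from failing when the model returns nothing usable.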

Step 2: Generate Images from Caption

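Once a caption exists, use it as the prompt for two image generators:

SD-Turbo (the app loads stabilityai/sdxl-turbo) for realistic images

DreamShaper (Lykon/dreamshaper-7) for artistic, stylized images

Optionally, type a prompt enhancer (e.g., "with a futuristic city background") before clicking a generate button; it is appended to the caption. Both buttons also add the "Photorealistic" style keywords and use a fixed seed (42 for SD-Turbo, 123 for DreamShaper), so rerunning the same prompt on the same setup should reproduce the same image. Each generated image is then captioned with BLIP for the later steps.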
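The way the caption, the enhancer, and the style keywords combine into one prompt can be sketched in plain Python. The two style entries are copied from `style_map` in app.py. `build_prompt` and `parse_seed` are hypothetical helpers that mirror the string joining and the seed fallback inside `generate_image_with_enhancer`; they are not functions the app defines.

```python
# Two entries copied from app.py's style_map; the rest are omitted here.
STYLE_MAP = {
    "Photorealistic": "photorealistic, ultra-detailed, 8k, cinematic lighting",
    "Anime": "anime cel shaded vibrant",
}


def build_prompt(caption: str, enhancer: str = "", style: str = "Photorealistic") -> str:
    """Join caption, enhancer and style keywords with ", ", skipping empty parts."""
    parts = [caption or "", enhancer or "", STYLE_MAP.get(style, "")]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def parse_seed(value, default: int = 42) -> int:
    """Convert user input to an integer seed, falling back to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


print(build_prompt("a dog on a beach", "with a futuristic city background"))
print(parse_seed("not a number"))
```

Filtering out empty parts before joining avoids stray commas; app.py reaches the same result by calling `.strip(", ")` on each joined string. The resulting text is what the diffusers pipeline receives as `prompt=`.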

Step 3: Compare Image Metrics

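After the reference, SD-Turbo, and DreamShaper images all exist, click "Compute Metrics for All Pairs" to score each of the three pairs (Reference ↔ SD-Turbo, Reference ↔ DreamShaper, SD-Turbo ↔ DreamShaper):

CLIP: cosine similarity between CLIP ViT-B/32 image embeddings; higher means the images are more semantically similar.

LPIPS: perceptual distance (AlexNet backbone, images resized to 256×256); lower means the images look more alike.

BERTScore: F1 score between the two images' BLIP captions; higher means the captions are more similar.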
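CLIP similarity in `compute_metrics` is the cosine similarity between two image embeddings, whereas LPIPS is a distance, so the two scores move in opposite directions. Below is a dependency-free sketch of the cosine computation on toy vectors, plus a hypothetical `format_metrics` helper (not part of app.py) that labels each score with the direction that means "more similar".

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def format_metrics(title: str, clip_sim: float, lpips_dist: float, bert_f1: float) -> str:
    """Render one pair's scores as Markdown, noting which direction is better."""
    return (
        f"**{title}**\n\n"
        f"CLIP similarity: {clip_sim:.3f} (higher = more similar)\n\n"
        f"LPIPS distance: {lpips_dist:.3f} (lower = more similar)\n\n"
        f"BERTScore F1: {bert_f1:.3f} (higher = more similar captions)"
    )


# Toy 3-dimensional vectors stand in for CLIP ViT-B/32's 512-dimensional embeddings.
print(round(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]), 3))
```

Cosine similarity ignores vector length, so two embeddings that point the same way score 1.0 regardless of magnitude.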

Step 4: NLP Analysis of Captions

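Click "Analyze Captions" to run three NLP models on each of the three captions:

Sentiment Analysis: a POSITIVE or NEGATIVE label with a confidence score. The model (distilbert-base-uncased-finetuned-sst-2-english) is binary, so there is no neutral class.

Named Entity Recognition: entities found by a BERT model fine-tuned on CoNLL-2003, grouped as people (PER), locations (LOC), organizations (ORG), or miscellaneous (MISC).

Topic Classification: zero-shot scores (facebook/bart-large-mnli) for the labels "people," "animals," "objects," "food," and "nature."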
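The three models return differently shaped results. The sentiment pipeline returns a list of `{"label", "score"}` dicts. The NER pipeline, with `aggregation_strategy="simple"`, returns a list of dicts that include `entity_group` and `word`. The zero-shot pipeline returns a dict with parallel `labels` and `scores` lists. The sketch below shows how those shapes reduce to one summary per caption. `summarize_nlp` is a hypothetical helper, not part of app.py, and the inputs are hand-written in the pipelines' output shapes, not real model results. The sketch also re-sorts the topics instead of relying on the pipeline's ordering.

```python
from typing import Dict, List


def summarize_nlp(sentiment: List[dict], entities: List[dict], topics: Dict[str, list]) -> dict:
    """Reduce raw pipeline outputs for one caption to a compact summary."""
    top_sentiment = max(sentiment, key=lambda s: s["score"])
    entity_list = [f"{e['entity_group']}: {e['word']}" for e in entities]
    ranked = sorted(zip(topics["labels"], topics["scores"]), key=lambda p: p[1], reverse=True)
    return {
        "sentiment": f"{top_sentiment['label']} ({top_sentiment['score']:.2f})",
        "entities": entity_list or ["None"],
        "top_topic": ranked[0][0],
    }


# Hand-written inputs shaped like the pipeline outputs (not real model results).
example = summarize_nlp(
    sentiment=[{"label": "POSITIVE", "score": 0.98}],
    entities=[{"entity_group": "LOC", "word": "Paris", "score": 0.99}],
    topics={"labels": ["nature", "people", "food"], "scores": [0.7, 0.2, 0.1]},
)
print(example)
```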

Step 5: Ask Questions with VQA

With a reference image uploaded, type a question such as "What color is the sky?" or "What animal is in the image?" and click "Get Answer".

The BLIP VQA model (Salesforce/blip-vqa-base) answers from the image content, typically with a short phrase.
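Before the model runs, the VQA handler has to reject missing input. Below is a minimal sketch of that guard, assuming a pluggable model callable. `answer_question`, `model_fn`, and `fake_vqa` are hypothetical names. In the real app, `model_fn` would wrap the BLIP processor call, `vqa_model.generate(**inputs)`, and `vqa_processor.decode(out[0], skip_special_tokens=True)`, following the Salesforce/blip-vqa-base model card.

```python
from typing import Any, Callable, Optional


def answer_question(question: Optional[str], image: Any, model_fn: Callable[[Any, str], str]) -> str:
    """Validate inputs, then delegate to a VQA callable.

    `model_fn` is a hypothetical stand-in for the BLIP processor + generate + decode call.
    """
    # Compare against None explicitly: `not image` raises for NumPy arrays,
    # which is what a Gradio Image component passes by default.
    if image is None or not (question or "").strip():
        return "Provide image + question."
    try:
        return model_fn(image, question.strip())
    except Exception as exc:  # keep the UI responsive if the model call fails
        return f"VQA failed: {exc}"


def fake_vqa(image, question):
    """Toy stand-in for the model, used only to exercise the guard logic."""
    return "blue" if "sky" in question else "unknown"


print(answer_question("What color is the sky?", object(), fake_vqa))
```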

Files changed (1)
  1. app.py +574 -0
app.py ADDED
@@ -0,0 +1,574 @@
+ # **Purpose**
+
+ # =====================================================
+ # Multimodal AI Image Studio
+ # =====================================================
+ # Purpose:
+ # This script provides a unified interface for generating,
+ # comparing, and analyzing AI-generated images.
+ #
+ # Key Features:
+ # 1. Upload a reference image and generate a caption for it.
+ # 2. Enhance prompts to generate images using:
+ #    - SD-Turbo (Stability AI; loaded as stabilityai/sdxl-turbo)
+ #    - DreamShaper (artistic style model)
+ # 3. Compute pairwise metrics between images:
+ #    - CLIP similarity
+ #    - LPIPS perceptual distance
+ #    - BERTScore textual similarity
+ # 4. NLP analysis of captions:
+ #    - Sentiment analysis
+ #    - Named entity recognition
+ #    - Topic classification
+ # 5. Visual Question Answering (VQA) on the reference image.
+ #
+ # Requirements:
+ # - Python >= 3.9
+ # - GPU recommended for faster image generation
+ #
+ # Usage:
+ # 1. Install dependencies (see requirements.txt)
+ # 2. Run this script
+ # 3. Access the Gradio web interface for interactive exploration
+
+
+ # **Section One**
+
+ # ==============================
+ # SECTION 1
+ # ==============================
+ # Install
+ # `!pip ...` lines are notebook magics and are a SyntaxError in a plain app.py.
+ # Install from a shell, or list the packages in requirements.txt:
+ #   pip install git+https://github.com/openai/CLIP.git
+ #   pip install lpips bert-score transformers accelerate diffusers gradio
+
+
+ # Libraries
+ import torch
+ import gradio as gr
+ from PIL import Image
+ from diffusers import DiffusionPipeline
+ from transformers import pipeline, BlipProcessor, BlipForQuestionAnswering
+ import lpips
+ import clip
+ from bert_score import score
+ import torchvision.transforms as T
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ def free_gpu_cache():
+     if device == "cuda":
+         torch.cuda.empty_cache()
+
+ # ==============================
+ # MODELS
+ # ==============================
+ # Labelled "SD-Turbo" in the UI, but this loads SDXL-Turbo. Its model card
+ # recommends num_inference_steps=1 and guidance_scale=0.0; the pipeline's
+ # default step count and guidance are used in this app.
+ gen_pipe = DiffusionPipeline.from_pretrained(
+     "stabilityai/sdxl-turbo",
+     torch_dtype=torch.float16 if device == "cuda" else torch.float32,
+ ).to(device)
+
+ dreamshaper_pipe = DiffusionPipeline.from_pretrained(
+     "Lykon/dreamshaper-7",
+     torch_dtype=torch.float16 if device == "cuda" else torch.float32,
+ ).to(device)
+
+ captioner = pipeline(
+     "image-to-text",
+     model="Salesforce/blip-image-captioning-large",
+     device=0 if device == "cuda" else -1,
+ )
+ # generate_kwargs={"max_new_tokens": 256, "num_beams": 5, "temperature": 0.7}
+
+ sentiment_model = pipeline(
+     "sentiment-analysis",
+     model="distilbert-base-uncased-finetuned-sst-2-english",
+     device=0 if device == "cuda" else -1,
+ )
+ ner_model = pipeline(
+     "ner",
+     model="dbmdz/bert-large-cased-finetuned-conll03-english",
+     aggregation_strategy="simple",
+     device=0 if device == "cuda" else -1,
+ )
+ topic_model = pipeline(
+     "zero-shot-classification",
+     model="facebook/bart-large-mnli",
+     device=0 if device == "cuda" else -1,
+ )
+
+ # The VQA model is kept on the CPU.
+ vqa_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
+ vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to("cpu")
+
+ clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
+ lpips_model = lpips.LPIPS(net='alex').to(device)
+ lpips_transform = T.Compose([T.ToTensor(), T.Resize((256, 256))])
+
+ style_map = {
+     "Photorealistic": "photorealistic, ultra-detailed, 8k, cinematic lighting",
+     "Real Life": "natural lighting, true-to-life colors, DSLR",
+     "Documentary": "documentary handheld muted colors",
+     "iPhone Camera": "iPhone photo natural HDR",
+     "Street Photography": "candid street ambient shadows",
+     "Cinematic": "cinematic lighting dramatic depth",
+     "Anime": "anime cel shaded vibrant",
+     "Watercolor": "watercolor soft wash art",
+     "Macro": "macro lens shallow DOF",
+     "Cyberpunk": "neon cyberpunk futuristic",
+ }
+
+ # **Section Two**
+
+ # ==============================
+ # SECTION 2 — FUNCTIONS
+ # ==============================
+ def generate_image_with_enhancer(base_caption, enhancer, negative, seed, style, images):
+     images = images or []
+     base_caption = base_caption or ""
+     enhancer = enhancer or ""
+
+     # .strip(", ") removes the leading/trailing ", " left behind by empty parts.
+     final_prompt = f"{base_caption}, {enhancer}".strip(", ")
+     final_prompt = f"{final_prompt}, {style_map.get(style, '')}".strip(", ")
+
+     try:
+         seed = int(seed)
+     except (TypeError, ValueError):
+         seed = 42
+
+     generator = torch.Generator(device="cpu").manual_seed(seed)
+
+     try:
+         with torch.no_grad():
+             out = gen_pipe(prompt=final_prompt, negative_prompt=negative, generator=generator)
+         img = out.images[0]
+     except Exception as e:
+         print("SD Turbo failed:", e)
+         img = None
+
+     if img is not None:
+         images.append(img)
+
+     free_gpu_cache()
+     return img, images
+
+ def generate_dreamshaper_with_enhancer(base_caption, enhancer, negative, seed, style, images):
+     images = images or []
+     base_caption = base_caption or ""
+     enhancer = enhancer or ""
+
+     # .strip(", ") removes the leading/trailing ", " left behind by empty parts.
+     final_prompt = f"{base_caption}, {enhancer}".strip(", ")
+     final_prompt = f"{final_prompt}, {style_map.get(style, '')}".strip(", ")
+
+     try:
+         seed = int(seed)
+     except (TypeError, ValueError):
+         seed = 42
+
+     generator = torch.Generator(device="cpu").manual_seed(seed)
+
+     try:
+         with torch.no_grad():
+             out = dreamshaper_pipe(prompt=final_prompt, negative_prompt=negative, generator=generator)
+         img = out.images[0]
+     except Exception as e:
+         print("DreamShaper failed:", e)
+         img = None
+
+     if img is not None:
+         images.append(img)
+
+     free_gpu_cache()
+     return img, images
+
+ def caption_for_image(img):
+     try:
+         out = captioner(img)
+         return out[0]["generated_text"]
+     except Exception:
+         return "Caption failed."
+
+ def answer_vqa(question, image):
+     # Compare with None explicitly: `not image` raises for NumPy-array inputs.
+     if image is None or not (question or "").strip():
+         return "Provide image + question."
+     try:
+         inputs_raw = vqa_processor(images=image, text=question, return_tensors="pt")
+         inputs = {k: v.to("cpu") for k, v in inputs_raw.items()}
+         with torch.no_grad():
+             # BLIP VQA produces its answer with generate(), as in the
+             # Salesforce/blip-vqa-base model card, then decodes the token ids.
+             out = vqa_model.generate(**inputs)
+         return vqa_processor.decode(out[0], skip_special_tokens=True)
+     except Exception as e:
+         print("VQA error:", e)
+         return "VQA failed."
+
+ def compute_metrics(images, captions, i1, i2):
+     img1 = images[i1]
+     img2 = images[i2]
+     cap1 = captions[i1]
+     cap2 = captions[i2]
+
+     # CLIP: cosine similarity of image embeddings (higher = more similar).
+     # Input tensors must be on the same device as the model.
+     t1 = clip_preprocess(img1).unsqueeze(0).to(device)
+     t2 = clip_preprocess(img2).unsqueeze(0).to(device)
+     with torch.no_grad():
+         f1 = clip_model.encode_image(t1)
+         f2 = clip_model.encode_image(t2)
+     clip_sim = float(torch.cosine_similarity(f1, f2))
+
+     # LPIPS: perceptual distance on inputs scaled to [-1, 1] (lower = more similar).
+     L1 = (lpips_transform(img1).unsqueeze(0) * 2 - 1).to(device)
+     L2 = (lpips_transform(img2).unsqueeze(0) * 2 - 1).to(device)
+     with torch.no_grad():
+         lp = float(lpips_model(L1, L2))
+
+     # BERTScore: F1 between the two captions (higher = more similar).
+     if cap1 and cap2:
+         _, _, F = score([cap1], [cap2], lang="en", verbose=False)
+         bert_f1 = float(F.mean())
+     else:
+         bert_f1 = 0.0
+
+     return clip_sim, lp, bert_f1
+
+ # **Section Three**
+
+ # ==============================
+ # Section Three
+ # ==============================
+
+ # 1
+ # ---------------- Build Gradio UI with Custom Look ----------------
+ def build_ui_with_custom_ui():
+     with gr.Blocks(title="Multimodal AI Image Studio") as demo:
+
+         # ---------------- CSS Styling ----------------
+         gr.HTML("""
+         <style>
+         .heading-orange h2, .heading-orange h3 { color: #ff5500 !important; }
+         .orange-btn button {
+             background-color: #ff5500 !important;
+             color: white !important;
+             border-radius: 6px !important;
+             height: 36px !important;
+             font-weight: bold;
+         }
+         .teal-btn button {
+             background-color: #008080 !important;
+             color: white !important;
+             border-radius: 6px !important;
+             height: 40px !important;
+             font-weight: bold;
+         }
+
+         /* Horizontal thin spinner */
+         .loading-line {
+             height: 4px;
+             background: linear-gradient(90deg, #008080 0%, #00cccc 50%, #008080 100%);
+             background-size: 200% 100%;
+             animation: loading 1s linear infinite;
+         }
+         @keyframes loading {
+             0% { background-position: 200% 0; }
+             100% { background-position: -200% 0; }
+         }
+
+         /* Match enhancer box to upload button */
+         .enhancer-box textarea {
+             width: 100% !important;
+             height: 36px !important;
+             box-sizing: border-box;
+             font-size: 14px;
+         }
+
+         /* Equal-height styling for Step-1 columns */
+         .equal-height-row {
+             display: flex;
+             align-items: stretch;
+         }
+         .equal-height-row > .gr-column {
+             display: flex;
+             flex-direction: column;
+         }
+
+         /* Target Gradio image container */
+         .stretch-img .gr-image-container {
+             flex-grow: 1;
+             display: flex;
+         }
+
+         .stretch-img .gr-image-container img {
+             width: 100% !important;
+             height: 100% !important;
+             object-fit: contain; /* or cover */
+         }
+         </style>
+         """)
+
+         # ---------------- Heading ----------------
+         gr.Markdown(
+             "## Multimodal AI Image Studio: An Integrated Comparative Perspective",
+             elem_classes="heading-orange",
+         )
+
+         # ---------------- States ----------------
+         # Fixed slot order: 0 = reference, 1 = SD-Turbo, 2 = DreamShaper.
+         images_state = gr.State([])
+         captions_state = gr.State([])
+
+         # ---------------- Step 1: Upload Reference Image ----------------
+         gr.Markdown("### Upload Reference Image", elem_classes="heading-orange")
+
+         with gr.Row(elem_classes="equal-height-row"):
+             with gr.Column(scale=1):
+                 upload_input = gr.Image(label="Drag & Drop Image", type="pil")
+                 upload_btn = gr.Button(
+                     "Upload Image & Generate Caption",
+                     elem_classes="orange-btn",
+                 )
+
+             with gr.Column(scale=1):
+                 upload_preview = gr.Image(
+                     label="Uploaded Image",
+                     type="pil",  # also feeds VQA; the default type would pass a NumPy array
+                     interactive=False,
+                     elem_classes="stretch-img",
+                 )
+
+         enhancer_box = gr.Textbox(
+             label="Add Prompt Enhancer (Optional)",
+             placeholder="Example: 'at night with neon lights', 'wearing a red jacket', etc.",
+             elem_classes="enhancer-box",
+         )
+
+         caption_out = gr.Markdown(label="Generated Caption")
+
+         # ---------------- Robust Captioning ----------------
+         def upload_and_generate_caption_ui(img, images_state, captions_state):
+             if img is None:
+                 return None, "No image uploaded.", [], []
+
+             # A new upload resets the comparison; slot 0 holds the reference.
+             images = [img]
+             try:
+                 output = captioner(img)
+                 caption = (
+                     output[0]["generated_text"]
+                     if len(output) > 0 and "generated_text" in output[0]
+                     else "Caption failed."
+                 )
+             except Exception as e:
+                 print("Captioning error:", e)
+                 caption = "Caption failed."
+
+             captions = [caption]
+             return img, caption, images, captions
+
+         upload_btn.click(
+             upload_and_generate_caption_ui,
+             inputs=[upload_input, images_state, captions_state],
+             outputs=[upload_preview, caption_out, images_state, captions_state],
+         )
+
+         # ---------------- Step 2: Generate SD-Turbo & DreamShaper ----------------
+         gr.Markdown("### Generate Images from Caption", elem_classes="heading-orange")
+
+         with gr.Row():
+             with gr.Column(scale=1, min_width=300):
+                 sd_btn = gr.Button(
+                     "Generate SD-Turbo Image",
+                     elem_classes="orange-btn",
+                 )
+                 sd_preview = gr.Image(
+                     label="SD-Turbo Image",
+                     interactive=False,
+                 )
+
+             with gr.Column(scale=1, min_width=300):
+                 ds_btn = gr.Button(
+                     "Generate DreamShaper Image",
+                     elem_classes="orange-btn",
+                 )
+                 ds_preview = gr.Image(
+                     label="DreamShaper Image",
+                     interactive=False,
+                 )
+
+         def set_slot(values, index, value):
+             # Write `value` at a fixed position, padding with None if needed, so
+             # repeated or out-of-order clicks never shift the slot indices.
+             values = list(values or [])
+             while len(values) <= index:
+                 values.append(None)
+             values[index] = value
+             return values
+
+         def generate_sd_from_caption_ui(caption, enhancer, images_state, captions_state):
+             final_prompt = f"{caption}, {enhancer}".strip(", ")
+             img, _ = generate_image_with_enhancer(
+                 final_prompt,
+                 enhancer="",
+                 negative="",
+                 seed=42,
+                 style="Photorealistic",
+                 images=[],
+             )
+             try:
+                 generated_caption = captioner(img)[0]["generated_text"]
+             except Exception:
+                 generated_caption = "Caption failed."
+
+             images = set_slot(images_state, 1, img)
+             captions = set_slot(captions_state, 1, generated_caption)
+             return img, images, captions
+
+         def generate_ds_from_caption_ui(caption, enhancer, images_state, captions_state):
+             final_prompt = f"{caption}, {enhancer}".strip(", ")
+             img, _ = generate_dreamshaper_with_enhancer(
+                 final_prompt,
+                 enhancer="",
+                 negative="",
+                 seed=123,
+                 style="Photorealistic",
+                 images=[],
+             )
+             try:
+                 generated_caption = captioner(img)[0]["generated_text"]
+             except Exception:
+                 generated_caption = "Caption failed."
+
+             images = set_slot(images_state, 2, img)
+             captions = set_slot(captions_state, 2, generated_caption)
+             return img, images, captions
+
+         sd_btn.click(
+             generate_sd_from_caption_ui,
+             inputs=[caption_out, enhancer_box, images_state, captions_state],
+             outputs=[sd_preview, images_state, captions_state],
+         )
+
+         ds_btn.click(
+             generate_ds_from_caption_ui,
+             inputs=[caption_out, enhancer_box, images_state, captions_state],
+             outputs=[ds_preview, images_state, captions_state],
+         )
+
+         # ---------------- Step 3: Compute Pairwise Metrics ----------------
+         gr.Markdown("### Compute Pairwise Metrics", elem_classes="heading-orange")
+
+         metrics_btn = gr.Button(
+             "Compute Metrics for All Pairs",
+             elem_classes="teal-btn",
+         )
+
+         with gr.Row():
+             metrics_A = gr.Markdown()
+             metrics_B = gr.Markdown()
+             metrics_C = gr.Markdown()
+
+         def compute_metrics_all_pairs_ui(images, captions):
+             yield (
+                 "<div class='loading-line'></div>",
+                 "<div class='loading-line'></div>",
+                 "<div class='loading-line'></div>",
+             )
+
+             ready = (
+                 len(images) >= 3
+                 and len(captions) >= 3
+                 and all(x is not None for x in images[:3])
+             )
+             if not ready:
+                 msg = "All three images and captions are required to compute metrics."
+                 yield msg, msg, msg
+             else:
+                 pairs = [
+                     ("Reference ↔ SD-Turbo", 0, 1),
+                     ("Reference ↔ DreamShaper", 0, 2),
+                     ("SD-Turbo ↔ DreamShaper", 1, 2),
+                 ]
+                 results = []
+                 for title, i, j in pairs:
+                     clip_sim, lp, bert_f1 = compute_metrics(images, captions, i, j)
+                     results.append(
+                         f"**{title}**\n\n"
+                         f"CLIP similarity: {clip_sim:.3f}\n\n"
+                         f"LPIPS distance: {lp:.3f}\n\n"
+                         f"BERTScore F1: {bert_f1:.3f}"
+                     )
+                 yield tuple(results)
+
+         metrics_btn.click(
+             compute_metrics_all_pairs_ui,
+             inputs=[images_state, captions_state],
+             outputs=[metrics_A, metrics_B, metrics_C],
+         )
+
+         # ---------------- Step 4: NLP Analysis ----------------
+         gr.Markdown("### NLP Analysis of Captions", elem_classes="heading-orange")
+
+         nlp_btn = gr.Button(
+             "Analyze Captions",
+             elem_classes="teal-btn",
+         )
+
+         nlp_out = gr.HTML()
+
+         def analyze_caption_pipeline_ui(captions):
+             yield "<div class='loading-line'></div>"
+
+             if len(captions) < 3 or any(c is None for c in captions[:3]):
+                 yield "<b>All three captions are required for NLP analysis.</b>"
+             else:
+                 labels = ["Reference Image", "SD-Turbo", "DreamShaper"]
+                 blocks = []
+
+                 for label, caption in zip(labels, captions[:3]):
+                     sentiment = "<br>".join(
+                         [f"{s['label']}: {s['score']:.2f}"
+                          for s in sentiment_model(caption)]
+                     )
+
+                     ents = (
+                         "<br>".join(
+                             [f"{e['entity_group']}: {e['word']}"
+                              for e in ner_model(caption)]
+                         ) or "None"
+                     )
+
+                     topics_data = topic_model(
+                         caption,
+                         candidate_labels=[
+                             "people", "animals", "objects", "food", "nature"
+                         ],
+                     )
+
+                     topics = "<br>".join(
+                         [f"{l}: {sc:.2f}"
+                          for l, sc in zip(
+                              topics_data["labels"],
+                              topics_data["scores"],
+                          )]
+                     )
+
+                     block = f"""
+                     <div style='flex:1;padding:10px;min-width:250px;'>
+                         <h3><u>{label}</u></h3>
+                         <b>Sentiment</b><br>{sentiment}<br><br>
+                         <b>Entities</b><br>{ents}<br><br>
+                         <b>Topics</b><br>{topics}
+                     </div>
+                     """
+                     blocks.append(block)
+
+                 yield (
+                     "<div style='display:flex; gap:20px; justify-content:space-between;'>"
+                     + "".join(blocks)
+                     + "</div>"
+                 )
+
+         nlp_btn.click(
+             analyze_caption_pipeline_ui,
+             inputs=[captions_state],
+             outputs=[nlp_out],
+         )
+
+         # ---------------- Step 5: Visual Question Answering ----------------
+         gr.Markdown("### Visual Question Answering (VQA)", elem_classes="heading-orange")
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 vqa_input = gr.Textbox(
+                     label="Enter a question about the reference image"
+                 )
+                 vqa_btn = gr.Button(
+                     "Get Answer",
+                     elem_classes="teal-btn",
+                 )
+
+             with gr.Column(scale=1):
+                 vqa_out = gr.Markdown(label="VQA Output")
+
+         def answer_vqa_ui(question, image):
+             yield "<div class='loading-line'></div>"
+             ans = answer_vqa(question, image)
+             yield ans
+
+         vqa_btn.click(
+             answer_vqa_ui,
+             inputs=[vqa_input, upload_preview],
+             outputs=[vqa_out],
+         )
+
+     return demo
+
+
+ # ---------------- Launch ----------------
+ demo = build_ui_with_custom_ui()
+ demo.launch()
+