Update app.py
app.py
CHANGED
--- a/app.py
+++ b/app.py
@@ -1,4 +1,4 @@
-"""
+"""PIVOT Demo."""
 
 import gradio as gr
 import numpy as np
@@ -89,10 +89,10 @@ examples = [
 
 with gr.Blocks() as demo:
     gr.Markdown("""
-# 
-The demo below showcases the
+# PIVOT: Prompting with Iterative Visual Optimization
+The demo below showcases a version of the PIVOT algorithm, which uses iterative visual prompts to optimize and guide the reasoning of Vision-Langauge-Models (VLMs).
 Given an image and a description of an object or region,
-
+PIVOT iteratively searches for the point in the image that best corresponds to the description.
 This is done through visual prompting, where instead of reasoning with text, the VLM reasons over images annotated with sampled points,
 in order to pick the best points.
 In each iteration, we take the points previously selected by the VLM, resample new points around the their mean, and repeat the process.
@@ -104,16 +104,16 @@ This demo uses GPT-4V, so it requires an OpenAI API key.
 To use the provided example images, you can right click on the image -> copy image, then click the clipboard icon in the Input Image box.
 
 Hyperparameters to set:
-* N Samples for Initialization - how many initial points are sampled for the first
+* N Samples for Initialization - how many initial points are sampled for the first PIVOT iteration.
 * N Samples for Optimiazation - how many points are sampled for subsequent iterations.
 * N Iterations - how many optimization iterations to perform.
-* N Ensemble Recursions - how many ensembles for recursive
+* N Ensemble Recursions - how many ensembles for recursive PIVOT.
 
 Note that each iteration takes about ~10s, and each additional ensemble adds a multiple number of N Iterations.
 
-After
-There are two images for each iteration - the first one shows all the sampled points, and the second one shows which one
-The Info textbox will show the final selected pixel coordinate that
+After PIVOT finishes, the image gallery below will visualize PIVOT results throughout all the iterations.
+There are two images for each iteration - the first one shows all the sampled points, and the second one shows which one PIVOT picked.
+The Info textbox will show the final selected pixel coordinate that PIVOT converged to.
 """.strip())
 
     gr.Markdown(
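The loop the demo text describes (sample candidate points, have the VLM pick the best ones, resample around their mean, repeat) can be sketched roughly as below. This is a minimal sketch, not the app's actual code: `select_points` is a hypothetical stand-in for the VLM call on the annotated image, and the shrinking Gaussian resampling schedule is an assumption.

```python
import numpy as np

def pivot_sketch(select_points, image, n_init=10, n_opt=6, n_iters=3, rng=None):
    """Sketch of one PIVOT run: iteratively resample candidate points
    around the mean of the points the selector picked last round."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    # First iteration: sample candidate points uniformly over the image
    # (corresponds to "N Samples for Initialization").
    points = rng.uniform([0, 0], [w, h], size=(n_init, 2))
    for it in range(n_iters):
        # The VLM (here a stand-in) picks the best-matching points.
        chosen = select_points(image, points)
        center = chosen.mean(axis=0)
        # Resample around the mean of the chosen points
        # ("N Samples for Optimization"), shrinking the spread each round
        # (assumed schedule, not taken from the app).
        scale = max(w, h) / (2.0 * (it + 2))
        points = rng.normal(center, scale, size=(n_opt, 2))
        points = np.clip(points, [0, 0], [w - 1, h - 1])
    return center

# Toy stand-in "VLM": picks the 3 candidates closest to the image center.
def toy_selector(image, points):
    target = np.array([image.shape[1] / 2, image.shape[0] / 2])
    idx = np.argsort(np.linalg.norm(points - target, axis=1))[:3]
    return points[idx]

img = np.zeros((100, 100, 3))
pt = pivot_sketch(toy_selector, img)
```

With the toy selector, the returned point drifts toward the image center over iterations, illustrating how the sample distribution contracts around the VLM's picks.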
|