derektan committed · Commit 1496342 · Parent(s): 90eae67

Updated description

app.py CHANGED

@@ -158,12 +158,12 @@ model.eval()
 # Gradio
 examples = [
     [
-        "Where can I find the shore birds (Animalia Chordata Aves Charadriiformes Laridae Larus marinus) in this image? Please output segmentation mask and explain why.",
         "./imgs/examples/Animalia_Chordata_Aves_Charadriiformes_Laridae_Larus_marinus/80645_39.76079_-74.10316.jpg",
+        "Where can I find the shore birds (Animalia Chordata Aves Charadriiformes Laridae Larus marinus) in this image? Please output segmentation mask and explain why.",
     ],
     [
-        "Where can I find the capybaras (Animalia Chordata Mammalia Rodentia Caviidae Hydrochoerus hydrochaeris) in this image? Please output segmentation mask.",
         "./imgs/examples/Animalia_Chordata_Mammalia_Rodentia_Caviidae_Hydrochoerus_hydrochaeris/28871_-12.80255_-69.29999.jpg",
+        "Where can I find the capybaras (Animalia Chordata Mammalia Rodentia Caviidae Hydrochoerus hydrochaeris) in this image? Please output segmentation mask.",
     ],
 ]
 output_labels = ["Segmentation Output"]
@@ -172,14 +172,14 @@ title = "LISA-AVS: LISA 7B Model Finetuned on AVS-Bench Dataset"
 
 description = """
 <font size=4>
-
-
-
-
+This is an adapted version of the online demo for <a href='https://github.com/dvlab-research/LISA' target='_blank'>LISA</a>, where we finetune from scratch the LISA model (7B) with data from <a href='https://search-tta.github.io/' target='_blank'>AVS-Bench (Search-TTA)</a>. \n
+**Note**: <br>
+ (a) If multiple users are using it at the same time, they will enter a queue, which may delay some time. \n
+ (b) Different prompts can lead to significantly varied results. Please **standardize** your input text prompts to **avoid ambiguity**, and pay attention to whether the **punctuations** of the input are correct. \n
 **Usage**: <br>
- (1) To let LISA-AVS **segment something**, input prompt like: "Where can I find the <Common Name> (<Full Taxonomy Name>) in this image? Please output segmentation mask."; <br>
- (2) To let LISA-AVS **output an explanation**, input prompt like: "Where can I find the <Common Name> (<Full Taxonomy Name>) in this image? Please output segmentation mask and explain why."; <br>
- (3) To obtain **solely language output**, you can input like what you should do in current multi-modal LLM (e.g., LLaVA), like: "Where can I find the <Common Name> (<Full Taxonomy Name>) in this image?" <br>
+ (1) To let LISA-AVS **segment something**, input prompt like: "Where can I find the <em>Common Name</em> (<em>Full Taxonomy Name</em>) in this image? Please output segmentation mask."; <br>
+ (2) To let LISA-AVS **output an explanation**, input prompt like: "Where can I find the <em>Common Name</em> (<em>Full Taxonomy Name</em>) in this image? Please output segmentation mask and explain why."; <br>
+ (3) To obtain **solely language output**, you can input like what you should do in current multi-modal LLM (e.g., LLaVA), like: "Where can I find the <em>Common Name</em> (<em>Full Taxonomy Name</em>) in this image?" <br>
 
 </font>
 """
@@ -202,7 +202,7 @@ AVS-Bench
 
 ## to be implemented
 @spaces.GPU
-def inference(input_str, input_image):
+def inference(input_image, input_str):
     ## filter out special chars
     input_str = bleach.clean(input_str)
 
@@ -338,8 +338,8 @@ def inference(input_str, input_image):
 demo = gr.Interface(
     inference,
     inputs=[
-        gr.Textbox(lines=1, placeholder=None, label="Text Instruction"),
         gr.Image(type="filepath", label="Input Image"),
+        gr.Textbox(lines=1, placeholder=None, label="Text Instruction"),
     ],
     outputs=[
         gr.Image(type="pil", label="Segmentation Output"),
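
The three code hunks above have to move together: gr.Interface passes the components in its inputs list to the callback as positional arguments, in order, and each row of examples is matched against the inputs in that same order. Below is a minimal standalone sketch of that correspondence; it is not code from this commit, and the debug output component and example image path are placeholders.

import gradio as gr

def inference(input_image, input_str):
    # Argument order mirrors the `inputs` list below:
    # input_image arrives as a file path (type="filepath"),
    # input_str as the text prompt.
    return f"prompt={input_str!r} on image={input_image!r}"

demo = gr.Interface(
    inference,
    inputs=[
        gr.Image(type="filepath", label="Input Image"),  # -> 1st positional arg
        gr.Textbox(lines=1, label="Text Instruction"),   # -> 2nd positional arg
    ],
    outputs=gr.Textbox(label="Debug Output"),            # placeholder output
    examples=[
        # Example rows follow the same order: image path first, prompt second.
        ["./imgs/example.jpg", "Where can I find the shore birds in this image?"],
    ],
)

if __name__ == "__main__":
    demo.launch()

Swapping the two input components without also swapping the example rows and the inference(...) parameter order would hand the prompt string to the function as the image path, which is why the commit touches all three places at once.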
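
One unchanged context line is doing real work here: input_str = bleach.clean(input_str) sanitizes the prompt before it reaches the model. A minimal sketch of what bleach does to a prompt, assuming the stock defaults with no custom tag allowlist (an illustration, not code from the commit):

import bleach

# With bleach's default settings, tags outside the small allowlist
# (a, b, code, em, i, strong, ...) are escaped rather than stripped.
raw = 'Where can I find the <script>alert("x")</script> shore birds?'
print(bleach.clean(raw))
# Expected output (tag escaped, inner text kept):
# Where can I find the &lt;script&gt;alert("x")&lt;/script&gt; shore birds?

This also suggests the description's switch from literal <Common Name> placeholders to <em>Common Name</em> is more than styling: in rendered HTML a bare <Common Name> is parsed as an unknown tag and disappears, while em is a standard tag that displays as intended.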