derektan committed
Commit 1496342 · 1 Parent(s): 90eae67

Updated description

Files changed (1): app.py +11 -11
app.py CHANGED
@@ -158,12 +158,12 @@ model.eval()
 # Gradio
 examples = [
     [
-        "Where can I find the shore birds (Animalia Chordata Aves Charadriiformes Laridae Larus marinus) in this image? Please output segmentation mask and explain why.",
         "./imgs/examples/Animalia_Chordata_Aves_Charadriiformes_Laridae_Larus_marinus/80645_39.76079_-74.10316.jpg",
+        "Where can I find the shore birds (Animalia Chordata Aves Charadriiformes Laridae Larus marinus) in this image? Please output segmentation mask and explain why.",
     ],
     [
-        "Where can I find the capybaras (Animalia Chordata Mammalia Rodentia Caviidae Hydrochoerus hydrochaeris) in this image? Please output segmentation mask.",
         "./imgs/examples/Animalia_Chordata_Mammalia_Rodentia_Caviidae_Hydrochoerus_hydrochaeris/28871_-12.80255_-69.29999.jpg",
+        "Where can I find the capybaras (Animalia Chordata Mammalia Rodentia Caviidae Hydrochoerus hydrochaeris) in this image? Please output segmentation mask.",
     ],
 ]
 output_labels = ["Segmentation Output"]
@@ -172,14 +172,14 @@ title = "LISA-AVS: LISA 7B Model Finetuned on AVS-Bench Dataset"
 
 description = """
 <font size=4>
-Note: This is an adapted version of the online demo for LISA, where we finetune from scratch the LISA model (7B) with data from AVS-Bench (Search-TTA). \n
-If multiple users are using it at the same time, they will enter a queue, which may delay some time. \n
-**Note**: **Different prompts can lead to significantly varied results**. \n
-**Note**: Please try to **standardize** your input text prompts to **avoid ambiguity**, and also pay attention to whether the **punctuations** of the input are correct. \n
+This is an adapted version of the online demo for <a href='https://github.com/dvlab-research/LISA' target='_blank'>LISA</a>, where we finetune the LISA model (7B) from scratch with data from <a href='https://search-tta.github.io/' target='_blank'>AVS-Bench (Search-TTA)</a>. \n
+**Note**: <br>
+&ensp;(a) If multiple users are using the demo at the same time, they will enter a queue, which may cause some delay. \n
+&ensp;(b) Different prompts can lead to significantly varied results. Please **standardize** your input text prompts to **avoid ambiguity**, and check that the **punctuation** of the input is correct. \n
 **Usage**: <br>
-&ensp;(1) To let LISA-AVS **segment something**, input prompt like: "Where can I find the <Common Name> (<Full Taxonomy Name>) in this image? Please output segmentation mask."; <br>
-&ensp;(2) To let LISA-AVS **output an explanation**, input prompt like: "Where can I find the <Common Name> (<Full Taxonomy Name>) in this image? Please output segmentation mask and explain why."; <br>
-&ensp;(3) To obtain **solely language output**, you can input like what you should do in current multi-modal LLM (e.g., LLaVA), like: "Where can I find the <Common Name> (<Full Taxonomy Name>) in this image?" <br>
+&ensp;(1) To have LISA-AVS **segment something**, input a prompt like: "Where can I find the <em>Common Name</em> (<em>Full Taxonomy Name</em>) in this image? Please output segmentation mask."; <br>
+&ensp;(2) To have LISA-AVS **output an explanation**, input a prompt like: "Where can I find the <em>Common Name</em> (<em>Full Taxonomy Name</em>) in this image? Please output segmentation mask and explain why."; <br>
+&ensp;(3) To obtain **language output only**, phrase the input as you would for a current multi-modal LLM (e.g., LLaVA): "Where can I find the <em>Common Name</em> (<em>Full Taxonomy Name</em>) in this image?" <br>
 
 </font>
 """
@@ -202,7 +202,7 @@ AVS-Bench
 
 ## to be implemented
 @spaces.GPU
-def inference(input_str, input_image):
+def inference(input_image, input_str):
     ## filter out special chars
     input_str = bleach.clean(input_str)
 
@@ -338,8 +338,8 @@ def inference(input_str, input_image):
 demo = gr.Interface(
     inference,
     inputs=[
-        gr.Textbox(lines=1, placeholder=None, label="Text Instruction"),
         gr.Image(type="filepath", label="Input Image"),
+        gr.Textbox(lines=1, placeholder=None, label="Text Instruction"),
     ],
     outputs=[
         gr.Image(type="pil", label="Segmentation Output"),
 
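The three swaps above (the inference signature, the gr.Interface inputs list, and the examples rows) are one coordinated change: gr.Interface passes the input components' values to the wrapped function positionally, in the order they appear in inputs, and each examples row must follow that same order. Below is a minimal sketch of that contract, not the app's real model code; fake_inference and the example image path are hypothetical placeholders.

import gradio as gr

# Hypothetical stand-in for the app's real inference(). Gradio calls the
# function with arguments in the same order as the `inputs` list below,
# so (input_image, input_str) must match [gr.Image(...), gr.Textbox(...)].
def fake_inference(input_image, input_str):
    return f"Prompt {input_str!r} received for image {input_image!r}"

demo = gr.Interface(
    fake_inference,
    inputs=[
        gr.Image(type="filepath", label="Input Image"),    # -> input_image
        gr.Textbox(lines=1, label="Text Instruction"),     # -> input_str
    ],
    outputs=[gr.Textbox(label="Output")],
    examples=[
        # Each example row is ordered like `inputs`: image path first,
        # prompt second (placeholder path, for illustration only).
        ["./imgs/example.jpg", "Where can I find the capybaras in this image?"],
    ],
)

if __name__ == "__main__":
    demo.launch()

Listing the image first and the prompt second, as this commit does, keeps the function signature, the input components, and the example rows consistent with one another.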
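The unchanged context line input_str = bleach.clean(input_str) is the prompt-sanitization step that the "filter out special chars" comment refers to. A small sketch of what that call does, assuming bleach's default settings:

import bleach

# With default settings, bleach.clean() escapes HTML tags that are not on
# its allow list, so markup in a user prompt cannot pass through as raw HTML.
raw = '<script>alert("hi")</script> capybaras'
print(bleach.clean(raw))
# Output: &lt;script&gt;alert("hi")&lt;/script&gt; capybaras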