- en
---

This is the repo for the paper [PromptCap: Prompt-Guided Task-Aware Image Captioning](https://arxiv.org/abs/2211.09699).

We introduce PromptCap, a captioning model that can be controlled by a natural-language instruction. The instruction may contain a question the user is interested in, for example, "What is the boy putting on?". PromptCap also supports generic captioning via the instruction "What does the image describe?".
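
The two instruction styles above come down to plain string construction. A minimal sketch (the `make_prompt` helper and the wrapper phrasing are illustrative assumptions, not an official API):

```python
def make_prompt(question: str) -> str:
    """Build a PromptCap-style instruction embedding the user's question.

    NOTE: this wrapper phrasing is an assumption for illustration; the exact
    template expected by a released checkpoint may differ.
    """
    return (
        "please describe this image according to the given question: "
        + question.strip()
    )

# A question-guided caption request:
vqa_prompt = make_prompt("what is the boy putting on?")

# A generic caption request reuses the same mechanism with a catch-all question:
generic_prompt = make_prompt("what does the image describe?")
```

The same entry point covers both modes: generic captioning is just the special case where the question asks for an overall description.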

PromptCap can serve as a lightweight visual plug-in (much faster than BLIP-2) for LLMs such as GPT-3 and ChatGPT, and for other foundation models such as Segment Anything and DINO.

It achieves SOTA performance on COCO captioning (150 CIDEr). When paired with GPT-3 and conditioned on the user's question, PromptCap achieves SOTA performance on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).

# QuickStart