Update README.md
README.md
````diff
@@ -16,6 +16,13 @@ language:
 - en
 
 ---
+This is the repo for the paper [PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3](https://arxiv.org/abs/2211.09699)
+
+We introduce PromptCap, a captioning model that can be controlled by a natural-language instruction. The instruction may contain a question that the user is interested in,
+for example "what is the boy putting on?". PromptCap also supports generic captioning, using the question "what does the image describe?"
+
+PromptCap can serve as a lightweight visual plug-in for LLMs like GPT-3 and ChatGPT. It achieves SOTA performance on COCO captioning (150 CIDEr).
+When paired with GPT-3 and conditioned on the user question, PromptCap achieves SOTA performance on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).
 
 # QuickStart
 
@@ -112,7 +119,7 @@ print(vqa_model.vqa_multiple_choice(question, image, choices))
 ## Bibtex
 ```
 @article{hu2022promptcap,
-  title={PromptCap: Prompt-Guided Image Captioning},
+  title={PromptCap: Prompt-Guided Task-Aware Image Captioning},
   author={Hu, Yushi and Hua, Hang and Yang, Zhengyuan and Shi, Weijia and Smith, Noah A and Luo, Jiebo},
   journal={arXiv preprint arXiv:2211.09699},
   year={2022}
````
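The added README text describes captioning controlled by a natural-language instruction that may embed the user's question. As a minimal sketch of how such an instruction string might be assembled (the helper name and the exact template wording are hypothetical illustrations, not taken from this commit):

```python
def build_promptcap_prompt(question=None):
    """Build an instruction for a PromptCap-style captioning model.

    With a question, the caption is steered toward answering it; without
    one, fall back to the generic caption the README mentions. The exact
    template wording here is an assumption for illustration only.
    """
    if question is None:
        # Generic captioning, per the README: "what does the image describe?"
        question = "what does the image describe?"
    return f"please describe this image according to the given question: {question}"


# Question-guided instruction (the README's example question):
print(build_promptcap_prompt("what is the boy putting on?"))
# Generic caption instruction:
print(build_promptcap_prompt())
```

In the repo's QuickStart (outside these hunks), an instruction like this would presumably be passed, along with an image, to the captioning model; the context line `print(vqa_model.vqa_multiple_choice(question, image, choices))` in the second hunk suggests a higher-level VQA wrapper exists as well.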