graph-based-captions/GBC10M-PromptGen-200M


yhsieh committed on Dec 20, 2024

Commit 23f9bed · verified · 1 Parent(s): 36c1525

Create README.md

Files changed (1)

README.md ADDED +34 -0

	@@ -0,0 +1,34 @@

+---
+license: apple-ascl
+datasets:
+- graph-based-captions/GBC10M
+language:
+- en
+---
+### Graph-based captioning (GBC) is a new image annotation paradigm that combines the strengths of long captions, region captions, and scene graphs
+GBC interconnects region captions to create a unified description akin to a long caption, while also providing structural information similar to scene graphs.
+![assets/GBC_illustration.png](assets/GBC_illustration.png)
+### Text-to-Image with GBC as Middleware
+We propose using GBC as middleware for text-to-image generation. This repository provides a model that generates GBC annotations from a simple text prompt.
+![assets/GBC_promptgen.png](assets/GBC_promptgen.png)
+For further details on how to use the model, please refer to the [accompanying code repository](https://github.com/apple/ml-gbc?tab=readme-ov-file#-gbc-text-to-image).
+### License
+For license details, please see the [LICENSE](LICENSE) file.
+### Citation
+```
+@article{GBC2024,
+  title={Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions},
+  author={Yu-Guan Hsieh and Cheng-Yu Hsieh and Shih-Ying Yeh and Louis Béthune and Hadi Pouransari and Pavan Kumar Anasosalu Vasu and Chun-Liang Li and Ranjay Krishna and Oncel Tuzel and Marco Cuturi},
+  journal={arXiv preprint arXiv:2407.06723},
+  year={2024}
+}
+```