graph-based-captions / GBC10M-PromptGen-200M

	---
	datasets:
	- graph-based-captions/GBC10M
	language:
	- en
	license: apple-ascl
	library_name: transformers
	pipeline_tag: text-generation
	---

	### Graph-based captioning (GBC) is a new image annotation paradigm that combines the strengths of long captions, region captions, and scene graphs

	GBC interconnects region captions to create a unified description akin to a long caption, while also providing structural information similar to scene graphs.
	![assets/GBC_illustration.png](assets/GBC_illustration.png)
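
To make the idea of interconnected region captions concrete, here is a minimal sketch of how such a graph could be represented. The `Node` class, its field names, and the `flatten` helper are hypothetical and chosen for illustration only; they are not the actual GBC schema, which is defined in the accompanying code repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One image region with its own caption (hypothetical schema)."""

    node_id: str
    caption: str
    # Outgoing edges as (label, child_id) pairs. The label is the phrase in
    # this node's caption that the child node describes in more detail.
    children: list[tuple[str, str]] = field(default_factory=list)


def flatten(nodes: dict[str, Node], root_id: str) -> str:
    """Depth-first walk that joins all captions into one long description."""
    parts, stack, seen = [], [root_id], set()
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        parts.append(nodes[node_id].caption)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(child for _, child in reversed(nodes[node_id].children))
    return " ".join(parts)


def edges(nodes: dict[str, Node]) -> list[tuple[str, str, str]]:
    """List (parent, label, child) triples, similar to scene-graph edges."""
    return [
        (node.node_id, label, child)
        for node in nodes.values()
        for label, child in node.children
    ]


# A made-up example graph: one global caption and two region captions.
example = {
    "image": Node(
        "image",
        "A dog chases a ball on a lawn.",
        [("dog", "dog"), ("ball", "ball")],
    ),
    "dog": Node("dog", "The dog is a brown terrier with a red collar."),
    "ball": Node("ball", "The ball is yellow and slightly deflated."),
}

print(flatten(example, "image"))
print(edges(example))
```

In this toy representation, `flatten` plays the role of the long caption and `edges` exposes the structure that a scene graph would carry.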

	### Text-to-Image with GBC as Middleware

	We propose using GBC as middleware for text-to-image generation. This repository provides a model that generates GBC annotations from a simple text prompt.
	![assets/GBC_promptgen.png](assets/GBC_promptgen.png)

	For further details on how to use the model, please refer to the [accompanying code repository](https://github.com/apple/ml-gbc?tab=readme-ov-file#-gbc-text-to-image).
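
As a rough orientation, the sketch below loads the checkpoint through the generic `transformers` causal language model classes, which is what the `library_name` and `pipeline_tag` metadata of this card suggest. It is an assumption that the checkpoint loads with `AutoModelForCausalLM` and that a plain text prompt is acceptable input; the expected prompt format and the parsing of the output into a graph are handled by the accompanying code repository, not by this snippet. The function name `generate_gbc_text` is hypothetical.

```python
MODEL_ID = "graph-based-captions/GBC10M-PromptGen-200M"


def generate_gbc_text(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate raw text from the prompt-generation model (unparsed output)."""
    # Imported lazily so that defining this function does not require
    # transformers or torch to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the checkpoint is compatible with the generic Auto classes.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Assumption: the model accepts a plain prompt; the real pipeline may
    # wrap the prompt in a specific template.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_gbc_text("a cat sitting on a sofa")` downloads the weights on first use and returns the raw generated string. Turning that string into a usable graph is left to the tooling in the code repository.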

	### License

	For license details, please check out the [LICENSE](LICENSE) file.

	### Citation

	```bibtex
	@article{GBC2024,
	title={Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions},
	author={Yu-Guan Hsieh and Cheng-Yu Hsieh and Shih-Ying Yeh and Louis Béthune and Hadi Pouransari and Pavan Kumar Anasosalu Vasu and Chun-Liang Li and Ranjay Krishna and Oncel Tuzel and Marco Cuturi},
	journal={arXiv preprint arXiv:2407.06723},
	year={2024}
	}
	```

	[Project page](https://huggingface.co/graph-based-captions)