Update README.md
README.md CHANGED

@@ -1,14 +1,10 @@
 # GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
 <div align="center">
-<
-<a href="https://arxiv.org/abs/xxxx"><img src="https://img.shields.io/badge/ArXiv-xxxx-red"></a>
-<img src="https://visitor-badge.laobi.icu/badge?page_id=rongyaofang/GoT" alt="visitors">
-
-[Rongyao Fang](https://scholar.google.com/citations?user=FtH3CW4AAAAJ&hl=en)<sup>1\*</sup>, [Chengqi Duan](https://scholar.google.com/citations?user=r9qb4ZwAAAAJ&hl=zh-CN)<sup>2\*</sup>, [Kun Wang]()<sup>3</sup>, [Linjiang Huang](https://leonhlj.github.io/)<sup>6</sup>, [Hao Li](https://scholar.google.com/citations?user=qHqQsY4AAAAJ&hl=zh-CN)<sup>1,4</sup>, [Shilin Yan](https://scholar.google.com/citations?user=2VhjOykAAAAJ&hl=zh-CN), [Hao Tian]()<sup>3</sup>, [Xingyu Zeng]()<sup>3</sup>, [Rui Zhao]()<sup>3</sup>, [Jifeng Dai](https://jifengdai.org/)<sup>4,5</sup>, [Xihui Liu](https://xh-liu.github.io/)<sup>2 :envelope:</sup>, [Hongsheng Li](https://www.ee.cuhk.edu.hk/~hsli/)<sup>1 :envelope:</sup>
+[Rongyao Fang](https://scholar.google.com/citations?user=FtH3CW4AAAAJ&hl=en)<sup>1\*</sup>, [Chengqi Duan](https://scholar.google.com/citations?user=r9qb4ZwAAAAJ&hl=zh-CN)<sup>2\*</sup>, [Kun Wang]()<sup>3</sup>, [Linjiang Huang](https://leonhlj.github.io/)<sup>6</sup>, [Hao Li](https://scholar.google.com/citations?user=qHqQsY4AAAAJ&hl=zh-CN)<sup>1,4</sup>, [Shilin Yan](https://scholar.google.com/citations?user=2VhjOykAAAAJ&hl=zh-CN), [Hao Tian]()<sup>3</sup>, [Xingyu Zeng]()<sup>3</sup>, [Rui Zhao]()<sup>3</sup>, [Jifeng Dai](https://jifengdai.org/)<sup>4,5</sup>, [Xihui Liu](https://xh-liu.github.io/)<sup>2</sup>, [Hongsheng Li](https://www.ee.cuhk.edu.hk/~hsli/)<sup>1</sup>
 
 <sup>1</sup>CUHK MMLab, <sup>2</sup>HKU MMLab, <sup>3</sup>SenseTime, <sup>4</sup>Shanghai AI Laboratory, <sup>5</sup>Tsinghua University, <sup>6</sup>Beihang University
 
-*Equal contribution
+*Equal contribution
 </div>
 
 <div align="center" style="line-height: 1.2;">

@@ -110,19 +106,6 @@ Our approach also demonstrates superior performance on image editing benchmarks:
 
 </div>
 
-### Interactive Generation
-
-One of the unique capabilities of GoT is interactive generation, allowing users to modify the reasoning chain to customize the generated images:
-
-<div align="center">
-<img src="figures/interactive.png" width="100%" alt="Interactive Generation" />
-</div>
-
-Users can interact with the reasoning chain to:
-1. Replace objects
-2. Adjust object positions
-3. Modify object attributes
-
 ## Usage
 
 ### Dependencies