Update README.md
README.md
CHANGED
|
@@ -1,3 +1,48 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
--
|
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- Visual Question-Visual Answering (VQVA)
|
| 4 |
+
- dataset-construction
|
| 5 |
+
- image-editing
|
| 6 |
+
- multimodal
|
| 7 |
+
- instruction-tuning
|
| 8 |
+
- visual-reasoning
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# 🥯 **BAGEL-World-model**
|
| 12 |
+
|
| 13 |
+
**An agentic, data-centric framework for producing large-scale interleaved Visual Question–Visual Answering (VQ-VA) data.**
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
The BAGEL-World framework outputs high-quality VQ-VA data via the following steps:
|
| 19 |
+
|
| 20 |
+
### 🔄 **Preprocessing**
|
| 21 |
+
|
| 22 |
+
Filters and classifies noisy web-interleaved data into design- and knowledge-related documents.
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
### 🤖 **Agentic Pipeline for VQ-VA Data Creation**
|
| 26 |
+
|
| 27 |
+
**1. Retriever** selects image pairs containing non-trivial transformations from interleaved documents that can serve as the basis for free-form questions.
|
| 28 |
+
|
| 29 |
+
**2. Instruction Generator** writes a natural-language question about one image so that the other image serves as the correct answer.
|
| 30 |
+
|
| 31 |
+
**3. Filterer** removes low-quality triplets ⟨Question Image, Question Text, Answer Image⟩.
|
| 32 |
+
|
| 33 |
+
**4. Rewriter** increases instruction diversity by producing multiple variants of the original questions.
|
| 34 |
+
|
| 35 |
+
**5. Reasoner** generates a language-based chain-of-thought explanation describing how the source image should be transformed to obtain the target image.
|
| 36 |
+
|
| 37 |
+
The framework finally outputs **interleaved quadruplets**:
|
| 38 |
+
|
| 39 |
+
- 🧠 *Question Image*
|
| 40 |
+
- 💬 *Visual Question / Instruction*
|
| 41 |
+
- 🔍 *Reasoning Trace*
|
| 42 |
+
- 🎨 *Answer Image*
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
Stay tuned for updates and examples!
|
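
The README added in this change describes five pipeline stages and a quadruplet output but ships no code yet. As a reading aid, the flow could be sketched roughly as follows. This is a minimal illustration and not the project's implementation: the `Quadruplet` class, `run_pipeline`, and every stage callable are hypothetical names, images are represented as plain strings, and two details are assumptions the README does not state (the original question is kept alongside its rewrites, and the Reasoner runs once per question variant).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Quadruplet:
    """One output record, mirroring the four items listed in the README."""
    question_image: str   # hypothetical: an image path or ID
    instruction: str      # visual question / instruction
    reasoning_trace: str  # chain-of-thought text from the Reasoner
    answer_image: str     # hypothetical: an image path or ID


def run_pipeline(
    documents: Iterable[List[str]],
    retrieve: Callable[[List[str]], List[Tuple[str, str]]],
    write_instruction: Callable[[str, str], str],
    keep: Callable[[str, str, str], bool],
    rewrite: Callable[[str], List[str]],
    reason: Callable[[str, str, str], str],
) -> List[Quadruplet]:
    """Chain the five stages; each stage is passed in as a plain callable."""
    out: List[Quadruplet] = []
    for doc in documents:
        for q_img, a_img in retrieve(doc):               # 1. Retriever
            question = write_instruction(q_img, a_img)   # 2. Instruction Generator
            if not keep(q_img, question, a_img):         # 3. Filterer
                continue
            # 4. Rewriter (assumption: the original question is kept too)
            for variant in [question, *rewrite(question)]:
                trace = reason(q_img, variant, a_img)    # 5. Reasoner
                out.append(Quadruplet(q_img, variant, trace, a_img))
    return out


# Toy stand-ins for the stages, only to show the data flow.
docs = [["a.png", "b.png", "c.png"]]
result = run_pipeline(
    docs,
    retrieve=lambda imgs: list(zip(imgs, imgs[1:])),
    write_instruction=lambda q, a: f"How does {q} become {a}?",
    keep=lambda q, text, a: q != "b.png",
    rewrite=lambda text: [text.lower()],
    reason=lambda q, text, a: f"Transform {q} into {a}.",
)
```

Passing the stages as callables keeps the sketch independent of whatever models back each stage; in practice each would presumably wrap a model call rather than a lambda.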