ZichengD committed on
Commit
e11274f
·
verified ·
1 Parent(s): aa4d759

Update README.md

  1. README.md +48 -3
README.md CHANGED
@@ -1,3 +1,48 @@
1
- ---
2
- license: cc-by-4.0
3
- ---
1
+ ---
2
+ tags:
3
+ - Visual Question-Visual Answering (VQVA)
4
+ - dataset-construction
5
+ - image-editing
6
+ - multimodal
7
+ - instruction-tuning
8
+ - visual-reasoning
9
+ ---
10
+
11
+ # 🥯 **BAGEL-World-model**
12
+
13
+ **An agentic, data-centric framework for producing large-scale interleaved Visual Question–Visual Answering (VQ-VA) data.**
14
+
15
+
16
+ ---
17
+
18
+ The BAGEL-World framework outputs high-quality VQ-VA data via the following steps:
19
+
20
+ ### 🔄 **Preprocessing**
21
+
22
+ Filters noisy web-interleaved data and classifies it into design- and knowledge-related documents.
23
+
24
+
25
+ ### 🤖 **Agentic Pipeline for VQ-VA Data Creation**
26
+
27
+ **1. Retriever** selects image pairs containing non-trivial transformations from interleaved documents that can serve as the basis for free-form questions.
28
+
29
+ **2. Instruction Generator** writes a natural-language question about one image so that the other image serves as the correct answer.
30
+
31
+ **3. Filterer** removes low-quality triplets ⟨Question Image, Question Text, Answer Image⟩.
32
+
33
+ **4. Rewriter** increases instruction diversity by producing multiple variants of the original questions.
34
+
35
+ **5. Reasoner** generates a language-based chain-of-thought explanation describing how the source image should be transformed to obtain the target image.
36
+
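The five stages above can be read as a simple composition of functions. The sketch below is only an illustration of that data flow, not the project's actual code: `Sample`, `run_pipeline`, and every callable parameter are hypothetical names, images are represented by plain string IDs, and each stage is passed in as a stub where the real framework would presumably call a model-based agent. Keeping the original question alongside its rewritten variants is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical record carried through the stages."""
    question_image: str   # ID or path of the source image
    answer_image: str     # ID or path of the target image
    question: str = ""    # filled in by the Instruction Generator / Rewriter
    reasoning: str = ""   # filled in by the Reasoner


def run_pipeline(
    documents: Iterable[List[str]],
    retrieve: Callable[[List[str]], List[Tuple[str, str]]],
    write_question: Callable[[str, str], str],
    keep: Callable[[Sample], bool],
    rewrite: Callable[[str], List[str]],
    reason: Callable[[Sample], str],
) -> List[Sample]:
    """Chain the five stages over documents given as lists of image IDs."""
    results: List[Sample] = []
    for images in documents:
        # 1. Retriever: pick candidate (question image, answer image) pairs.
        for src, tgt in retrieve(images):
            # 2. Instruction Generator: write a question answered by tgt.
            sample = Sample(src, tgt, write_question(src, tgt))
            # 3. Filterer: drop low-quality triplets.
            if not keep(sample):
                continue
            # 4. Rewriter: original question plus its variants (assumption).
            for question in [sample.question, *rewrite(sample.question)]:
                variant = replace(sample, question=question)
                # 5. Reasoner: attach a textual reasoning trace.
                results.append(replace(variant, reasoning=reason(variant)))
    return results
```

Passing the stages as callables keeps the sketch independent of any particular model or API; a trivial stub such as `retrieve=lambda imgs: list(zip(imgs, imgs[1:]))` is enough to exercise the control flow.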
37
+ The framework's final output is a set of **interleaved quadruplets**:
38
+
39
+ - 🧠 *Question Image*
40
+ - 💬 *Visual Question / Instruction*
41
+ - 🔍 *Reasoning Trace*
42
+ - 🎨 *Answer Image*
43
+
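As a rough illustration of how one such quadruplet might be stored, the sketch below models it as a small record and round-trips it through JSON Lines. The README does not specify a release format, so the class name `Quadruplet`, the field names, the use of file paths for images, and the JSON Lines layout are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class Quadruplet:
    """Hypothetical record mirroring the four output elements."""
    question_image: str  # path or ID of the question image
    question: str        # visual question / instruction text
    reasoning: str       # reasoning trace
    answer_image: str    # path or ID of the answer image


def to_jsonl(records: Iterable[Quadruplet]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


def from_jsonl(text: str) -> List[Quadruplet]:
    """Parse JSON Lines text back into records, skipping blank lines."""
    return [Quadruplet(**json.loads(line)) for line in text.splitlines() if line.strip()]
```

A flat record with one line per sample is just one convenient choice for streaming large datasets; the actual data may be organized differently.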
44
+
45
+
46
+
47
+
48
+ Stay tuned for updates and examples!