yhx12 and nielsr (HF Staff) committed
Commit 3821124 · verified · 1 Parent(s): 928f18d

Improve model card for DiffThinker: Add metadata, links, and usage details (#1)


- Improve model card for DiffThinker: Add metadata, links, and usage details (938263e73b8c21b82d1875067db6287304281cd5)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +45 -7
README.md CHANGED
@@ -1,17 +1,45 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - Qwen/Qwen-Image-Edit-2509
+language:
+- en
+license: apache-2.0
+library_name: diffusers
+pipeline_tag: image-to-image
 ---
+
 # DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
+
 <a href="https://diffthinker-project.github.io/"><img src="https://img.shields.io/badge/%F0%9F%8C%90%20Project-Page-2563eb" alt="Project Page"></a>
-<br>
-This model is based on the paper: https://arxiv.org/abs/2512.24165
+<a href="https://github.com/lcqysl/DiffThinker"><img src="https://img.shields.io/badge/GitHub-Code-blue?logo=github" alt="GitHub"></a>
+<a href="https://huggingface.co/papers/2512.24165"><img src="https://img.shields.io/badge/arXiv-Paper-b31b1b" alt="Paper"></a>
+
+DiffThinker introduces a novel Generative Multimodal Reasoning paradigm, establishing a diffusion-based reasoning framework. It reformulates multimodal reasoning as a native generative image-to-image task, achieving superior logical consistency and spatial precision in vision-centric tasks compared to traditional text-centric Multimodal Large Language Models (MLLMs).
+
+### Features
+DiffThinker exhibits four core properties in its approach to vision-centric reasoning:
+- **Efficiency**: Streamlined reasoning process.
+- **Controllability**: Precise spatial and logical generation.
+- **Native Parallelism**: Advantageous for complex reasoning steps.
+- **Collaboration**: Works effectively across multiple domains (sequential planning, combinatorial optimization, constraint satisfaction, and spatial configuration).
+
+### Quick Start
+To get started with DiffThinker, clone the official repository and install the necessary dependencies:
+```bash
+git clone https://github.com/lcqysl/DiffThinker.git
+cd DiffThinker/DiffSynth-Studio
+pip install -e .
+pip install gymnasium
+
+# (Optional) Install vLLM for OCR tasks
+# we recommend installing it in a SEPARATE environment to avoid conflicts.
+# pip install vllm
+```
+
 ### Inference & Evaluation
-The test datasets used in our experiments is provided within each task's directory. We recommend using the same data to ensure the reproducibility of our results and to facilitate comparison with other models. If you wish to generate your own test data, please refer to the ```gen.txt``` file in each task directory.
-```code
+The test datasets used in our experiments are provided within each task's directory. We recommend using the same data to ensure the reproducibility of our results and to facilitate comparison with other models. If you wish to generate your own test data, please refer to the `gen.txt` file in each task directory.
+
+```bash
 cd Maze
 
 # 1. Inference and Parsing
@@ -23,4 +51,14 @@ bash eval/eval_path.sh
 # 3. Individual Inference
 python ../DiffSynth-Studio/add/infer/infer.py
 python ../DiffSynth-Studio/add/infer/infer_with_middle.py
+```
+
+### Citation
+```bibtex
+@article{he2024diffthinker,
+title={DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models},
+author={He, Zefeng and Qu, Xiaoye and Li, Yafu and Zhu, Tong and Huang, Siyuan and Cheng, Yu},
+journal={arXiv preprint arXiv:2512.24165},
+year={2024}
+}
+```
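The Maze commands in the card above run DiffThinker's image-to-image solver over pre-rendered maze images, where a correct answer means drawing a valid shortest path into the picture. As a purely illustrative reference for that kind of sequential-planning task, here is a minimal breadth-first-search solver on a text-grid maze; the grid format, the `solve_maze` helper, and the `#`-as-wall convention are assumptions of this sketch and are not part of the DiffThinker codebase.

```python
# Illustrative sketch (not DiffThinker code): BFS shortest path on a text maze.
# DiffThinker solves the analogous problem generatively on maze *images*; this
# grid solver just shows what a correct start-to-goal path looks like.
from collections import deque

def solve_maze(grid, start, goal):
    """Return the shortest list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}          # visited set doubling as a parent map
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:             # reconstruct path by walking parents back
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" \
                    and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                      # goal unreachable

maze = [
    "S.#",
    ".##",
    "..G",
]
path = solve_maze(maze, (0, 0), (2, 2))
print(path)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

Because BFS explores cells in increasing distance from the start, the first time it dequeues the goal the reconstructed path is guaranteed shortest, which makes such a solver a convenient ground-truth checker when evaluating generated maze solutions.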