niladridutt commited on
Commit
104d814
·
verified ·
1 Parent(s): fa1aaab

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +336 -1
README.md CHANGED
@@ -6,4 +6,339 @@ language:
6
  - en
7
  base_model:
8
  - Qwen/Qwen2-VL-7B-Instruct
9
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  - en
7
  base_model:
8
  - Qwen/Qwen2-VL-7B-Instruct
9
+ ---
10
+
11
+ ---
12
+ license: mit
13
+ ---
14
+
15
+
16
+ # MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills
17
+ ### **SIGGRAPH 2025 (ACM Transactions on Graphics)**
18
+
19
+ <div align="center">
20
+
21
+ [![Project Page](https://img.shields.io/badge/Project-Page-green)](https://monetgpt.github.io/)
22
+ [![Paper](https://img.shields.io/badge/Paper-ArXiv-red)](https://arxiv.org/abs/2505.06176)
23
+ [![ACM](https://img.shields.io/badge/ACM-PDF-blue)](https://dl.acm.org/doi/pdf/10.1145/3730926)
24
+ [![Model](https://img.shields.io/badge/🤗-Model-yellow)](https://huggingface.co/niladridutt/monetGPT)
25
+
26
+
27
+ </div>
28
+
29
+ <div align="center">
30
+ <img src="assets/teaser.jpg" alt="MonetGPT Teaser">
31
+ </div>
32
+
33
+ ## Table of Contents
34
+
35
+ - [Overview](#overview)
36
+ - [Quick Start](#-quick-start)
37
+ - [Usage](#-usage)
38
+ - [Training Your Own Model](#-training-your-own-model)
39
+ - [Image Processing CLI Usage](#-image-processing-cli-usage)
40
+ - [Puzzle Types](#-puzzle-types)
41
+ - [Configuration](#-configuration)
42
+ - [Results & Evaluation](#-results--evaluation)
43
+ - [Troubleshooting](#-troubleshooting)
44
+ - [Citation](#-citation)
45
+ - [License](#-license)
46
+
47
+ ### Note: This HuggingFace repository only contains model weights. The full codebase for MonetGPT is available on our [GitHub repository](https://github.com/niladridutt/monetGPT).
48
+
49
+ ## Overview
50
+
51
+ **MonetGPT** is a novel framework that teaches multimodal large language models (MLLMs) to perform professional-quality image retouching through procedural operations. Unlike generative editing approaches that can unpredictably alter image content, MonetGPT learns to plan and execute sequences of traditional retouching operations (brightness, contrast, saturation, etc.) that preserve object identity and provide explainable results.
52
+
53
+ ### Visual Puzzles for Operation Awareness
54
+
55
+ 🧩 MLLMs learn retouching operations by solving specially designed visual puzzles that teach operation recognition, parameter understanding, and sequence planning.Unlike black-box generative models, MonetGPT provides clear reasoning for each editing decision and preserves original image content and resolution (e.g., 8K 16-bit).
56
+
57
+ ## 🚀 Quick Start
58
+
59
+ ### Installation
60
+
61
+ ```bash
62
+ # Clone the repository
63
+ git clone https://github.com/monetgpt/monetgpt.git
64
+ cd monetgpt
65
+
66
+ # Create and activate conda environment
67
+ conda create -n monetgpt python=3.11
68
+ conda activate monetgpt
69
+
70
+ # Install dependencies
71
+ cd llm
72
+ sh install.sh
73
+ ```
74
+
75
+ ### GIMP Installation
76
+
77
+ MonetGPT requires **GIMP 2.10** for image processing operations. Other versions may not be compatible.
78
+
79
+ #### Download GIMP 2.10:
80
+ - **macOS**: [Download GIMP 2.10.38 ARM64](https://download.gimp.org/gimp/v2.10/macos/gimp-2.10.38-arm64-1.dmg)
81
+ - **Linux**: Install via Flatpak:
82
+ ```bash
83
+ flatpak install flathub org.gimp.GIMP//2.10
84
+ ```
85
+ - **Windows**: [Download from GIMP website](https://download.gimp.org/gimp/v2.10/)
86
+
87
+ #### Install NumPy for GIMP (Linux Flatpak only):
88
+ ```bash
89
+ flatpak run --command=sh org.gimp.GIMP//stable -c "python -m pip install --user numpy"
90
+ ```
91
+
92
+ Note: MacOS version ships with NumPy built-in.
93
+
94
+ ### Download Pre-trained Model
95
+
96
+ Download the trained MonetGPT model from Hugging Face:
97
+
98
+ ```bash
99
+ # Navigate to llm directory and create models folder
100
+ cd llm
101
+ mkdir -p models
102
+ cd models
103
+
104
+ # Download the model using HF CLI
105
+ huggingface-cli download niladridutt/monetGPT
106
+
107
+ # OR Clone the model repository (requires git lfs)
108
+ git clone https://huggingface.co/niladridutt/monetGPT
109
+ ```
110
+
111
+ > **Note**: Ensure the model is saved as `llm/models/monetGPT` to match the expected directory structure or otherwise modify the configs in llm.
112
+
113
+ ## � Usage
114
+
115
+ <div align="center">
116
+ <img src="assets/pipeline.gif" alt="MonetGPT Pipeline">
117
+ </div>
118
+
119
+ ## Inference
120
+
121
+ First start the LLM, which shall launch a server
122
+
123
+ ```bash
124
+ cd llm
125
+ sh test.sh
126
+ cd ..
127
+ ```
128
+
129
+ Run image enhancement/retouching with the pre-trained MonetGPT model (make sure the LLM is running):
130
+
131
+ ```bash
132
+ # Single image processing
133
+
134
+ python inference_cli.py single input.jpg --output results/edited.jpg
135
+
136
+ # Batch processing
137
+ python inference_cli.py batch assets/test --output-dir results/
138
+ ```
139
+
140
+ ## 📚 Training Your Own Model
141
+
142
+ ### 1. Dataset Preparation
143
+
144
+ See `image_sources` in `configs/dataset_config.yaml`
145
+
146
+ ```bash
147
+ # Prepare your training images
148
+ mkdir -p data/ppr10k
149
+ # Place your .png/.jpg images in data/images/
150
+ ```
151
+
152
+ ### 2. Generate Training Puzzles
153
+
154
+ ```bash
155
+ # Step 1: Generate puzzle configurations (operation parameters)
156
+ python dataset_cli.py generate
157
+
158
+ # Step 2: Create visual puzzle images
159
+ python pipeline_cli.py puzzle 1 # Single operation puzzles
160
+ python pipeline_cli.py puzzle 2 # Multi-version comparison puzzles
161
+ python pipeline_cli.py puzzle 3 # Comprehensive editing puzzles
162
+ python pipeline_cli.py puzzle all # Generate all puzzle types
163
+ ```
164
+
165
+ ### 3. Generate LLM Reasoning
166
+
167
+ ```bash
168
+ # Step 3: Query LLM to add reasoning to puzzle configs
169
+ python dataset_cli.py query 1 0 -1 # Generate reasoning for all puzzle 1 configs (all)
170
+ python dataset_cli.py query 2 0 -1 # Generate reasoning for puzzle 2 configs (all)
171
+ python dataset_cli.py query 3 0 -1 # Generate reasoning for puzzle 3 configs (all)
172
+
173
+ # Step 4: Create final ShareGPT format datasets
174
+ python dataset_cli.py create # Create datasets for all puzzles
175
+
176
+ # Step 5: Combine datasets for training
177
+ python dataset/combine_jsons.py # Combine JSON datasets and export for training
178
+ ```
179
+
180
+ ### 4. Train the Model
181
+
182
+ ```bash
183
+ # Train MonetGPT model on the generated datasets
184
+ cd llm
185
+ sh train.sh
186
+ ```
187
+
188
+ > **Note**: This project uses [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for training infrastructure, which is licensed under [Apache 2.0](https://github.com/hiyouga/LLaMA-Factory/blob/main/LICENSE).
189
+
190
+ ## � Image Processing CLI Usage
191
+
192
+ ### Single Image Edit
193
+
194
+ ```bash
195
+ # Apply a specific retouching configuration to an image
196
+ python pipeline_cli.py edit configs/example_edit.json input.jpg output.jpg
197
+ ```
198
+
199
+ ### Batch Processing
200
+
201
+ ```bash
202
+ # Process multiple images with MonetGPT predictions
203
+ python pipeline_cli.py batch predictions --target-editor a
204
+ ```
205
+
206
+ ### Generate Puzzle Images
207
+
208
+ ```bash
209
+ # Generate single operation puzzles
210
+ python pipeline_cli.py puzzle 1
211
+
212
+ # Generate multi-version comparison puzzles
213
+ python pipeline_cli.py puzzle 2
214
+
215
+ # Generate comprehensive editing puzzles
216
+ python pipeline_cli.py puzzle 3
217
+
218
+ # Generate all puzzle types
219
+ python pipeline_cli.py puzzle all
220
+ ```
221
+
222
+
223
+
224
+ ## 🧩 Puzzle Types
225
+
226
+ ### Puzzle 1: Single Operation Analysis
227
+ - **Purpose**: Teach individual retouching operations
228
+ - **Format**: Before/after comparison with single operation
229
+ - **Example**: "Which adjustment was applied and how much?"
230
+
231
+ ### Puzzle 2: Multi-Version Comparison
232
+ - **Purpose**: Teach parameter value relationships
233
+ - **Format**: Multiple versions with different parameter values
234
+ - **Example**: "Rank these images by optimal saturation level"
235
+
236
+ ### Puzzle 3: Comprehensive Editing Plans
237
+ - **Purpose**: Teach complete retouching workflows
238
+ - **Format**: Multi-step editing sequences
239
+ - **Example**: "Plan the editing sequence: 1) Fix lighting 2) Adjust white balance 3) Enhance colors"
240
+
241
+ ## 🔧 Configuration
242
+
243
+ ### Dataset Configuration (`configs/dataset_config.yaml`)
244
+
245
+ ```yaml
246
+ # LLM settings for reasoning generation
247
+ model: "gemini-2.0-flash"
248
+ api_key: "" # Set your API key here
249
+ base_url: "https://generativelanguage.googleapis.com/v1beta/openai/"
250
+ timeout: 5
251
+ retry_attempts: 1
252
+
253
+ # Puzzle paths and settings
254
+ puzzles:
255
+ puzzle1:
256
+ reasoning_path: "./data/puzzles1/reasoning/*.txt"
257
+ images_path: "./data/puzzles1/images/*.png"
258
+ images_base_path: "./data/puzzles1/images"
259
+ output_file: "data/sharegpt_puzzle_1.json"
260
+
261
+ puzzle2:
262
+ reasoning_path: "./data/puzzles2/reasoning/*.txt"
263
+ images_path: "./data/puzzles2/images/*.png"
264
+ images_base_path: "./data/puzzles2/images"
265
+ output_file: "data/sharegpt_puzzle_2.json"
266
+
267
+ puzzle3:
268
+ reasoning_path: "./data/puzzles3/reasoning/*.txt"
269
+ images_path: "./data/puzzles3/images/*/*.tif"
270
+ images_base_path: "./data/puzzles3/images"
271
+ output_file: "data/sharegpt_puzzle_3.json"
272
+
273
+ # Generation settings
274
+ generation:
275
+ num_standard_trials: 2
276
+ num_color_trials: 1
277
+ num_puzzle3_trials: 10
278
+ ```
279
+
280
+ ### Pipeline Configuration (`configs/pipeline_config.yaml`)
281
+
282
+ ```yaml
283
+ # GIMP settings
284
+ gimp:
285
+ paths:
286
+ macos: "/Applications/GIMP.app/Contents/MacOS/gimp-console-2.10"
287
+ linux: "flatpak run org.gimp.GIMP//stable --no-interface"
288
+ windows: "gimp-console-2.10.exe" (Not tested, may require some modifications)
289
+
290
+ batch_interpreter: "python-fu-eval"
291
+ python_warnings: "ignore"
292
+ pipeline_file: "./gimp_pipeline.py"
293
+
294
+ # Image processing settings
295
+ image_processing:
296
+ max_low_res_size: 700 # Low resolution for LLM training only
297
+ default_dpi: 140 # Original resolution preserved during inference
298
+
299
+ # Processing parameters
300
+ processing:
301
+ batch_size: 10
302
+ max_workers: 4
303
+ timeout_seconds: 120
304
+ ```
305
+
306
+ ### Benchmark Performance
307
+ MonetGPT achieves state-of-the-art results on image retouching tasks while providing full explainability and maintaining original image resolution.
308
+
309
+ ## 🔧 Troubleshooting
310
+
311
+ ### Common Issues
312
+
313
+ **GIMP Not Found**: Ensure GIMP 2.10 is installed and the path in `configs/pipeline_config.yaml` matches your installation.
314
+
315
+ **NumPy Import Error**: Install NumPy in GIMP's Python environment (see GIMP installation section).
316
+
317
+ **Model Download Issues**: Verify Git LFS is installed for large model files: `git lfs install`
318
+
319
+
320
+ ## 📄 Citation
321
+
322
+ If you find MonetGPT useful in your research, please consider citing our paper:
323
+
324
+ ```bibtex
325
+ @article{dutt2025monetgpt,
326
+ title={MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills},
327
+ author={Dutt, Niladri Shekhar and Ceylan, Duygu and Mitra, Niloy J},
328
+ journal={ACM Transactions on Graphics (TOG)},
329
+ volume={44},
330
+ number={4},
331
+ pages={1--12},
332
+ year={2025},
333
+ publisher={ACM New York, NY, USA}
334
+ }
335
+ ```
336
+
337
+
338
+ ## 📜 License
339
+
340
+ This project is released under the [MIT License](LICENSE).
341
+
342
+ This project uses image dehazer as one of the image operations. This code is adapted from [Single-Image-Dehazing-Python](https://github.com/Utkarsh-Deshmukh/Single-Image-Dehazing-Python/tree/master), which is licensed under the BSD 2-Clause License. A copy of this license can be found in the [licenses/BSD-2-Clause.txt](licenses/BSD-2-Clause.txt) file.
343
+
344
+