Add pipeline tag and library name

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +215 -3
README.md CHANGED
@@ -1,3 +1,215 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: unconditional-image-generation
+ library_name: pytorch
+ ---
+
+ ## 🌟 Halton Scheduler for Masked Generative Image Transformer 🌟
+
+ [![GitHub stars](https://img.shields.io/github/stars/valeoai/Halton-MaskGIT.svg?style=social)](https://github.com/valeoai/Halton-MaskGIT/stargazers)
+ [![Hugging Face Model](https://img.shields.io/badge/Hugging%20Face-Model%20Card-orange?logo=huggingface)](https://huggingface.co/llvictorll/Halton-MaskGIT/tree/main)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valeoai/Halton-Maskgit/blob/main/colab_demo.ipynb)
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.txt)
+ [![Paper](https://img.shields.io/badge/ICLR-2025-blue)](https://openreview.net/forum?id=RDVrlWAb7K) <img src="statics/its_just_a_frog_cie.png" alt="drawing" width="25"/>
+
+ Official PyTorch implementation of the paper:
+ **Halton Scheduler for Masked Generative Image Transformer**
+ *Victor Besnier, Mickael Chen, David Hurych, Eduardo Valle, Matthieu Cord*
+ Accepted at **ICLR 2025**.
+
+ TL;DR: We introduce a new sampling strategy using the Halton Scheduler, which spreads tokens uniformly across the image.
+ This approach reduces sampling errors and improves image quality.
+
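To make the idea of spreading tokens uniformly more concrete, here is a minimal sketch of a Halton ordering over a token grid. It is an illustration only: the helper names `radical_inverse` and `halton_order`, the choice of bases 2 and 3, and the rule of skipping already-visited cells are assumptions made for this example, and the code in `Sampler/halton_sampler.py` may differ.

```python
from fractions import Fraction


def radical_inverse(index: int, base: int) -> Fraction:
    """Mirror the base-`base` digits of `index` around the radix point."""
    result, denom = Fraction(0), 1
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        result += Fraction(digit, denom)
    return result


def halton_order(grid_size: int) -> list[tuple[int, int]]:
    """Visit every cell of a grid_size x grid_size grid in Halton order."""
    seen: set[tuple[int, int]] = set()
    order: list[tuple[int, int]] = []
    index = 1
    while len(order) < grid_size * grid_size:
        # Bases 2 and 3 are an assumption made for this sketch.
        row = int(radical_inverse(index, 2) * grid_size)
        col = int(radical_inverse(index, 3) * grid_size)
        if (row, col) not in seen:  # skip cells that were already visited
            seen.add((row, col))
            order.append((row, col))
        index += 1
    return order


order = halton_order(24)  # 24x24 token grid
print(order[:3])  # [(12, 8), (6, 16), (18, 2)]
```

Consecutive positions land far apart on the grid, which is the property the TL;DR refers to. Exact fractions are used instead of floats so that cell indices never depend on rounding.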
+ ---
+
+ ## 🚀 Overview
+
+ Welcome to the official implementation of our ICLR 2025 paper! 🎉
+
+ This repository introduces the **Halton Scheduler for Masked Generative Image Transformer (MaskGIT)** and includes:
+ 1. **Class-to-Image Model**: Generates high-quality 384x384 images from ImageNet class labels.
+
+ <p align="center">
+ <img src="statics/cls2img_halton.png" width="100%" alt="Cls2Img">
+ </p>
+
+ 2. **Text-to-Image Model**: Generates realistic images from textual descriptions (coming soon).
+ <p align="center">
+ <img src="statics/txt2img_halton.jpg" width="100%" alt="Txt2Img">
+ </p>
+
+ Explore, train, and extend our easy-to-use generative models! 🚀
+
+ The v1.0 version, previously known as "MaskGIT-pytorch", is available [here](https://github.com/valeoai/Halton-MaskGIT/tree/v1.0).
+
+ ---
+
+ ## 📁 Repository Structure
+
+ ```plaintext
+ ├ Halton-MaskGIT/
+ |   ├── Config/                         <- Base config files for the demo
+ |   |   ├── base_cls2img.yaml
+ |   |   └── base_txt2img.yaml
+ |   ├── Dataset/                        <- Data loading utilities
+ |   |   ├── dataset.py                  <- PyTorch dataset class
+ |   |   └── dataloader.py               <- PyTorch dataloader
+ |   ├── launch/
+ |   |   ├── run_cls_to_img.sh           <- Training script for class-to-image
+ |   |   └── run_txt_to_img.sh           <- Training script for text-to-image (coming soon)
+ |   ├── Metrics/
+ |   |   ├── extract_train_fid.py        <- Precompute FID stats for ImageNet
+ |   |   ├── inception_metrics.py        <- Inception score and FID evaluation
+ |   |   └── sample_and_eval.py          <- Sampling and evaluation
+ |   ├── Network/
+ |   |   ├── ema.py                      <- EMA model
+ |   |   ├── transformer.py              <- Transformer for class-to-image
+ |   |   ├── txt_transformer.py          <- Transformer for text-to-image (coming soon)
+ |   |   └── va_model.py                 <- VQGAN architecture
+ |   ├── Sampler/
+ |   |   ├── confidence_sampler.py       <- Confidence scheduler
+ |   |   └── halton_sampler.py           <- Halton scheduler
+ |   ├── Trainer/                        <- Training classes
+ |   |   ├── abstract_trainer.py         <- Abstract trainer
+ |   |   ├── cls_trainer.py              <- Class-to-image trainer
+ |   |   └── txt_trainer.py              <- Text-to-image trainer (coming soon)
+ |   ├── statics/                        <- Sample images and assets
+ |   ├── saved_networks/                 <- Placeholder for the downloaded models
+ |   ├── colab_demo.ipynb                <- Inference demo
+ |   ├── app.py                          <- Gradio example
+ |   ├── LICENSE.txt                     <- MIT license
+ |   ├── env.yaml                        <- Environment setup file
+ |   ├── README.md                       <- This file! 📖
+ |   └── main.py                         <- Main script
+ ```
+
+ ## 🛠️ Usage
+ Get started with just a few steps:
+
+ ### 1️⃣ Clone the repository
+
+ ```bash
+ git clone https://github.com/valeoai/Halton-MaskGIT.git
+ cd Halton-MaskGIT
+ ```
+
+ ### 2️⃣ Install dependencies
+
+ ```bash
+ conda env create -f env.yaml
+ conda activate maskgit
+ ```
+
+ ### 3️⃣ Download pretrained models
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # The VQ-GAN
+ hf_hub_download(repo_id="FoundationVision/LlamaGen",
+                 filename="vq_ds16_c2i.pt",
+                 local_dir="./saved_networks/")
+
+ # (Optional) The MaskGIT
+ hf_hub_download(repo_id="llvictorll/Halton-Maskgit",
+                 filename="ImageNet_384_large.pth",
+                 local_dir="./saved_networks/")
+ ```
+
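After the download step, it can help to confirm that both checkpoints landed where the later steps expect them. The sketch below only uses the standard library; the helper name `missing_checkpoints` is hypothetical, and the file names are the ones used in the download snippet above.

```python
from pathlib import Path

# File names taken from the download snippet above.
EXPECTED = ("vq_ds16_c2i.pt", "ImageNet_384_large.pth")


def missing_checkpoints(folder: str, names=EXPECTED) -> list[str]:
    """Return the expected checkpoint files that are absent from `folder`."""
    root = Path(folder)
    return [name for name in names if not (root / name).is_file()]


missing = missing_checkpoints("./saved_networks/")
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints found.")
```

Since the MaskGIT checkpoint is optional for training from scratch, a missing `ImageNet_384_large.pth` is only a problem if you intend to run inference or fine-tuning.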
+ ### 4️⃣ Extract the VQGAN codes
+
+ ```bash
+ python extract_vq_features.py --data_folder="/path/to/ImageNet/" --dest_folder="/your/path/" --bsize=256 --compile
+ ```
+
+ ### 5️⃣ Train the model
+
+ To train the class-to-image model:
+ ```bash
+ bash launch/run_cls_to_img.sh
+ ```
+
+ ## 📟 Quick Start for sampling
+ To quickly verify that the model works, you can try this Python code:
+
+ ```python
+ import torch
+ from Utils.utils import load_args_from_file
+ from Utils.viz import show_images_grid
+ from huggingface_hub import hf_hub_download
+
+ from Trainer.cls_trainer import MaskGIT
+ from Sampler.halton_sampler import HaltonSampler
+
+ config_path = "Config/base_cls2img.yaml"  # Path to your config file
+ args = load_args_from_file(config_path)
+ args.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Download the VQGAN from LlamaGen
+ hf_hub_download(repo_id="FoundationVision/LlamaGen",
+                 filename="vq_ds16_c2i.pt",
+                 local_dir="./saved_networks/")
+
+ # Download the MaskGIT
+ hf_hub_download(repo_id="llvictorll/Halton-Maskgit",
+                 filename="ImageNet_384_large.pth",
+                 local_dir="./saved_networks/")
+
+ # Initialize the model
+ model = MaskGIT(args)
+
+ # Select your scheduler
+ sampler = HaltonSampler(sm_temp_min=1, sm_temp_max=1.2, temp_pow=1, temp_warmup=0, w=2,
+                         sched_pow=2, step=32, randomize=True, top_k=-1)
+
+ # [goldfish, chicken, tiger cat, hourglass, ship, dog, race car, airliner]
+ labels = [1, 7, 282, 604, 724, 179, 751, 404]
+
+ gen_images = sampler(trainer=model, nb_sample=8, labels=labels, verbose=True)[0]
+ show_images_grid(gen_images)
+ ```
+ Alternatively, run the Gradio 🖼️ app with `python app.py` and open http://127.0.0.1:6006 in your browser.
+
+ 🎨 Want to try the model, but you don't have a GPU? Check out the Colab Notebook for an easy-to-run demo!
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/valeoai/Halton-Maskgit/blob/main/colab_demo.ipynb)
+
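When adapting the quick-start snippet to your own classes, a small check on the label list can catch mistakes before any GPU work starts. This is a sketch under two assumptions: that the sampler expects one label per generated sample, and that labels are ImageNet-1k class indices in the range 0 to 999. The helper `check_labels` is hypothetical and not part of the repository.

```python
def check_labels(labels: list[int], nb_sample: int, num_classes: int = 1000) -> None:
    """Raise ValueError if the labels do not match the requested sample count or class range."""
    # Assumption: one class label per generated image.
    if len(labels) != nb_sample:
        raise ValueError(f"expected {nb_sample} labels, got {len(labels)}")
    # Assumption: labels index the 1000 ImageNet-1k classes.
    bad = [label for label in labels if not 0 <= label < num_classes]
    if bad:
        raise ValueError(f"labels outside [0, {num_classes - 1}]: {bad}")


labels = [1, 7, 282, 604, 724, 179, 751, 404]
check_labels(labels, nb_sample=8)  # passes silently for the quick-start labels
```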
+ ## 🧠 Pretrained Models
+ The pretrained MaskGIT models are available on [Hugging Face](https://huggingface.co/llvictorll/Halton-MaskGIT/tree/main).
+ Use them to jump straight into inference or fine-tuning.
+
+ | Model                | # Params | Input (token grid) | GFLOPs | VQGAN | MaskGIT |
+ |----------------------|----------|--------------------|--------|-------|---------|
+ | Halton-MaskGIT-Large | 480M     | 24x24              | 83.00  | [🔗 Download](https://huggingface.co/FoundationVision/LlamaGen/blob/main/vq_ds16_c2i.pt) | [🔗 Download](https://huggingface.co/llvictorll/Halton-MaskGIT/blob/main/ImageNet_384_large.pth) |
+
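The 24x24 input size in the table follows from the image resolution and the tokenizer. As a rough worked example, assuming that `ds16` in the VQGAN file name denotes a downsampling factor of 16 and that the codebook has 16384 entries (as stated in the acknowledgments), the arithmetic looks like this. The function name `token_stats` is made up for the illustration.

```python
import math


def token_stats(image_size: int, downsample: int, codebook_size: int) -> dict:
    """Derive the token grid, token count and bits per token for a VQ tokenizer."""
    grid = image_size // downsample  # tokens per image side
    return {
        "grid": grid,
        "tokens": grid * grid,  # tokens the transformer has to predict
        "bits_per_token": math.log2(codebook_size),
    }


stats = token_stats(image_size=384, downsample=16, codebook_size=16384)
print(stats)  # {'grid': 24, 'tokens': 576, 'bits_per_token': 14.0}
```

So each 384x384 image is represented by 576 discrete tokens, which is the sequence the sampler fills in over its steps.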
+ ## ❤️ Contribute
+ We welcome contributions and feedback! 🛠️
+ If you encounter any issues, have suggestions, or want to collaborate, feel free to:
+ - Create an issue
+ - Fork the repository and submit a pull request
+
+ Your input is highly valued. Let's make this project even better together! 🙌
+
+ ## 📜 License
+ This project is licensed under the MIT License.
+ See the [LICENSE](LICENSE.txt) file for details.
+
+ ## 🙏 Acknowledgments
+ We are grateful for the support of the IT4I Karolina Cluster in the Czech Republic for powering our experiments.
+
+ The pretrained VQGAN ImageNet (f=16/8, 16384 codebook) is from the [LlamaGen official repository](https://github.com/FoundationVision/LlamaGen?tab=readme-ov-file).
+
+ ## 📖 Citation
+ If you find our work useful, please cite us and add a star ⭐ to the repository :)
+
+ ```
+ @inproceedings{besnier2025iclr,
+   title={Halton Scheduler for Masked Generative Image Transformer},
+   author={Victor Besnier and Mickael Chen and David Hurych and Eduardo Valle and Matthieu Cord},
+   booktitle={International Conference on Learning Representations (ICLR)},
+   year={2025}
+ }
+ ```
+
+ ## ⭐ Star History
+
+ [![Star History Chart](https://api.star-history.com/svg?repos=valeoai/Halton-MaskGIT&type=Date)](https://star-history.com/#valeoai/Halton-MaskGIT&Date)