chatpig committed on
Commit
cb7ddf9
·
verified ·
1 Parent(s): 569a73d

Update README.md

  1. README.md +545 -3
README.md CHANGED
@@ -1,3 +1,545 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ ---
4
+ # Pixel
5
+
6
+ A PyTorch-based Generative Adversarial Network (GAN) for training and generating CryptoPunk-style pixel art images.
7
+
8
+ ## Table of Contents
9
+
10
+ - [Overview](#overview)
11
+ - [Project Structure](#project-structure)
12
+ - [Architecture](#architecture)
13
+ - [Workflow](#workflow)
14
+ - [Installation](#installation)
15
+ - [User Guide](#user-guide)
16
+ - [Configuration](#configuration)
17
+ - [Examples](#examples)
18
+
19
+ ---
20
+
21
+ ## Overview
22
+
23
+ This project implements a Deep Convolutional GAN (DCGAN) to generate pixel art images. It includes:
24
+
25
+ - **Training Pipeline**: Train a GAN model on your custom image dataset
26
+ - **Image Generation**: Generate new images using trained models
27
+ - **Interactive GUI**: Tkinter-based interface for real-time generation
28
+ - **GGUF Support**: Convert and use GGUF quantized models
29
+
30
+ ---
31
+
32
+ ## Project Structure
33
+
34
+ ```
35
+ ai-picture-model-trainer/
36
+
37
+ ├── trainer.py # GAN training script
38
+ ├── generator.py # Image generation (GUI + CLI)
39
+
40
+ ├── data/ # Training data
41
+ │ ├── attributes.csv # Dataset metadata
42
+ │ └── images/ # Training images (punk000.png, punk001.png, ...)
43
+
44
+ ├── models/ # Trained model storage
45
+ │ └── generator_model.safetensors
46
+
47
+ ├── generated/ # Generated image outputs
48
+ │ ├── output.png # Grid visualizations
49
+ │ └── individual/ # Individual generated images
50
+
51
+ ├── gen_images/ # Training progress images
52
+ │ ├── epoch_0.png
53
+ │ ├── epoch_1.png
54
+ │ └── ...
55
+
56
+ └── gguf/ # GGUF format support
57
+ └── generator.py # GGUF model converter/generator
58
+ ```
59
+
60
+ ### Directory Purpose
61
+
62
+ | Directory | Purpose |
63
+ |-----------|---------|
64
+ | `data/` | Input training images and metadata |
65
+ | `models/` | Saved trained models (.safetensors format) |
66
+ | `generated/` | Output from generator.py |
67
+ | `gen_images/` | Training progress visualizations |
68
+ | `gguf/` | GGUF quantized model support |
69
+
70
+ ---
71
+
72
+ ## Architecture
73
+
74
+ ### GAN Components
75
+
76
+ ```
77
+ ┌─────────────────────────────────────────────────────────────┐
78
+ │ GAN Architecture │
79
+ └─────────────────────────────────────────────────────────────┘
80
+
81
+ ┌──────────────────┐ ┌──────────────────┐
82
+ │ Generator │ │ Discriminator │
83
+ │ │ │ │
84
+ │ Input: Noise │ │ Input: Images │
85
+ │ (100-dim) │ │ (24x24x4) │
86
+ │ │ │ │
87
+ │ ┌────────────┐ │ │ ┌────────────┐ │
88
+ │ │ FC │ │ │ │ Conv2D │ │
89
+ │ │ (9,216) │ │ │ │ 64 filters│ │
90
+ │ └────────────┘ │ │ └────────────┘ │
91
+ │ ↓ │ │ ↓ │
92
+ │ ┌────────────┐ │ │ ┌────────────┐ │
93
+ │ │ Reshape │ │ ┌──────────┐ │ │ Conv2D │ │
94
+ │ │ (256,6,6) │ │───→│ Real or │←───│ │ 128 filters│ │
95
+ │ └────────────┘ │ │ Fake? │ │ └────────────┘ │
96
+ │ ↓ │ └──────────┘ │ ↓ │
97
+ │ ┌────────────┐ │ │ ┌────────────┐ │
98
+ │ │ ConvTrans │ │ │ │ Conv2D │ │
99
+ │ │ 128 filters│ │ │ │ 256 filters│ │
100
+ │ └────────────┘ │ │ └────────────┘ │
101
+ │ ↓ │ │ ↓ │
102
+ │ ┌────────────┐ │ │ ┌────────────┐ │
103
+ │ │ ConvTrans │ │ │ │ GlobalAvg │ │
104
+ │ │ 64 filters│ │ │ │ Pool │ │
105
+ │ └────────────┘ │ │ └────────────┘ │
106
+ │ ↓ │ │ ↓ │
107
+ │ ┌────────────┐ │ │ ┌────────────┐ │
108
+ │ │ ConvTrans │ │ │ │ FC + Sig │ │
109
+ │ │ 4 channels│ │ │ │ (0-1) │ │
110
+ │ └────────────┘ │ │ └────────────┘ │
111
+ │ │ │ │
112
+ │ Output: Image │ │ Output: Score │
113
+ │ (24x24x4 RGBA) │ │ (Real/Fake) │
114
+ └──────────────────┘ └──────────────────┘
115
+ ```
116
+
117
+ ### Model Details
118
+
119
+ **Generator**:
120
+ - Input: 100-dimensional latent vector (random noise)
121
+ - Architecture: FC → BatchNorm → 3x ConvTranspose2D with BatchNorm
122
+ - Output: 24x24x4 RGBA image (values in [-1, 1])
123
+ - Activation: LeakyReLU + Tanh (output)
124
+
125
+ **Discriminator**:
126
+ - Input: 24x24x4 RGBA image
127
+ - Architecture: 3x Conv2D with Dropout → GlobalAvgPool → FC
128
+ - Output: Probability score [0, 1] (real vs. fake)
129
+ - Activation: LeakyReLU + Sigmoid (output)
130
+
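The layer stack described above could look roughly like this in PyTorch. This is a minimal sketch, not the project's actual code: the kernel sizes, strides, padding, and the LeakyReLU slope of 0.2 are assumptions chosen so that the shapes match the diagram (100 → 9,216 → 256x6x6 → 24x24x4), and the class names are illustrative.

```python
import torch
from torch import nn


class Generator(nn.Module):
    """Latent vector -> 24x24 image with values in [-1, 1] (assumed layer settings)."""

    def __init__(self, codings_size: int = 100, image_channels: int = 4):
        super().__init__()
        # FC layer producing 256 * 6 * 6 = 9,216 features, followed by BatchNorm.
        self.fc = nn.Sequential(
            nn.Linear(codings_size, 256 * 6 * 6),
            nn.BatchNorm1d(256 * 6 * 6),
            nn.LeakyReLU(0.2),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 6 -> 12
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 12 -> 24
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, image_channels, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),  # squashes the output into [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.fc(z).view(-1, 256, 6, 6)  # reshape to (256, 6, 6) feature maps
        return self.deconv(x)


class Discriminator(nn.Module):
    """24x24 image -> probability that the image is real (assumed layer settings)."""

    def __init__(self, image_channels: int = 4):
        super().__init__()
        layers = []
        in_channels = image_channels
        for out_channels in (64, 128, 256):
            layers += [
                nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Dropout(0.4),
            ]
            in_channels = out_channels
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling
            nn.Flatten(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(image))


generator = Generator()
discriminator = Discriminator()
noise = torch.randn(8, 100)  # a batch of 8 latent vectors
images = generator(noise)
scores = discriminator(images)
print(images.shape, scores.shape)
```

A batch larger than one is used here because `BatchNorm1d` cannot compute batch statistics from a single sample in training mode.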
131
+ ---
132
+
133
+ ## Workflow
134
+
135
+ ### Training Workflow
136
+
137
+ ```
138
+ ┌─────────────┐
139
+ │ Start │
140
+ └──────┬──────┘
141
+
142
+
143
+ ┌──────────────────────┐
144
+ │ Load Dataset │
145
+ │ (data/images/) │
146
+ └──────┬───────────────┘
147
+
148
+
149
+ ┌──────────────────────┐
150
+ │ Initialize Models │
151
+ │ - Generator │
152
+ │ - Discriminator │
153
+ └──────┬───────────────┘
154
+
155
+
156
+ ┌──────────────────────┐
157
+ │ Training Loop │◄─────────┐
158
+ │ (N epochs) │ │
159
+ └──────┬───────────────┘ │
160
+ │ │
161
+ ▼ │
162
+ ┌──────────────────────┐ │
163
+ │ For each batch: │ │
164
+ │ 1. Train Discrim. │ │
165
+ │ 2. Train Generator │ │
166
+ └──────┬───────────────┘ │
167
+ │ │
168
+ ▼ │
169
+ ┌──────────────────────┐ │
170
+ │ Save Progress │ │
171
+ │ (gen_images/) │──────────┘
172
+ └──────┬───────────────┘
173
+
174
+
175
+ ┌──────────────────────┐
176
+ │ Save Final Model │
177
+ │ (models/*.safetensors)
178
+ └──────┬───────────────┘
179
+
180
+
181
+ ┌──────────────┐
182
+ │ Complete │
183
+ └──────────────┘
184
+ ```
185
+
186
+ ### Generation Workflow
187
+
188
+ ```
189
+ ┌─────────────────────┐
190
+ │ Mode Selection │
191
+ └──────┬──────────────┘
192
+
193
+ ├─────────────────────────┐
194
+ │ │
195
+ ▼ ▼
196
+ ┌──────────────┐ ┌──────────────────┐
197
+ │ GUI Mode │ │ CLI Mode │
198
+ └──────┬───────┘ └──────┬───────────┘
199
+ │ │
200
+ ▼ ▼
201
+ ┌──────────────┐ ┌──────────────────┐
202
+ │ Load Model │ │ Load Model │
203
+ │ (safetensors)│ │ Parse Args │
204
+ └──────┬───────┘ └──────┬───────────┘
205
+ │ │
206
+ ▼ ▼
207
+ ┌──────────────┐ ┌──────────────────┐
208
+ │ Tkinter GUI │ │ Generate N Images│
209
+ │ - Button 1x1 │ │ - Custom grid │
210
+ │ - Button 3x3 │ │ - Custom seed │
211
+ │ - Button 5x5 │ │ - Save options │
212
+ └──────┬───────┘ └──────┬───────────┘
213
+ │ │
214
+ ▼ ▼
215
+ ┌──────────────┐ ┌──────────────────┐
216
+ │ On Click: │ │ Save Grid │
217
+ │ Generate │ │ Save Individual │
218
+ │ Display │ │ (optional) │
219
+ └──────┬───────┘ └──────────────────┘
220
+
221
+
222
+ ┌──────────────┐
223
+ │ Interactive │
224
+ │ Generation │
225
+ └──────────────┘
226
+ ```
227
+
228
+ ---
229
+
230
+ ## Installation
231
+
232
+ ### Prerequisites
233
+
234
+ - Python 3.8+
235
+ - CUDA-compatible GPU (optional, for faster training)
236
+
237
+ ### Setup
238
+
239
+ 1. **Clone/Download the repository**
240
+
241
+ 2. **Install dependencies**:
242
+
243
+ ```bash
244
+ pip install torch torchvision
245
+ pip install numpy pandas matplotlib pillow
246
+ pip install safetensors
247
+ ```
248
+
249
+ 3. **Prepare your dataset**:
250
+
251
+ Place your training images in `data/images/` with filenames like `punk000.png`, `punk001.png`, etc.
252
+
253
+ Create `data/attributes.csv`:
254
+ ```csv
255
+ id
256
+ 0
257
+ 1
258
+ 2
259
+ ...
260
+ ```
261
+
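If the image folder already exists, the metadata file can be generated rather than typed by hand. The sketch below assumes that each `id` is simply the number embedded in the `punkNNN.png` filename; the helper name `write_attributes_csv` is hypothetical and not part of the project.

```python
import csv
import re
import tempfile
from pathlib import Path


def write_attributes_csv(images_dir, csv_path):
    """Write one `id` row per punkNNN.png in images_dir and return the row count."""
    pattern = re.compile(r"punk(\d+)\.png")
    ids = sorted(
        int(match.group(1))
        for match in (pattern.fullmatch(p.name) for p in Path(images_dir).iterdir())
        if match
    )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id"])  # single header column, as in the example above
        writer.writerows([i] for i in ids)
    return len(ids)


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    for i in range(3):
        (images / f"punk{i:03d}.png").touch()
    (images / "notes.txt").touch()  # non-matching files are ignored
    count = write_attributes_csv(images, Path(tmp) / "attributes.csv")
    demo_csv = (Path(tmp) / "attributes.csv").read_text()

print(demo_csv)
```

For a real dataset, the call would be `write_attributes_csv("./data/images/", "./data/attributes.csv")`.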
262
+ ---
263
+
264
+ ## User Guide
265
+
266
+ ### 1. Training a Model
267
+
268
+ Train the GAN on your dataset:
269
+
270
+ ```bash
271
+ python trainer.py \
272
+ --data_path ./data/attributes.csv \
273
+ --images_path ./data/images/ \
274
+ --model_output_path ./models/ \
275
+ --images_output_path ./gen_images/ \
276
+ --epochs 50 \
277
+ --batch_size 16
278
+ ```
279
+
280
+ **Training Parameters**:
281
+
282
+ | Parameter | Default | Description |
283
+ |-----------|---------|-------------|
284
+ | `--data_path` | `./data/attributes.csv` | Path to dataset metadata |
285
+ | `--images_path` | `./data/images/` | Directory containing training images |
286
+ | `--model_output_path` | `./models/` | Where to save trained model |
287
+ | `--images_output_path` | `./gen_images/` | Save progress images during training |
288
+ | `--epochs` | `50` | Number of training epochs |
289
+ | `--batch_size` | `16` | Training batch size |
290
+ | `--codings_size` | `100` | Latent vector dimension |
291
+ | `--image_size` | `24` | Output image size (24x24) |
292
+ | `--image_channels` | `4` | Image channels (4=RGBA, 3=RGB) |
293
+
294
+ **Training Output**:
295
+ - Progress displayed: `Epoch X/Y - Gen Loss: X.XXXX, Disc Loss: X.XXXX`
296
+ - Progress images saved to `gen_images/epoch_N.png`
297
+ - Final model saved to `models/generator_model.safetensors`
298
+
299
+ ---
300
+
301
+ ### 2. Generating Images
302
+
303
+ #### Option A: Interactive GUI Mode (Default)
304
+
305
+ Launch the Tkinter GUI for real-time generation:
306
+
307
+ ```bash
308
+ python generator.py
309
+ ```
310
+
311
+ or explicitly:
312
+
313
+ ```bash
314
+ python generator.py --gui
315
+ ```
316
+
317
+ **GUI Controls**:
318
+ - **Generate 1 cryptopunk**: Single image
319
+ - **Generate 3x3 cryptopunks**: 3x3 grid (9 images)
320
+ - **Generate 5x5 cryptopunks**: 5x5 grid (25 images)
321
+ - **Terminate**: Close the application
322
+
323
+ ---
324
+
325
+ #### Option B: Command-Line Interface (CLI)
326
+
327
+ Batch generate images from terminal:
328
+
329
+ **Basic generation** (16 images):
330
+ ```bash
331
+ python generator.py --num_images 16 --output_path ./generated/output.png
332
+ ```
333
+
334
+ **Custom grid** (8x8 = 64 images):
335
+ ```bash
336
+ python generator.py --grid_size 8 --output_path ./generated/grid_8x8.png
337
+ ```
338
+
339
+ **Reproducible generation** (with seed):
340
+ ```bash
341
+ python generator.py --grid_size 4 --seed 42 --output_path ./generated/seed42.png
342
+ ```
343
+
344
+ **Save individual images**:
345
+ ```bash
346
+ python generator.py \
347
+ --num_images 100 \
348
+ --save_individual \
349
+ --individual_output_dir ./generated/individual/ \
350
+ --output_path ./generated/batch.png
351
+ ```
352
+
353
+ **CLI Parameters**:
354
+
355
+ | Parameter | Default | Description |
356
+ |-----------|---------|-------------|
357
+ | `--model_path` | `./models/generator_model.safetensors` | Path to trained model |
358
+ | `--output_path` | `./generated/output.png` | Output path for grid image |
359
+ | `--num_images` | `16` | Number of images to generate |
360
+ | `--grid_size` | `None` | Grid size N for NxN layout |
361
+ | `--seed` | `None` | Random seed for reproducibility |
362
+ | `--save_individual` | `False` | Save each image separately |
363
+ | `--individual_output_dir` | `./generated/individual/` | Directory for individual images |
364
+
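The relationship between `--num_images` and `--grid_size` can be sketched as follows. The examples above confirm that a grid size of N yields N x N images; everything else here is an assumption, including that `--grid_size` takes precedence when both flags are given and that a plain `--num_images` run is laid out on the smallest square grid that fits. The function name `resolve_grid` is hypothetical.

```python
import math
from typing import Optional, Tuple


def resolve_grid(num_images: int = 16, grid_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (images_to_generate, grid_side) for the given CLI values."""
    if grid_size is not None:
        if grid_size < 1:
            raise ValueError("grid_size must be at least 1")
        # Assumed: an explicit NxN grid overrides num_images.
        return grid_size * grid_size, grid_size
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # Smallest side whose square holds all images (integer ceil of the square root).
    side = math.isqrt(num_images - 1) + 1
    return num_images, side


print(resolve_grid(grid_size=8))      # 8x8 grid
print(resolve_grid(num_images=1000))  # 1000 images on a square grid
```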
365
+ ---
366
+
367
+ ### 3. GGUF Model Support
368
+
369
+ Use quantized GGUF models for smaller file sizes:
370
+
371
+ ```bash
372
+ cd gguf/
373
+ python generator.py
374
+ ```
375
+
376
+ The GGUF generator will:
377
+ 1. Detect available `.gguf` files in the directory
378
+ 2. Prompt you to select a model
379
+ 3. Convert GGUF → SafeTensors format
380
+ 4. Launch the standard generator
381
+
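Steps 1 and 2 of the list above (detecting `.gguf` files and selecting one) can be sketched with the standard library alone. The function names and the demo filenames are hypothetical, and the GGUF → SafeTensors conversion itself is deliberately left out because its details are not described here.

```python
import tempfile
from pathlib import Path
from typing import List


def find_gguf_models(directory: str = ".") -> List[Path]:
    """Return all .gguf files in the directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".gguf"
    )


def choose_model(models: List[Path], choice: str) -> Path:
    """Map a 1-based menu choice (as typed by the user) to a model path."""
    index = int(choice) - 1
    if not 0 <= index < len(models):
        raise ValueError(f"choice must be between 1 and {len(models)}")
    return models[index]


# Demo with placeholder files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pixel-q4.gguf", "pixel-f16.gguf", "readme.txt"):
        (Path(tmp) / name).touch()
    models = find_gguf_models(tmp)
    for number, model in enumerate(models, start=1):
        print(f"{number}. {model.name}")
    selected = choose_model(models, "2")

print("Selected:", selected.name)
```

In an interactive script, the second argument would come from `input("Select a model: ")` instead of a fixed string.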
382
+ ---
383
+
384
+ ## Configuration
385
+
386
+ ### Model Architecture Configuration
387
+
388
+ Modify these parameters in both `trainer.py` and `generator.py`:
389
+
390
+ ```text
391
+ --codings_size 100 # Latent vector dimension
392
+ --image_size 24 # Output image size
393
+ --image_channels 4 # RGBA (4) or RGB (3)
394
+ ```
395
+
396
+ ### Training Hyperparameters
397
+
398
+ In `trainer.py`:
399
+
400
+ ```python
401
+ # Optimizer
402
+ gen_optimizer = optim.RMSprop(generator.parameters(), lr=0.001)
403
+ disc_optimizer = optim.RMSprop(discriminator.parameters(), lr=0.001)
404
+
405
+ # Loss function
406
+ criterion = nn.BCELoss()
407
+
408
+ # Dropout rate (in Discriminator)
409
+ nn.Dropout(0.4)
410
+ ```
411
+
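One way these settings might fit into the per-batch loop from the training workflow (train the discriminator, then the generator) is sketched below. The two networks are tiny stand-ins so the example runs on its own, the function name `train_step` is hypothetical, and the real `trainer.py` may order or weight the losses differently.

```python
from typing import Tuple

import torch
from torch import nn, optim

codings_size = 100
flat_pixels = 4 * 24 * 24  # one flattened 24x24 RGBA image

# Tiny stand-in networks; the real models are convolutional.
generator = nn.Sequential(nn.Linear(codings_size, flat_pixels), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(flat_pixels, 1), nn.Sigmoid())

gen_optimizer = optim.RMSprop(generator.parameters(), lr=0.001)
disc_optimizer = optim.RMSprop(discriminator.parameters(), lr=0.001)
criterion = nn.BCELoss()


def train_step(real_images: torch.Tensor) -> Tuple[float, float]:
    """Run one discriminator update and one generator update on a batch."""
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1. Train the discriminator on real images and detached fakes.
    fake_images = generator(torch.randn(batch_size, codings_size)).detach()
    disc_optimizer.zero_grad()
    disc_loss = (
        criterion(discriminator(real_images), real_labels)
        + criterion(discriminator(fake_images), fake_labels)
    )
    disc_loss.backward()
    disc_optimizer.step()

    # 2. Train the generator to make the discriminator answer "real".
    gen_optimizer.zero_grad()
    generated = generator(torch.randn(batch_size, codings_size))
    gen_loss = criterion(discriminator(generated), real_labels)
    gen_loss.backward()
    gen_optimizer.step()

    return gen_loss.item(), disc_loss.item()


# Random data in [-1, 1] stands in for a batch of 16 normalized training images.
real_batch = torch.rand(16, flat_pixels) * 2 - 1
gen_loss, disc_loss = train_step(real_batch)
print(f"Gen Loss: {gen_loss:.4f}, Disc Loss: {disc_loss:.4f}")
```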
412
+ ### Data Preprocessing
413
+
414
+ In `trainer.py` → `ImageDataset`:
415
+
416
+ ```python
417
+ transforms.Compose([
418
+ transforms.Resize((image_size, image_size)),
419
+ transforms.ToTensor(),
420
+ transforms.Normalize([0.5] * channels, [0.5] * channels) # [-1, 1]
421
+ ])
422
+ ```
423
+
424
+ ---
425
+
426
+ ## Examples
427
+
428
+ ### Example 1: Train on Custom Dataset
429
+
430
+ ```bash
431
+ # Prepare your data
432
+ # data/images/punk000.png, punk001.png, ..., punk099.png
433
+ # data/attributes.csv with ids 0-99
434
+
435
+ # Train for 100 epochs
436
+ python trainer.py \
437
+ --data_path ./data/attributes.csv \
438
+ --images_path ./data/images/ \
439
+ --epochs 100 \
440
+ --batch_size 32 \
441
+ --model_output_path ./models/
442
+ ```
443
+
444
+ ### Example 2: Generate with Specific Seed
445
+
446
+ ```bash
447
+ # Generate same images every time
448
+ python generator.py \
449
+ --model_path ./models/generator_model.safetensors \
450
+ --grid_size 5 \
451
+ --seed 12345 \
452
+ --output_path ./results/reproducible.png
453
+ ```
454
+
455
+ ### Example 3: Batch Generation
456
+
457
+ ```bash
458
+ # Generate 1000 individual images
459
+ python generator.py \
460
+ --num_images 1000 \
461
+ --save_individual \
462
+ --individual_output_dir ./dataset_synthetic/ \
463
+ --output_path ./dataset_synthetic/overview.png
464
+ ```
465
+
466
+ ### Example 4: Monitor Training Progress
467
+
468
+ ```bash
469
+ # Training with progress visualization
470
+ python trainer.py \
471
+ --epochs 200 \
472
+ --images_output_path ./training_progress/
473
+
474
+ # View progress images
475
+ ls ./training_progress/
476
+ # epoch_0.png, epoch_1.png, ..., epoch_199.png
477
+ ```
478
+
479
+ ---
480
+
481
+ ## Technical Details
482
+
483
+ ### Model File Format
484
+
485
+ Models are saved in **SafeTensors** format (`.safetensors`) with embedded metadata:
486
+
487
+ ```python
488
+ metadata = {
489
+ 'codings_size': '100',
490
+ 'image_size': '24',
491
+ 'image_channels': '4'
492
+ }
493
+ ```
494
+
495
+ This ensures the generator automatically loads the correct architecture.
496
+
497
+ ### Image Value Ranges
498
+
499
+ - **Training**: Images normalized to [-1, 1]
500
+ - **Generation output**: Images scaled to [0, 1]
501
+ - **Saved files**: Images saved as uint8 [0, 255]
502
+
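The three ranges above are related by simple arithmetic, sketched here on single pixel values in plain Python. The function names are illustrative, and the clamping and rounding in `to_uint8` are assumptions about how the final conversion is done.

```python
def normalize(pixel: float) -> float:
    """[0, 1] -> [-1, 1], the same arithmetic as Normalize(mean=0.5, std=0.5)."""
    return (pixel - 0.5) / 0.5


def denormalize(value: float) -> float:
    """[-1, 1] (Tanh output) -> [0, 1]."""
    return (value + 1.0) / 2.0


def to_uint8(unit_value: float) -> int:
    """[0, 1] -> integer in [0, 255], clamping out-of-range inputs."""
    clamped = min(max(unit_value, 0.0), 1.0)
    return int(clamped * 255 + 0.5)  # round to the nearest integer


for raw in (-1.0, 0.0, 1.0):
    print(raw, "->", denormalize(raw), "->", to_uint8(denormalize(raw)))
```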
503
+ ### GPU Support
504
+
505
+ The code automatically detects and uses CUDA if available:
506
+
507
+ ```python
508
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
509
+ ```
510
+
511
+ ---
512
+
513
+ ## Troubleshooting
514
+
515
+ **Q: Training loss not decreasing?**
516
+ - Try adjusting the RMSprop learning rates (`lr=0.001` in `trainer.py`)
517
+ - Increase batch size or epochs
518
+ - Check if dataset has sufficient variety
519
+
520
+ **Q: Generated images look like noise?**
521
+ - Model needs more training epochs
522
+ - Dataset may be too small (need 50+ images minimum)
523
+ - Try adjusting the discriminator dropout rate (`nn.Dropout(0.4)` in `trainer.py`)
524
+
525
+ **Q: GUI not launching?**
526
+ - Check Tkinter installation: `python -m tkinter`
527
+ - On Linux: `sudo apt-get install python3-tk`
528
+
529
+ **Q: CUDA out of memory?**
530
+ - Reduce batch size: `--batch_size 8`
531
+ - Reduce image size: `--image_size 16`
532
+
533
+ ---
534
+
535
+ ## License
536
+
537
+ This project is released under the MIT license and is provided as-is for educational and creative purposes.
538
+
539
+ ---
540
+
541
+ ## Acknowledgments
542
+
543
+ - Built with PyTorch
544
+ - Inspired by DCGAN architecture
545
+ - Uses SafeTensors for model serialization