SanskarModi committed on
Commit d321919 · 1 Parent(s): 9bc957e

updated readme

Files changed (4)
  1. .gradio/certificate.pem +0 -31
  2. README.md +165 -146
  3. pyproject.toml +4 -8
  4. setup.cfg +7 -0
.gradio/certificate.pem DELETED
@@ -1,31 +0,0 @@
1
- -----BEGIN CERTIFICATE-----
2
- MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
3
- TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
4
- cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
5
- WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
6
- ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
7
- MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
8
- h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
9
- 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
10
- A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
11
- T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
12
- B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
13
- B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
14
- KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
15
- OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
16
- jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
17
- qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
18
- rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
19
- HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
20
- hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
21
- ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
22
- 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
23
- NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
24
- ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
25
- TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
26
- jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
27
- oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
28
- 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
29
- mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
30
- emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
31
- -----END CERTIFICATE-----
README.md CHANGED
@@ -1,174 +1,153 @@
1
- ---
2
- title: stable-diffusion-image-generator
3
- app_file: src/sdgen/main.py
4
- sdk: gradio
5
- sdk_version: 3.50.2
6
- ---
7
- # 🎨 Stable Diffusion Image Generator
8
 
9
- AI system built using **Stable Diffusion (HuggingFace Diffusers)** and a modern **Gradio UI**.
10
- This project generates high-quality images from text prompts and includes advanced capabilities such as:
11
 
12
- * Style presets
13
- * Image-to-Image generation
14
- * Super-resolution upscaling (RealESRGAN)
15
- * Prompt history & metadata tracking
16
- * Seed reproducibility
17
- * LoRA extension support
18
 
19
  ---
20
 
21
- # Feature Details
22
 
23
- ## 1️⃣ **Text-to-Image Generation**
24
 
25
- * Supports prompts & negative prompts
26
- * Adjustable steps, CFG scale, resolution
27
- * Seed for reproducibility
28
- * Preset selection panel
29
 
30
- ## 2️⃣ **Image-to-Image (Img2Img)**
31
 
32
- Transform uploaded images using prompts, e.g.:
 
 
 
33
 
34
- * “Make this photo look cyberpunk”
35
- * “Convert this portrait into anime style”
36
- * “Turn into oil painting style”
37
 
38
- ## 3️⃣ **Super-Resolution Upscaling**
 
 
 
39
 
40
- Improve output quality significantly:
41
 
42
- * 1.5×
- * 2×
- * 4×
45
- Powered by **RealESRGAN**.
46
 
47
- ## 4️⃣ **Style Presets**
48
 
49
- One-click artistic styles:
 
 
 
50
 
51
- * Anime
52
- * Realistic photography
53
- * Pixar / 3D
54
- * Oil painting
55
- * Cyberpunk neon
56
 
57
- ## 5️⃣ **Prompt History & Metadata Tracking**
58
 
59
- Every generation stores:
60
 
61
- * Prompt
62
- * Negative prompt
63
- * Configuration
64
- * Seed
65
- * Generated image
66
 
67
- ## 6️⃣ **LoRA Support**
68
 
69
- Load and use custom LoRA fine-tuned models:
70
 
71
- * Styles
72
- * Artists
73
- * Characters
74
- * Themes
75
 
76
- ---
 
 
 
 
77
 
78
- # 🧩 Project Architecture
79
 
80
- ```
81
- stable-diffusion-image-generator/
82
-
83
- ├── app/
84
- │ ├── core/
85
- │ │ └── __init__.py
86
- │ │
87
- │ ├── pipeline.py
88
- │ │ # Loads & initializes Stable Diffusion (FP16, GPU, model configs)
89
- │ │
90
- │ ├── generator.py
91
- │ │ # Text-to-image inference logic
92
- │ │
93
- │ ├── img2img.py
94
- │ │ # Image-to-image transformation logic
95
- │ │
96
- │ ├── ui.py
97
- │ │ # Complete Gradio interface with multiple tabs:
98
- │ │ # Text2Img, Img2Img, Upscaling, History, About
99
- │ │
100
- │ ├── presets/
101
- │ │ ├── styles.py
102
- │ │ # Predefined artistic style presets (anime, cyberpunk, etc.)
103
- │ │
104
- │ ├── upscaler/
105
- │ │ ├── realesrgan.py
106
- │ │ # Super-resolution (1.5x, 2x, 4x)
107
- │ │
108
- │ ├── utils/
109
- │ │ ├── history.py # Prompt history & metadata saving
110
- │ │ ├── seed.py # Seed utilities for reproducibility
111
- │ │ ├── logger.py # Central logging
112
- │ │
113
- │ ├── models/
114
- │ │ ├── metadata.py # Data model for storing history entries
115
-
116
- ├── assets/
117
- │ ├── samples/ # Example generated images
118
- │ ├── lora/ # Custom LoRA models (optional)
119
-
120
- ├── main.py # Entry point (launches Gradio app)
121
- ├── requirements.txt # All dependencies (pinned)
122
- ├── LICENSE
123
- └── README.md
124
- ```
125
 
126
  ---
127
 
128
- # ⚙️ Installation & Setup
129
 
130
- ### Step 1 Clone the Repo
131
 
132
- ```
133
  git clone https://github.com/sanskarmodi8/stable-diffusion-image-generator
134
  cd stable-diffusion-image-generator
135
  ```
136
 
137
- ### Step 2 — Create virtual environment
138
 
139
- ```
140
- python -m venv venv
141
- source venv/bin/activate # Linux/Mac
142
- venv\Scripts\activate # Windows
143
  ```
144
 
145
- ### Step 3 Install PyTorch (GPU)
146
 
147
- ```
 
 
148
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
149
  ```
150
 
151
- ### Step 4 — Install remaining dependencies
152
 
153
- ```
154
  pip install -r requirements.txt
155
  ```
156
 
157
- ### Optional Login to HuggingFace
158
 
159
- ```
160
  huggingface-cli login
161
  ```
162
 
163
  ---
164
 
165
- # ▶️ Running the App
166
 
167
- ```
168
- python main.py
169
  ```
170
 
171
- App will run at:
172
 
173
  ```
174
  http://127.0.0.1:7860
@@ -176,57 +155,97 @@ http://127.0.0.1:7860
176
 
177
  ---
178
 
179
- # 🤝 Contributing
180
 
181
- This project follows **strict formatting and linting standards** to ensure clean, readable, and professional-quality code.
182
 
 
183
 
184
- #### 1. Install pre-commit hooks
185
 
186
- This ensures formatting and linting run **automatically** before every commit.
 
 
 
187
 
188
- ```
189
  pre-commit install
190
  ```
191
 
192
- #### 2. Format code manually (optional)
193
 
194
- ```
195
- black .
196
- isort .
197
  ruff check .
 
198
  ```
199
 
200
- #### 3. Create feature branches
201
-
202
- Follow standard naming:
203
 
204
  ```
205
- feature/<feature-name>
206
- fix/<bug-name>
207
  refactor/<module>
208
  ```
209
 
210
- #### 4. Commit messages
211
 
212
- Use clear, conventional messages:
213
 
214
- ```
215
- feat: add anime preset
216
- fix: resolve img2img prompt issue
217
- refactor: improve pipeline loading speed
218
- docs: update readme
219
- ```
220
 
221
  ---
222
 
223
- # 📄 License
224
-
225
- Released under the [**MIT License**](LICENSE).
226
 
227
- ---
228
 
229
- # Author
 
230
 
231
- **[Sanskar Modi](https://github.com/sanskarmodi8)**
232
- AI Developer & Machine Learning Engineer
 
1
+ # Stable Diffusion Image Generator
 
 
 
 
 
 
2
 
3
+ A modular image generation system built on **HuggingFace Diffusers**, with support for multiple Stable Diffusion pipelines, configurable inference parameters, a clean **Gradio UI**, and a lightweight local **history/metadata store**.
 
4
 
5
+ The system supports **text-to-image**, **image-to-image**, and **super-resolution upscaling** using **Real-ESRGAN (NCNN)**.
6
+ Designed with a focus on **extensibility**, **clean code**, and **practical deployment constraints** (CPU or low-memory environments).
 
 
 
 
7
 
8
  ---
9
 
10
+ # Core Features
11
 
12
+ ## Text-to-Image Generation
13
 
14
+ * Stable Diffusion pipelines (SD 1.5, Turbo)
15
+ * Adjustable **CFG scale**, **inference steps**, resolution, and seed
16
+ * Structured metadata (JSON) for reproducibility
17
+ * Style presets with recommended parameters
18
 
19
+ ## Image-to-Image (Img2Img)
20
 
21
+ * Pipeline reuse to avoid model reload cost
22
+ * Alpha-preserving prompt transforms
23
+ * Configurable denoising strength
24
+ * Deterministic or stochastic sampling
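The choice between deterministic and stochastic sampling usually comes down to how the seed is resolved. A minimal sketch, assuming a hypothetical helper `resolve_seed` (not taken from this repository): a user-supplied seed is returned unchanged, and a fresh one is drawn otherwise so it can still be recorded in the metadata. The resolved value would then typically seed the sampler, for example through `torch.Generator().manual_seed(seed)`.

```python
import random
from typing import Optional

# Upper bound chosen for illustration; it keeps seeds inside 32 bits.
MAX_SEED = 2**32 - 1


def resolve_seed(seed: Optional[int] = None) -> int:
    """Return the given seed, or draw a fresh one when none is supplied."""
    if seed is None:
        # Stochastic mode: pick a random seed, but return it so the
        # caller can store it and reproduce the image later.
        return random.SystemRandom().randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in [0, {MAX_SEED}], got {seed}")
    # Deterministic mode: reuse the caller's seed as-is.
    return seed
```

Returning the drawn seed, rather than sampling unseeded, is what lets a "random" generation be replayed from its history entry.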
25
 
26
+ ## Upscaling (Real-ESRGAN NCNN)
 
 
27
 
28
+ * Lightweight **NCNN backend** (GPU not required)
29
+ * Supports 2× and 4× scaling
30
+ * Optional SD-upscaler backend planned
31
+ * Minimal dependencies, fast on CPU
32
 
33
+ ## Prompt History & Metadata Tracking
34
 
35
+ * Local metadata index with atomic writes
36
+ * Thumbnail + full-size image storage
37
+ * JSON schema for portability
38
+ * History browser UI
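One common way to get atomic writes for a local metadata index is to write to a temporary file and rename it over the target. The sketch below illustrates that pattern with the standard library only; `write_json_atomic` and the example payload are hypothetical names, not the repository's actual API.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers never see a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A call such as `write_json_atomic(Path("history/entry.json"), {"prompt": "a cat", "seed": 42})` either leaves the previous file untouched or replaces it with the complete new one.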
39
 
40
+ ## Multi-Model Runtime Switching
41
 
42
+ * Multiple pipelines loaded once
43
+ * Selection at inference without reload
44
+ * Shared tokenizer/encoder where possible
45
+ * Warm-up logic for fast Turbo inference
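The load-once, select-at-inference idea can be sketched as a small registry keyed by the `model_choice` value coming from the UI. `PipelineRegistry` and its loader callables are illustrative assumptions; in the real project each loader would build a Diffusers pipeline.

```python
from typing import Any, Callable, Dict


class PipelineRegistry:
    """Build each pipeline once and return the cached instance afterwards."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders
        self._pipelines: Dict[str, Any] = {}

    def load_all(self) -> None:
        """Eagerly build every registered pipeline, e.g. at app start-up."""
        for name in self._loaders:
            self.get(name)

    def get(self, model_choice: str) -> Any:
        """Return the pipeline for model_choice, building it on first use."""
        if model_choice not in self._loaders:
            raise KeyError(f"Unknown model: {model_choice!r}")
        if model_choice not in self._pipelines:
            self._pipelines[model_choice] = self._loaders[model_choice]()
        return self._pipelines[model_choice]
```

A handler would call `registry.get(model_choice)` on every request; only the first call per model pays the load cost.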
46
 
47
+ ---
 
 
 
 
48
 
49
+ # Architecture Overview
50
 
51
+ ```
+ src/sdgen/
+
+ ├── sd/
+ │   ├── pipeline.py      # pipeline loader, warmup, dtype/device logic
+ │   ├── generator.py     # text-to-image
+ │   ├── img2img.py       # image-to-image
+ │   └── models.py        # config/metadata dataclasses
+
+ ├── ui/
+ │   ├── layout.py        # top-level UI composition
+ │   └── tabs/            # individual UI components
+
+ ├── presets/
+ │   └── styles.py        # curated style presets
+
+ ├── upscaler/
+ │   └── realesrgan.py    # NCNN Real-ESRGAN backend
+
+ ├── utils/
+ │   ├── history.py       # persistence layer
+ │   ├── common.py        # PIL/NumPy helpers
+ │   └── logger.py        # structured logging
+
+ └── config/
+     ├── settings.py      # runtime config/env
+     └── paths.py         # project paths
+ ```
79
 
80
+ ---
 
 
 
 
81
 
82
+ # Technical Highlights
83
 
84
+ ### Efficient CPU Deployment
85
 
86
+ The free HF Spaces tier has **no GPU** and 16 GB of RAM.
+ Generation speed is optimized via:
 
 
88
 
89
+ * latent consistency (Turbo)
90
+ * reduced step ranges
91
+ * VAE tiling for memory distribution
92
+ * attention slicing
93
+ * deferring safety checker if private
94
 
95
+ This reduces **CPU inference time from ~220 s to under 70 s** for 512px images, without unacceptable quality loss.
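Two of the memory switches listed above map to methods that Diffusers Stable Diffusion pipelines expose: `enable_attention_slicing()` and `enable_vae_tiling()`. The sketch below is an assumption about how they might be applied, not the repository's code. The helper `apply_cpu_optimizations` is hypothetical and checks for each method first, since availability can vary between pipeline classes and library versions.

```python
from typing import Any, List


def apply_cpu_optimizations(pipe: Any) -> List[str]:
    """Enable the memory-saving switches the pipeline exposes.

    Returns the names of the methods that were actually called.
    """
    applied: List[str] = []
    for method_name in ("enable_attention_slicing", "enable_vae_tiling"):
        method = getattr(pipe, method_name, None)
        # Skip switches this pipeline object does not provide.
        if callable(method):
            method()
            applied.append(method_name)
    return applied
```

Calling it once right after a pipeline is loaded keeps the optimization choices in a single place and makes them easy to log.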
96
 
97
+ ### Multi-Pipeline Switching
98
+
99
+ Both SD pipelines are instantiated once.
100
+ The UI passes `model_choice` to the handler, which selects the correct pipeline **without rebuilding**.
101
+
102
+ This avoids a 4-7 GB model reload on every click.
103
 
104
  ---
105
 
106
+ # Local Installation
107
 
108
+ ### 1. Clone
109
 
110
+ ```bash
111
  git clone https://github.com/sanskarmodi8/stable-diffusion-image-generator
112
  cd stable-diffusion-image-generator
113
  ```
114
 
115
+ ### 2. Environment
116
 
117
+ ```bash
118
+ python -m venv .venv
119
+ source .venv/bin/activate
 
120
  ```
121
 
122
+ ### 3. Install Dependencies
123
 
124
+ Install PyTorch with CUDA support (skip this step on CPU-only machines):
125
+
126
+ ```bash
127
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
128
  ```
129
 
130
+ Install core libs:
131
 
132
+ ```bash
133
  pip install -r requirements.txt
134
  ```
135
 
136
+ ### 4. HuggingFace Login (optional)
137
 
138
+ ```bash
139
  huggingface-cli login
140
  ```
141
 
142
  ---
143
 
144
+ # Running
145
 
146
+ ```bash
147
+ python src/sdgen/main.py
148
  ```
149
 
150
+ UI available at:
151
 
152
  ```
153
  http://127.0.0.1:7860
 
155
 
156
  ---
157
 
158
+ # Roadmap (LoRA, QLoRA, and Training)
159
 
160
+ **Update planned**: full LoRA loading and fine-tuning support.
161
 
162
+ Scope includes:
163
 
164
+ ### 1. LoRA Runtime Inference
165
 
166
+ * Load LoRA weights into existing UNet
167
+ * Adjustable LoRA alpha/scaling
168
+ * UI selector for LoRA checkpoints
169
+ * Enable mixing multiple LoRAs
170
 
171
+ Implementation plan:
172
+
173
+ * Attach `lora_attn_procs` to model
174
+ * Discover `.safetensors` in `/assets/lora`
175
+ * Store LoRA metadata in history
176
+ * Persist alpha value and presets
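The checkpoint discovery step in this plan could look roughly like the sketch below. It is a sketch only: `discover_lora_checkpoints` is a hypothetical name, and using the file stem as the label in the UI selector is an assumption.

```python
from pathlib import Path
from typing import Dict


def discover_lora_checkpoints(lora_dir: Path) -> Dict[str, Path]:
    """Map a display name (the file stem) to each .safetensors file found."""
    if not lora_dir.is_dir():
        # A missing directory simply means no LoRAs are installed.
        return {}
    return {path.stem: path for path in sorted(lora_dir.glob("*.safetensors"))}
```

The returned mapping can feed a dropdown directly, and the chosen key can be stored in the history entry alongside the alpha value.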
177
+
178
+ ### 2. QLoRA Fine-Tuning
179
+
180
+ * Train lightweight LoRA modules on consumer GPUs (11 GB of VRAM is sufficient)
181
+ * Use parameter-efficient training
182
+ * Merge adapters for export
183
+ * Allow user fine-tuning via command line
184
+
185
+ Stack:
186
+
187
+ * accelerate
188
+ * peft
189
+ * bitsandbytes (if GPU available)
190
+
191
+ UI tab planned:
192
+
193
+ * dataset upload
194
+ * config builder
195
+ * start training
196
+ * track loss, sample outputs
197
+
198
+ **Why LoRA?**
199
+
200
+ * Enables personal styles without training the full model
201
+ * Reduces VRAM and compute cost by 50–200×
202
+ * Industry-standard for SD customization
203
+
204
+ ---
205
+
206
+ # Contributing
207
+
208
+ This repo is configured with **pre-commit**:
209
+
210
+ * black
211
+ * ruff
212
+ * isort
213
+ * docstring linting (Google style)
214
+
215
+ Install hooks:
216
+
217
+ ```bash
218
  pre-commit install
219
  ```
220
 
221
+ Lint and format:
222
 
223
+ ```bash
 
 
224
  ruff check .
225
+ black .
226
  ```
227
 
228
+ Branching convention:
 
 
229
 
230
  ```
231
+ feat/<feature>
232
+ fix/<issue>
233
  refactor/<module>
234
  ```
235
 
236
+ ---
237
 
238
+ # License
239
 
240
+ This project is licensed under the [MIT License](LICENSE).
 
 
 
 
 
241
 
242
  ---
243
 
244
+ # Author
 
 
245
 
246
+ **Sanskar Modi**
247
 
248
+ Machine Learning Engineer
249
+ Focused on production-grade ML systems.
250
 
251
+ GitHub: [https://github.com/sanskarmodi8](https://github.com/sanskarmodi8)
 
pyproject.toml CHANGED
@@ -1,12 +1,8 @@
- [project]
- name = "sdgen"
- version = "0.0.0"
- requires-python = ">=3.10"
- dependencies = []
-
  [build-system]
  requires = ["setuptools", "wheel"]
  build-backend = "setuptools.build_meta"
 
- [project.scripts]
- sdgen = "sdgen.main:main"
+ [project]
+ name = "sdgen"
+ version = "0.1.0"
+ dependencies = []
setup.cfg ADDED
@@ -0,0 +1,7 @@
+ [options]
+ packages = find:
+ package_dir =
+     =src
+
+ [options.packages.find]
+ where = src