Space: SanskarModi/sd-image-gen-toolkit

SanskarModi committed on Dec 5, 2025

Commit d321919 (1 parent: 9bc957e)

updated readme

Files changed (4)

.gradio/certificate.pem +0 -31
README.md +165 -146
pyproject.toml +4 -8
setup.cfg +7 -0

.gradio/certificate.pem DELETED

@@ -1,31 +0,0 @@
------BEGIN CERTIFICATE-----
-MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
-TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
-cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
-WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
-ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
-MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
-h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
-0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
-A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
-T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
-B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
-B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
-KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
-OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
-jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
-qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
-rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
-HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
-hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
-ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
-3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
-NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
-ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
-TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
-jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
-oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
-4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
-mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
-emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
------END CERTIFICATE-----

README.md CHANGED

@@ -1,174 +1,153 @@
----
-title: stable-diffusion-image-generator
-app_file: src/sdgen/main.py
-sdk: gradio
-sdk_version: 3.50.2
----
-# 🎨 Stable Diffusion Image Generator
-AI system built using **Stable Diffusion (HuggingFace Diffusers)** and a modern **Gradio UI**.
-This project generates high-quality images from text prompts and includes advanced capabilities such as:
-* Style presets
-* Image-to-Image generation
-* Super-resolution upscaling (RealESRGAN)
-* Prompt history & metadata tracking
-* Seed reproducibility
-* LoRA extension support
 ---
-# Feature Details
-## 1️⃣ **Text-to-Image Generation**
-* Supports prompts & negative prompts
-* Adjustable steps, CFG scale, resolution
-* Seed for reproducibility
-* Preset selection panel
-## 2️⃣ **Image-to-Image (Img2Img)**
-Transform uploaded images using prompts, e.g.:
-* “Make this photo look cyberpunk”
-* “Convert this portrait into anime style”
-* “Turn into oil painting style”
-## 3️⃣ **Super-Resolution Upscaling**
-Improve output quality significantly:
-* 1.5×
-* 2×
-* 4×
-  Powered by **RealESRGAN**.
-## 4️⃣ **Style Presets**
-One-click artistic styles:
-* Anime
-* Realistic photography
-* Pixar / 3D
-* Oil painting
-* Cyberpunk neon
-## 5️⃣ **Prompt History & Metadata Tracking**
-Every generation stores:
-* Prompt
-* Negative prompt
-* Configuration
-* Seed
-* Generated image
-## 6️⃣ **LoRA Support**
-Load and use custom LoRA fine-tuned models:
-* Styles
-* Artists
-* Characters
-* Themes
----
-# 🧩 Project Architecture
-```
-stable-diffusion-image-generator/
-│
-├── app/
-│   ├── core/
-│   │   └── __init__.py
-│   │
-│   ├── pipeline.py
-│   │   # Loads & initializes Stable Diffusion (FP16, GPU, model configs)
-│   │
-│   ├── generator.py
-│   │   # Text-to-image inference logic
-│   │
-│   ├── img2img.py
-│   │   # Image-to-image transformation logic
-│   │
-│   ├── ui.py
-│   │   # Complete Gradio interface with multiple tabs:
-│   │   # Text2Img, Img2Img, Upscaling, History, About
-│   │
-│   ├── presets/
-│   │   ├── styles.py
-│   │       # Predefined artistic style presets (anime, cyberpunk, etc.)
-│   │
-│   ├── upscaler/
-│   │   ├── realesrgan.py
-│   │       # Super-resolution (1.5x, 2x, 4x)
-│   │
-│   ├── utils/
-│   │   ├── history.py     # Prompt history & metadata saving
-│   │   ├── seed.py        # Seed utilities for reproducibility
-│   │   ├── logger.py      # Central logging
-│   │
-│   ├── models/
-│   │   ├── metadata.py    # Data model for storing history entries
-│
-├── assets/
-│   ├── samples/           # Example generated images
-│   ├── lora/              # Custom LoRA models (optional)
-│
-├── main.py                # Entry point (launches Gradio app)
-├── requirements.txt       # All dependencies (pinned)
-├── LICENSE
-└── README.md
-```
 ---
-# ⚙️ Installation & Setup
-### Step 1 — Clone the Repo
-```
 git clone https://github.com/sanskarmodi8/stable-diffusion-image-generator
 cd stable-diffusion-image-generator
 ```
-### Step 2 — Create virtual environment
-```
-python -m venv venv
-source venv/bin/activate        # Linux/Mac
-venv\Scripts\activate           # Windows
 ```
-### Step 3 — Install PyTorch (GPU)
-```
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
 ```
-### Step 4 — Install remaining dependencies
-```
 pip install -r requirements.txt
 ```
-### Optional — Login to HuggingFace
-```
 huggingface-cli login
 ```
 ---
-# ▶️ Running the App
-```
-python main.py
 ```
-App will run at:
 ```
 http://127.0.0.1:7860
@@ -176,57 +155,97 @@ http://127.0.0.1:7860
 ---
-# 🤝 Contributing
-This project follows **strict formatting and linting standards** to ensure clean, readable, and professional-quality code.
-#### 1. Install pre-commit hooks
-This ensures formatting and linting run **automatically** before every commit.
-```
 pre-commit install
 ```
-#### 2. Format code manually (optional)
-```
-black .
-isort .
 ruff check .
 ```
-#### 3. Create feature branches
-Follow standard naming:
 ```
-feature/<feature-name>
-fix/<bug-name>
 refactor/<module>
 ```
-#### 4. Commit messages
-Use clear, conventional messages:
-```
-feat: add anime preset
-fix: resolve img2img prompt issue
-refactor: improve pipeline loading speed
-docs: update readme
-```
 ---
-# 📄 License
-Released under the [**MIT License**](LICENSE).
----
-# ⭐ Author
-**[Sanskar Modi](https://github.com/sanskarmodi8)**
-AI Developer & Machine Learning Engineer

+# Stable Diffusion Image Generator
+A modular image generation system built on **HuggingFace Diffusers**, with support for multiple Stable Diffusion pipelines, configurable inference parameters, a clean **Gradio UI**, and a lightweight local **history/metadata store**.
+The system supports **text-to-image**, **image-to-image**, and **super-resolution upscaling** using **Real-ESRGAN (NCNN)**.
+Designed with a focus on **extensibility**, **clean code**, and **practical deployment constraints** (CPU or low-memory environments).
 ---
+# Core Features
+## Text-to-Image Generation
+* Stable Diffusion pipelines (SD 1.5, Turbo)
+* Adjustable **CFG scale**, **inference steps**, resolution, and seed
+* Structured metadata (JSON) for reproducibility
+* Style presets with recommended parameters
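A minimal sketch of the kind of JSON record the text-to-image bullets describe. The class name `GenerationMetadata` and its field names are hypothetical, not taken from `sd/models.py`; the point is that storing the seed actually used, next to every other inference parameter, is what makes a run reproducible.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationMetadata:
    """Hypothetical record of one text-to-image run (field names are illustrative)."""

    prompt: str
    negative_prompt: str
    model: str        # e.g. "sd15" or "turbo"; these labels are assumptions
    steps: int
    cfg_scale: float
    width: int
    height: int
    seed: int         # the seed actually used, even if it was drawn at random

    def to_json(self) -> str:
        # sort_keys gives a stable, diff-friendly serialization
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "GenerationMetadata":
        return cls(**json.loads(text))
```

To replay a stored record with diffusers, one would pass `generator=torch.Generator("cpu").manual_seed(meta.seed)` together with the other stored parameters; bit-identical output additionally assumes the same model weights, scheduler, hardware, and library versions.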
+## Image-to-Image (Img2Img)
+* Pipeline reuse to avoid model reload cost
+* Alpha-preserving prompt transforms
+* Configurable denoising strength
+* Deterministic or stochastic sampling
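Denoising strength in img2img decides how much of the noise schedule actually runs. Assuming the project uses diffusers' `StableDiffusionImg2ImgPipeline` (or `AutoPipelineForImage2Image`, which resolves to it for SD 1.5 and SD-Turbo), the executed step count follows the arithmetic below, which mirrors that pipeline's `get_timesteps` method. The helper name is hypothetical.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps an img2img run actually executes.

    The schedule is truncated so only the last `strength` fraction of it runs,
    using the same integer arithmetic as diffusers' img2img get_timesteps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start
```

For example, 50 steps at strength 0.75 run 37 denoising steps. This matters for few-step models: the diffusers Turbo documentation advises keeping `num_inference_steps * strength >= 1`, and by this arithmetic a smaller product leaves zero denoising steps.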
+## Upscaling (Real-ESRGAN NCNN)
+* Lightweight **NCNN backend** (GPU not required)
+* Supports 2× and 4× scaling
+* Optional SD-upscaler backend planned
+* Minimal dependencies, fast on CPU
+## Prompt History & Metadata Tracking
+* Local metadata index with atomic writes
+* Thumbnail + full-size image storage
+* JSON schema for portability
+* History browser UI
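"Atomic writes" for a local index usually means write to a temporary file, then rename it over the target. A minimal standard-library sketch under that assumption; `atomic_write_json` and `append_entry` are hypothetical names, not the API of `utils/history.py`.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # swaps the file in one step (atomic on POSIX)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def append_entry(index_path: Path, entry: dict) -> None:
    """Append one history entry to a JSON list index."""
    if index_path.exists():
        entries = json.loads(index_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(entry)
    atomic_write_json(index_path, entries)
```

The rename protects readers from torn files, but `append_entry` is still a read-modify-write: two processes appending at once can lose an entry, so this sketch assumes a single writer (or an external file lock).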
+## Multi-Model Runtime Switching
+* Multiple pipelines loaded once
+* Selection at inference without reload
+* Shared tokenizer/encoder where possible
+* Warm-up logic for fast Turbo inference
+---
+# Architecture Overview
+```
+src/sdgen/
+│
+├── sd/
+│   ├── pipeline.py          # pipeline loader, warmup, dtype/device logic
+│   ├── generator.py         # text-to-image
+│   ├── img2img.py           # image-to-image
+│   └── models.py            # config/metadata dataclasses
+│
+├── ui/
+│   ├── layout.py            # top-level UI composition
+│   └── tabs/                # individual UI components
+│
+├── presets/
+│   └── styles.py            # curated style presets
+│
+├── upscaler/
+│   └── realesrgan.py        # NCNN Real-ESRGAN backend
+│
+├── utils/
+│   ├── history.py           # persistence layer
+│   ├── common.py            # PIL/NumPy helpers
+│   └── logger.py            # structured logging
+│
+└── config/
+    ├── settings.py          # runtime config/env
+    └── paths.py             # project paths
+```
+---
+# Technical Highlights
+### Efficient CPU Deployment
+The free Hugging Face Spaces CPU tier provides **no GPU** and 16 GB of RAM.
+Generation speed is optimized via:
+* few-step distilled sampling (SD-Turbo)
+* reduced step ranges
+* VAE tiling to lower peak decoding memory
+* attention slicing
+* skipping the safety checker in private deployments
+Together, these cut **CPU inference from ~220 s to under 70 s** for a 512×512 image, with acceptable quality loss.
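A sketch of how the memory settings above map onto diffusers calls, assuming a loaded `StableDiffusionPipeline` (which is typically what `AutoPipelineForText2Image` returns for SD 1.5 and SD-Turbo). `enable_attention_slicing()`, `vae.enable_tiling()`, and the `safety_checker` attribute are diffusers APIs; the wrapper function and its flag are hypothetical and may not match `sd/pipeline.py`.

```python
from typing import Any


def apply_cpu_optimizations(pipe: Any, disable_safety_checker: bool = False) -> Any:
    """Apply memory-saving tweaks to a loaded diffusers SD pipeline (sketch)."""
    # Compute attention in slices: lower peak memory at a small speed cost.
    pipe.enable_attention_slicing()
    # Decode latents tile by tile so large images fit in RAM.
    pipe.vae.enable_tiling()
    if disable_safety_checker and getattr(pipe, "safety_checker", None) is not None:
        # SD pipelines skip the NSFW check when this attribute is None.
        pipe.safety_checker = None
    return pipe
```

For SD-Turbo, the model card runs inference with `num_inference_steps=1` and `guidance_scale=0.0`. With guidance at zero, classifier-free guidance is off, so each step needs one UNet pass instead of two, which is where much of the CPU saving from a reduced step range comes from.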
+### Multi-Pipeline Switching
+Both SD pipelines are instantiated once.
+The UI passes `model_choice` to the handler, which selects the correct pipeline **without rebuilding**.
+This avoids reloading 4-7 GB of weights on every click.
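The load-once, select-per-request pattern described above can be sketched as a small registry. The class is hypothetical and generic over the pipeline type: in the app, each loader would be a zero-argument function wrapping something like `AutoPipelineForText2Image.from_pretrained(...)`, and the handler would call `registry.get(model_choice)`.

```python
from typing import Callable, Dict, Generic, TypeVar

P = TypeVar("P")


class PipelineRegistry(Generic[P]):
    """Build each pipeline at most once, then return the cached instance by name."""

    def __init__(self, loaders: Dict[str, Callable[[], P]]) -> None:
        self._loaders = loaders
        self._cache: Dict[str, P] = {}

    def warm_up(self) -> None:
        # Eagerly build every pipeline at startup instead of on the first click.
        for name in self._loaders:
            self.get(name)

    def get(self, model_choice: str) -> P:
        if model_choice not in self._loaders:
            raise KeyError(f"unknown model: {model_choice!r}")
        if model_choice not in self._cache:
            self._cache[model_choice] = self._loaders[model_choice]()
        return self._cache[model_choice]
```

The trade-off is memory: every cached pipeline stays resident, so this pattern fits only while the combined weights stay well under the 16 GB RAM budget.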
 ---
+# Local Installation
+### 1. Clone
+```bash
 git clone https://github.com/sanskarmodi8/stable-diffusion-image-generator
 cd stable-diffusion-image-generator
 ```
+### 2. Environment
+```bash
+python -m venv .venv
+source .venv/bin/activate
 ```
+### 3. Install Dependencies
+Install PyTorch with CUDA 12.1 support (skip this step on CPU-only machines):
+```bash
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
 ```
+Install the remaining dependencies:
+```bash
 pip install -r requirements.txt
 ```
+### 4. HuggingFace Login (optional)
+```bash
 huggingface-cli login
 ```
 ---
+# Running
+```bash
+python src/sdgen/main.py
 ```
+UI available at:
 ```
 http://127.0.0.1:7860
 ---
+# Roadmap (LoRA, QLoRA, and Training)
+**Update planned**: full LoRA loading and fine-tuning support.
+Scope includes:
+### 1. LoRA Runtime Inference
+* Load LoRA weights into existing UNet
+* Adjustable LoRA alpha/scaling
+* UI selector for LoRA checkpoints
+* Enable mixing multiple LoRAs
+Implementation plan:
+* Attach `lora_attn_procs` to model
+* Discover `.safetensors` files in `assets/lora/`
+* Store LoRA metadata in history
+* Persist alpha value and presets
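The plan above mentions attaching `lora_attn_procs`; recent diffusers releases also expose a higher-level LoRA API on the pipeline (`load_lora_weights`, `set_adapters`, `enable_lora`, `disable_lora`, with the PEFT backend required for multiple adapters). The sketch below uses that swapped-in API rather than attention processors. `discover_loras` and `apply_loras` are hypothetical helpers, and the `assets/lora/` location comes from the plan.

```python
from pathlib import Path
from typing import Any, Dict, Set


def discover_loras(lora_dir: Path) -> Dict[str, Path]:
    """Map adapter names (file stems) to the .safetensors files in lora_dir."""
    if not lora_dir.is_dir():
        return {}
    return {p.stem: p for p in sorted(lora_dir.glob("*.safetensors"))}


def apply_loras(
    pipe: Any,
    available: Dict[str, Path],
    weights: Dict[str, float],
    loaded: Set[str],
) -> None:
    """Activate the LoRAs named in `weights`, each at its own scale.

    `loaded` tracks adapter names already attached to `pipe`, because
    reusing an adapter name raises an error in diffusers.
    """
    if not weights:
        pipe.disable_lora()  # no selection: run the base model unchanged
        return
    for name in weights:
        if name not in loaded:
            path = available[name]  # KeyError surfaces unknown selections
            pipe.load_lora_weights(
                str(path.parent), weight_name=path.name, adapter_name=name
            )
            loaded.add(name)
    pipe.enable_lora()  # undo any earlier disable_lora()
    # Only the listed adapters stay active, each scaled by its weight.
    pipe.set_adapters(list(weights), adapter_weights=list(weights.values()))
```

Keeping adapters loaded and switching them with `set_adapters` follows the same reasoning as multi-pipeline switching: the expensive load happens once, and per-request changes only adjust which adapters are active and at what scale.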
+### 2. QLoRA Fine-Tuning
+* Train lightweight LoRA modules on GPUs with as little as 11 GB of VRAM
+* Use parameter-efficient training
+* Merge adapters for export
+* Expose user fine-tuning through a command-line interface
+Stack:
+* accelerate
+* peft
+* bitsandbytes (if GPU available)
+UI tab planned:
+* dataset upload
+* config builder
+* start training
+* track loss, sample outputs
+**Why LoRA?**
+* Enables personal styles without training the full model
+* Cuts trainable parameters and checkpoint size by orders of magnitude compared with full fine-tuning
+* Industry-standard for SD customization
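The savings behind the LoRA bullets come from a low-rank factorization: instead of updating a full d × k weight matrix, LoRA trains B (d × r) and A (r × k) and adds B·A to the frozen weight. A worked count for a single layer, with illustrative sizes chosen here (768 × 768, rank 4):

```python
def lora_param_counts(d: int, k: int, rank: int) -> tuple[int, int]:
    """Trainable parameters: full d x k update vs. a rank-r LoRA update.

    LoRA learns W + B @ A with B: d x r and A: r x k,
    so it trains rank * (d + k) values instead of d * k.
    """
    full = d * k
    lora = rank * (d + k)
    return full, lora
```

For those sizes, `lora_param_counts(768, 768, 4)` returns `(589824, 6144)`, a 96× reduction for that layer. This counts trainable parameters only; the frozen base weights and activations still occupy memory during training, so VRAM savings are smaller than the parameter ratio.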
+---
+# Contributing
+This repo is configured with **pre-commit**:
+* black
+* ruff
+* isort
+* docstring linting (Google style)
+Install hooks:
+```bash
 pre-commit install
 ```
+Run the linter and formatter manually:
+```bash
 ruff check .
+black .
 ```
+Branching convention:
 ```
+feat/<feature>
+fix/<issue>
 refactor/<module>
 ```
+---
+# License
+This project is licensed under [MIT License](LICENSE).
 ---
+# Author
+**Sanskar Modi**
+Machine Learning Engineer
+Focused on production-grade ML systems.
+GitHub: [https://github.com/sanskarmodi8](https://github.com/sanskarmodi8)

pyproject.toml CHANGED

@@ -1,12 +1,8 @@
-[project]
-name = "sdgen"
-version = "0.0.0"
-requires-python = ">=3.10"
-dependencies = []
 [build-system]
 requires = ["setuptools", "wheel"]
 build-backend = "setuptools.build_meta"
-[project.scripts]
-sdgen = "sdgen.main:main"

+[project]
+name = "sdgen"
+version = "0.1.0"
+dependencies = []

setup.cfg ADDED

@@ -0,0 +1,7 @@

+[options]
+packages = find:
+package_dir =
+    =src
+[options.packages.find]
+where = src