harsh99 committed on
Commit 80813ab · verified · 1 Parent(s): 37b23ca

Update README.md

Files changed (1)
  1. README.md +10 -236
README.md CHANGED
@@ -1,237 +1,11 @@
1
- # 🎨 Stable Diffusion & CatVTON Implementation
2
-
3
- <div align="center">
4
-
5
- ![Stable Diffusion](https://img.shields.io/badge/Stable%20Diffusion-From%20Scratch-blue?style=for-the-badge&logo=pytorch) <br>
6
- ![CatVTON](https://img.shields.io/badge/CatVTON-Virtual%20Try--On-purple?style=for-the-badge)
7
- ![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
8
- ![Python](https://img.shields.io/badge/Python-3.10.9-green?style=for-the-badge&logo=python&logoColor=white)
9
-
10
- *A comprehensive implementation of Stable Diffusion from scratch with CatVTON virtual try-on capabilities*
11
-
12
- </div>
13
-
14
  ---
15
-
16
- ## Table of Contents
17
-
18
- * [Overview](#overview)
19
- * [Project Structure](#project-structure)
20
- * [Features](#features)
21
- * [Setup & Installation](#setup--installation)
22
- * [Model Downloads](#model-downloads)
23
- * [CatVTON Integration](#catvton-integration)
24
- * [References](#references)
25
- * [Author](#author)
26
- * [License](#license)
27
-
28
- ---
29
-
30
- ## Overview
31
-
32
- This project implements **Stable Diffusion from scratch** using PyTorch, extended with **CatVTON (Virtual Cloth Try-On)** for realistic fashion try-on.
33
-
34
- * Complete Stable Diffusion pipeline (Branch: `main`)
35
- * CatVTON virtual try-on extension (Branch: `CatVTON`)
36
- * DDPM-based denoising, VAE, and custom attention
37
- * Inpainting and text-to-image capabilities
38
-
39
- ---
40
-
41
- ## Project Structure
42
-
43
- ```text
44
- stable-diffusion/
45
- ├── Core Components
46
- │   ├── attention.py # Attention mechanisms
47
- │   ├── clip.py # CLIP model
48
- │   ├── ddpm.py # DDPM sampler
49
- │   ├── decoder.py # VAE decoder
50
- │   ├── encoder.py # VAE encoder
51
- │   ├── diffusion.py # Diffusion logic
52
- │   ├── model.py # Weight loading
53
- │   └── pipeline.py # Main pipeline logic
54
- │
55
- ├── Utilities & Interface
56
- │   ├── interface.py # Interactive script
57
- │   ├── model_converter.py # Weight conversion utilities
58
- │   └── requirements.txt # Python dependencies
59
- │
60
- ├── Data & Models
61
- │   ├── vocab.json
62
- │   ├── merges.txt
63
- │   ├── inkpunk-diffusion-v1.ckpt
64
- │   └── sd-v1-5-inpainting.ckpt
65
- │
66
- ├── Sample Data
67
- │   ├── person.jpg
68
- │   ├── garment.jpg
69
- │   ├── agnostic_mask.png
70
- │   ├── dog.jpg
71
- │   ├── image.png
72
- │   └── zalando-hd-resized.zip
73
- │
74
- └── Notebooks & Docs
75
-     ├── test.ipynb
76
-     └── README.md
77
- ```
78
-
79
- ---
80
-
81
- ## Features
82
-
83
- ### Stable Diffusion Core
84
-
85
- * From-scratch implementation with modular architecture
86
- * Custom CLIP encoder integration
87
- * Latent space generation using VAE
88
- * DDPM sampling process
89
- * Self-attention mechanisms for denoising
90
-
91
- ### CatVTON Capabilities
92
-
93
- * Virtual try-on using inpainting
94
- * Pose-aligned garment fitting
95
- * Segmentation mask based garment overlay
96
-
97
- ---
98
-
99
- ## Setup & Installation
100
-
101
- ### Prerequisites
102
-
103
- * Python 3.10.9
104
- * CUDA-compatible GPU
105
- * Git, Conda or venv
106
-
107
- ### Clone Repository
108
-
109
- ```bash
110
- git clone https://github.com/Harsh-Kesharwani/stable-diffusion.git
111
- cd stable-diffusion
112
- git checkout CatVTON # for try-on features
113
- ```
114
-
115
- ### Create Environment
116
-
117
- ```bash
118
- conda create -n stable-diffusion python=3.10.9
119
- conda activate stable-diffusion
120
- ```
121
-
122
- ### Install Requirements
123
-
124
- ```bash
125
- pip install -r requirements.txt
126
- ```
127
-
128
- ### Test Installation
129
-
130
- ```bash
131
- python -c "import torch; print(torch.__version__)"
132
- python -c "import torch; print(torch.cuda.is_available())"
133
- ```
134
-
135
- ---
136
-
137
- ## Model Downloads
138
-
139
- ### Tokenizer Files (from SD v1.4)
140
-
141
- * `vocab.json`
142
- * `merges.txt`
143
-
144
- Download from: [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main/tokenizer)
145
-
146
- ### Model Checkpoints
147
-
148
- * `inkpunk-diffusion-v1.ckpt`: [Inkpunk Model](https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main)
149
- * `sd-v1-5-inpainting.ckpt`: [Inpainting Weights](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)
150
-
151
- ### Download Script
152
-
153
- ```bash
154
- mkdir -p data
155
- wget -O data/vocab.json "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json"
156
- wget -O data/merges.txt "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/merges.txt"
157
- ```
158
-
159
- ---
160
-
161
- ## CatVTON Integration
162
-
163
- The CatVTON extension allows realistic cloth try-on using Stable Diffusion inpainting.
164
-
165
- ### Highlights
166
-
167
- * `sd-v1-5-inpainting.ckpt` for image completion
168
- * Garment alignment to human pose
169
- * Agnostic segmentation mask usage
170
-
171
- Run the interface:
172
-
173
- ```bash
174
- python interface.py
175
- ```
176
-
177
- ---
178
-
179
- ## References
180
-
181
- ### Articles & Guides
182
-
183
- * [Stable Diffusion from Scratch (Medium)](https://medium.com/@sayedebad.777/implementing-stable-diffusion-from-scratch-using-pytorch-f07d50efcd97)
184
- * [YouTube: Diffusion Implementation](https://www.youtube.com/watch?v=ZBKpAp_6TGI)
185
-
186
- ### HuggingFace Resources
187
-
188
- * [Stable Diffusion v1.5 Inpainting](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)
189
- * [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)
190
- * [Inkpunk Diffusion](https://huggingface.co/Envvi/Inkpunk-Diffusion)
191
-
192
- ### Papers
193
-
194
- * Stable Diffusion: Latent Diffusion Models
195
- * DDPM: Denoising Diffusion Probabilistic Models
196
- * CatVTON: Category-aware Try-On Network
197
-
198
- ---
199
-
200
- ## Author
201
-
202
- <div align="center">
203
-
204
- **Harsh Kesharwani**
205
-
206
- [![GitHub](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harsh-Kesharwani)
207
- [![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/harsh-kesharwani/)
208
- [![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:harshkesharwani777@gmail.com)
209
-
210
- *Passionate about AI, Computer Vision, and Generative Models*
211
-
212
- </div>
213
-
214
- ---
215
-
216
- ## License
217
-
218
- This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
219
-
220
- ---
221
-
222
- ## Acknowledgments
223
-
224
- * CompVis team for Stable Diffusion
225
- * HuggingFace for models and APIs
226
- * Zalando Research for dataset
227
- * Open-source contributors and educators
228
-
229
- ---
230
-
231
- <div align="center">
232
-
233
- **⭐ Star this repo if you found it helpful!**
234
-
235
- *Built with ❤️ by [Harsh Kesharwani](https://www.linkedin.com/in/harsh-kesharwani/)*
236
-
237
- </div>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Virtual Try-On with CatVTON
3
+ emoji: 🧥
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ ---
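
The lines added by this commit form a YAML front matter block, which appears to configure the repository as a Gradio Space with `app.py` as its entry point. As a rough illustration of what that block declares, the sketch below parses it with plain string handling (no YAML library). The `parse_front_matter` helper and the `FRONT_MATTER` constant are hypothetical names introduced here, not part of the repository, and the emoji value is an assumed decoding of the character shown in the diff.

```python
# Hypothetical copy of the front matter added in this commit (diff "+" markers removed).
FRONT_MATTER = """\
---
title: Virtual Try-On with CatVTON
emoji: 🧥
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
"""


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between two `---` delimiter lines.

    This handles only the simple, non-nested case shown above; real YAML
    allows much more, so a YAML library is the safer choice in practice.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0] != "---" or "---" not in lines[1:]:
        raise ValueError("no front matter block found")
    end = lines.index("---", 1)  # position of the closing delimiter
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue  # skip blank lines inside the block
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key/value line: {line!r}")
        meta[key.strip()] = value.strip()
    return meta


meta = parse_front_matter(FRONT_MATTER)
print(meta["sdk"], meta["sdk_version"], meta["app_file"])
```

All values come back as strings (for example `pinned` is `"false"`, not a boolean), which keeps the sketch short and dependency-free at the cost of type fidelity.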