---
license: creativeml-openrail-m
language:
- en
metrics:
- accuracy
pipeline_tag: image-to-image
---

# Project Chronicle: A Journey into Virtual Try-On with Diffusion Models

This document outlines the development journey of this project, which aims to implement the "TryOnDiffusion: A Tale of Two UNets" paper. It serves as a log of the learning process, implementation steps, challenges faced, and future goals.

## Tech Stack

[Model on Hugging Face](https://huggingface.co/Aditya757864/TRY_ON)

---

## Phase 1: Foundational Learning (The Groundwork)

* **Core Concepts:** Started with the fundamentals of **Computer Vision** and the **PyTorch** framework.
* **Generative Adversarial Networks (GANs):** Implemented and trained a **POKEGAN** to gain practical experience with generative models.
* **Introduction to Diffusion Models:** Shifted focus to diffusion models, training a **Denoising Diffusion Probabilistic Model (DDPM)** on the Fashion MNIST dataset (28x28 images) using an NVIDIA RTX 3090.
* **Data Pipeline Mastery:** Revisited PyTorch's `DataLoader` and custom data-handling pipelines in depth.
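
DDPM training hinges on the closed-form forward (noising) process, `x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps`. A minimal NumPy sketch of that step (the schedule length and 28x28 size mirror the Fashion MNIST setup; all names here are illustrative, not this project's actual code):

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule, as in the original DDPM paper."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    abar_t = alphas_cumprod[t]
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise

betas = linear_beta_schedule()
alphas_cumprod = np.cumprod(1.0 - betas)  # abar_t, strictly decreasing in t

# A dummy 28x28 image and Gaussian noise stand in for a real sample.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((28, 28))
noise = rng.standard_normal((28, 28))

x_noisy = q_sample(x0, t=500, alphas_cumprod=alphas_cumprod, noise=noise)
```

During training, the UNet is then asked to predict `noise` from `x_noisy` and `t`.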

---

## Phase 2: Advanced Concepts & Paper Selection (Scaling Up)

* **Advanced Architectures:** Studied **Transformers** and the **attention** mechanism to understand how models capture long-range dependencies.
* **Modulation Techniques:** Explored conditioning techniques such as **Feature-wise Linear Modulation (FiLM)** for steering generative models.
* **Research & Direction:** After a thorough literature review, the **"TryOnDiffusion: A Tale of Two UNets"** paper was selected as the primary research goal for this project.
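
FiLM conditions a network by predicting a per-channel scale `gamma` and shift `beta` from the conditioning signal, then applying `gamma * x + beta` feature-wise. A framework-agnostic NumPy sketch (the dimensions and the linear projection are illustrative assumptions; in a real model `gamma`/`beta` come from a learned sub-network fed with e.g. a timestep or garment embedding):

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each channel.
    features: (C, H, W); gamma, beta: (C,)."""
    return gamma[:, None, None] * features + beta[:, None, None]

cond_dim, channels = 16, 8

# A single linear layer maps the conditioning vector to (gamma, beta).
W = rng.standard_normal((cond_dim, 2 * channels)) * 0.01
cond = rng.standard_normal(cond_dim)
gamma_beta = cond @ W
# Initialize gamma near 1 so FiLM starts close to the identity map.
gamma, beta = 1.0 + gamma_beta[:channels], gamma_beta[channels:]

x = rng.standard_normal((channels, 32, 32))
y = film(x, gamma, beta)
```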

---

## Phase 3: Implementation, Training, and Debugging (Getting Hands-On)

* **Codebase Adaptation:** Forked and analyzed an open-source implementation by **fashnAI** as a starting point.
* **Custom Development:**
  * Engineered a **custom data mapper and `DataLoader`** to process the HR-VITON dataset.
  * Wrote a **custom trainer script** tailored to the model's specific needs, for finer control over the training loop.
* **Technical Challenges:** Debugged and resolved several breaking changes caused by library updates in the original repository.
* **Model Training:**
  * Trained on a subset of the **HR-VITON dataset (500 images)**.
  * Used an **NVIDIA RTX 4090 (24GB)** for the computationally intensive training runs.
  * Tracked metrics, losses, and logs with **Weights & Biases (`wandb`)**.
* **Evaluation:** Created a **sampling script** that generates images from checkpoints to qualitatively assess model performance.
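
A custom trainer for a setup like this follows the same skeleton regardless of model size: draw a batch, compute the loss, step the optimizer, log to `wandb`, checkpoint periodically. A toy NumPy sketch of that structure, with a linear model and synthetic data standing in for the UNet and the HR-VITON pipeline, and `wandb`/checkpoint calls shown as comments (all names here are illustrative, not this repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for (input, target-noise) pairs produced by the data mapper.
X = rng.standard_normal((500, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
Y = X @ true_w + 0.01 * rng.standard_normal(500)

w = np.zeros(4)               # stand-in for the model parameters
lr, batch_size = 0.1, 32
losses = []

for step in range(200):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a mini-batch
    xb, yb = X[idx], Y[idx]
    pred = xb @ w
    loss = np.mean((pred - yb) ** 2)                 # MSE, as in noise prediction
    grad = 2.0 * xb.T @ (pred - yb) / batch_size
    w -= lr * grad                                   # plain SGD update
    losses.append(loss)
    # wandb.log({"train/loss": loss, "step": step})  # metric logging
    # if step % 50 == 0: save_checkpoint(w, step)    # hypothetical helper
```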

---

## Phase 4: The Plateau & The Path Forward (Current Status)

> **Current Challenge:** The model's loss has **stagnated**. This suggests the model is no longer learning, likely due to overfitting on the small dataset or a subtle issue in the data pipeline.

### Visual Analysis

*Sample model output after 2000 epochs.*

| Original Input | Input Features | Generated Output |
| ----- | ----- | ----- |
| <img src="./original.png" alt="Original Input Image" width="300"> | <img src="./imputs.png" alt="Input Features Image" width="80"> | <img src="./our_output.png" alt="Generated Output Image"> |

*W&B loss curve, clearly illustrating the training plateau.*

* **Immediate Goals:**
  1. **Debug the training process:** Perform sanity checks, such as overfitting on a single batch, to verify the model's capacity to learn.
  2. **Verify the data pipeline:** Visualize the inputs (warped clothes, agnostic masks, pose maps) fed to the model to confirm they are correct.
  3. **Investigate the loss function:** The current pixel-wise loss (L1 or L2) may not be optimal. Experiment with alternatives such as a perceptual loss (LPIPS, Learned Perceptual Image Patch Similarity) to better capture visual similarity.
  4. **Tune hyperparameters:** Experiment with the learning rate and other key hyperparameters.
* **Long-Term Vision:** Resolve the training plateau, scale training up to a larger dataset, and replicate the results of the TryOnDiffusion paper.
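
The single-batch sanity check in goal 1 is simple to express: repeatedly train on one fixed batch and confirm the loss drops toward zero; if it cannot memorize even that, the model or pipeline is broken. A toy NumPy version with a linear model standing in for the network (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# One fixed batch. Targets are made exactly realizable so the loss can
# reach zero; a healthy training loop should drive it there.
xb = rng.standard_normal((8, 4))
w_true = rng.standard_normal(4)
yb = xb @ w_true

w = np.zeros(4)                  # stand-in for the model parameters
lr = 0.05
first_loss = last_loss = None

for step in range(3000):
    pred = xb @ w
    last_loss = np.mean((pred - yb) ** 2)
    if first_loss is None:
        first_loss = last_loss   # record the starting loss for comparison
    grad = 2.0 * xb.T @ (pred - yb) / len(xb)
    w -= lr * grad               # plain gradient descent on the one batch
```

If the loss plateaus even here, the fault is in the model or optimizer setup, not the dataset size.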