---
license: creativeml-openrail-m
language:
  - en
metrics:
  - accuracy
pipeline_tag: image-to-image
---

Project Chronicle: A Journey into Virtual Try-On with Diffusion Models

This document outlines the development journey of this project, which aims to implement the "TryOnDiffusion: A Tale of Two UNets" paper. It serves as a log of the learning process, implementation steps, challenges faced, and future goals.

Tech Stack

PyTorch · Transformers · Weights & Biases · Python · Hugging Face


Phase 1: Foundational Learning (The Groundwork)

  • Core Concepts: Started with the fundamentals of Computer Vision and mastered the PyTorch framework.
  • Generative Adversarial Networks (GANs): Implemented and trained a POKEGAN to gain practical experience with generative models.
  • Introduction to Diffusion Models: Shifted focus to diffusion models, successfully training a Denoising Diffusion Probabilistic Model (DDPM) on the Fashion MNIST dataset (28x28 images) using an NVIDIA RTX 3090.
  • Data Pipeline Mastery: Revisited and gained a deeper understanding of PyTorch's DataLoader and custom data handling pipelines.
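
The DDPM training step from this phase can be sketched in a few lines. The schedule values and the `ddpm_loss` helper below are illustrative names under a standard linear noise schedule, not the project's actual code:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal level

def ddpm_loss(model, x0):
    """Sample a timestep, noise the clean image, and regress the added noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    # Forward (noising) process: interpolate between signal and noise.
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    pred = model(x_t, t)  # the network predicts the noise that was added
    return torch.nn.functional.mse_loss(pred, noise)
```

The same objective applies unchanged whether the backbone is a small UNet for Fashion MNIST or the dual-UNet architecture targeted later.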

Phase 2: Advanced Concepts & Paper Selection (Scaling Up)

  • Advanced Architectures: Studied Transformers and the Attention mechanism to understand how models process long-range dependencies.
  • Modulation Techniques: Explored specific neural network techniques like Feature-wise Linear Modulation (FiLM) for conditioning generative models.
  • Research & Direction: After a thorough literature review, the "TryOnDiffusion: A Tale of Two UNets" paper was selected as the primary research goal for this project.
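
FiLM conditioning, mentioned above, reduces to a learned per-channel scale and shift driven by a conditioning vector. The `FiLM` module below is a hedged sketch with illustrative names:

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift feature maps per channel."""

    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # One linear layer produces both gamma (scale) and beta (shift).
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features, cond):
        # features: (B, C, H, W), cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta
```

In a try-on model, `cond` would typically come from a pose or garment embedding; here it is left abstract.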

Phase 3: Implementation, Training, and Debugging (Getting Hands-On)

  • Codebase Adaptation: Forked and analyzed an open-source implementation by fashnAI as a starting point.
  • Custom Development:
    • Engineered a custom data mapper and DataLoader to process the HR-VITON dataset.
    • Wrote a custom trainer script tailored to the model's specific needs and for better control over the training loop.
  • Technical Challenges: Successfully debugged and resolved several breaking changes caused by library updates in the original repository.
  • Model Training:
    • Initiated training on a subset of the HR-VITON dataset (500 images).
    • Utilized an NVIDIA RTX 4090 (24GB) for the computationally intensive training process.
    • Tracked metrics, losses, and logs meticulously using Weights & Biases (wandb).
  • Evaluation: Created a sampling script to generate image outputs from checkpoints to qualitatively assess model performance.
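
A custom data mapper like the one described can be sketched as a small PyTorch `Dataset`. The tensor shapes and dictionary keys below are hypothetical stand-ins for the real HR-VITON loading code, which reads person images, garments, and pose maps from disk:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TryOnDataset(Dataset):
    """Yields (person, garment, pose) triples for paired try-on training."""

    def __init__(self, samples):
        # samples: list of dicts of pre-loaded tensors (stand-in for disk I/O)
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        return s["person"], s["garment"], s["pose"]

# Dummy data: 8 samples with an 18-channel pose map (a common keypoint count).
samples = [{"person": torch.randn(3, 64, 64),
            "garment": torch.randn(3, 64, 64),
            "pose": torch.randn(18, 64, 64)} for _ in range(8)]
loader = DataLoader(TryOnDataset(samples), batch_size=4, shuffle=True)
```

Visualizing a batch from a loader like this (immediate goal 2 below) is often the fastest way to catch pipeline bugs.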

Phase 4: The Plateau & The Path Forward (Current Status)

Current Challenge: The training loss has plateaued and is no longer decreasing. This suggests the model has stopped learning, likely due to overfitting on the small 500-image subset or a subtle issue in the data pipeline.

Visual Analysis

Sample model output after 2000 epochs.

[Side-by-side comparison: original input, input features, and generated output]

W&B loss curve, clearly illustrating the training plateau (a flat line).

  • Immediate Goals:
    1. Debug the training process: Perform sanity checks like overfitting on a single batch to verify the model's learning capacity.
    2. Verify the data pipeline: Thoroughly visualize the inputs (warped clothes, agnostic masks, pose maps) being fed to the model to ensure they are correct.
    3. Investigate the loss function: The current pixel-wise loss (e.g., L1 or L2) might not be optimal. Experiment with alternatives like a perceptual loss (LPIPS, Learned Perceptual Image Patch Similarity) to better capture visual similarity.
    4. Tune hyperparameters: Experiment with the learning rate and other key hyperparameters.
  • Long-Term Vision: Resolve the training plateau, scale up the training to a larger dataset, and successfully replicate the results of the TryOnDiffusion paper.
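
The single-batch overfitting check from immediate goal 1 can be sketched as below. The tiny convolutional model is a stand-in for the actual UNet; a healthy model and training loop should drive the loss well below its starting value on one fixed batch, so if it stays flat here too, the loop rather than the data is the likely culprit:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model: a small conv net asked to reproduce its input.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 3, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 3, 32, 32)  # one fixed batch, reused every step

first_loss = None
for step in range(200):
    loss = nn.functional.mse_loss(model(x), x)  # overfit-to-identity target
    if first_loss is None:
        first_loss = loss.item()
    opt.zero_grad()
    loss.backward()
    opt.step()
# A working setup shows a clear drop from first_loss to loss.
```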