# Project Documentation: Sketch-to-Aged Face Pipeline
## 1. Project Overview
The **Sketch-to-Aged Face Pipeline** is an advanced AI application that bridges the gap between artistic sketches and realistic face aging. It creates a unified workflow where users can upload a simple line drawing (sketch) and generate a photorealistic aged face at a specific target age (20-70 years old).
This project integrates two powerful Generative Adversarial Network (GAN) models:
1. **Sketch-to-Face (Pix2Pix)**: Converts a black-and-white sketch into a realistic face.
2. **Face Aging (HRFAE)**: Applies age progression to the generated face while preserving identity.
---
## 2. Key Terminologies & Concepts
To understand how this project works, it's helpful to know these core AI concepts:
### **GAN (Generative Adversarial Network)**
A class of machine learning frameworks designed by Ian Goodfellow. It consists of two neural networks that contest with each other in a game:
* **Generator**: Tries to create "fake" data (like an image) that looks real.
* **Discriminator**: Tries to distinguish between the generator's fake data and real data.
* *Analogy*: The Generator is a counterfeiter trying to make fake money, and the Discriminator is the police trying to spot it. Over time, both get better at their jobs.
### **Pix2Pix**
A specific type of GAN designed for **Image-to-Image Translation**. It learns a mapping from an input image to an output image.
* *In our project*: It maps the "Sketch" domain → "Realistic Face" domain.
### **Latent Space**
A compressed, abstract representation of data. In face generation, the "latent space" represents the features of a face (eyes, nose, shape) as a set of numbers (vectors).
* *Manipulation*: By slightly changing these numbers, we can change features of the face (e.g., make it older or younger) without changing the person's identity.
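The manipulation idea can be illustrated with a toy NumPy example. Note that the 512-dimensional code and the `age_direction` vector here are made up for illustration; they are not taken from the project's actual models:

```python
import numpy as np

# Toy latent code for a face: 512 numbers summarizing its features.
rng = np.random.default_rng(0)
z = rng.standard_normal(512)

# A (hypothetical) direction in latent space that correlates with age.
age_direction = rng.standard_normal(512)
age_direction /= np.linalg.norm(age_direction)  # normalize to unit length

def shift_age(z, direction, strength):
    """Move the latent code along the age direction; components of z
    orthogonal to the direction (the "identity") are left untouched."""
    return z + strength * direction

z_older = shift_age(z, age_direction, strength=3.0)

# Only the component along `age_direction` changed.
assert np.allclose(z_older - z, 3.0 * age_direction)
```

Decoding `z_older` through a face generator would then produce the same face with older features, which is exactly the mechanism the aging step exploits.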
### **HRFAE (High-Resolution Face Age Editing)**
The specific model used for aging. It assumes that a face image consists of:
* **Identity**: Who the person is.
* **Age**: How old they look.
* **Background/Style**: Other details.
HRFAE separates these components so we can change the *Age* while keeping the *Identity* locked.
### **Inference**
The process of using a trained model to make predictions (or generate images) on new data. When you click "Generate," you are running model inference.
---
## 3. Project Workflow
The pipeline follows a linear data flow:
```mermaid
graph LR
A[Input Sketch] --> B(Preprocessing);
B --> C{Sketch-to-Face Model};
C --> D[Generated Face];
D --> E(Alignment & Encoding);
E --> F{HRFAE Aging Model};
F --> G[Aged Face Output];
```
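In code, the diagram above is just function composition. A stand-in sketch using the helper names from `pipeline.py` (the real bodies run heavyweight networks; these placeholders only show the data flow):

```python
def preprocess(sketch):
    # Placeholder for resize / tensor conversion / normalization.
    return ("tensor", sketch)

def sketch_to_face(tensor):
    # Placeholder for the Pix2Pix generator forward pass.
    return ("face", tensor)

def age_face(face, target_age):
    # Placeholder for HRFAE encoding + age-conditioned decoding.
    return ("aged_face", face, target_age)

def sketch_to_aged_face(sketch, target_age):
    """Run the full pipeline: sketch -> face -> aged face."""
    tensor = preprocess(sketch)
    face = sketch_to_face(tensor)
    return age_face(face, target_age)

out = sketch_to_aged_face("lineart.png", target_age=65)
```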
### Step 1: Input & Preprocessing
- **Input**: User uploads a sketch image.
- **Preprocessing**: The image is resized to 256x256 pixels, converted to tensor format, and normalized (pixel values scaled between -1 and 1) to match what the model expects.
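The preprocessing arithmetic can be sketched in plain NumPy (the real code most likely uses torchvision transforms; nearest-neighbour resizing here is a simple stand-in):

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Resize to size x size (nearest neighbour), scale pixels from [0, 255]
    to [-1, 1], and move channels first (HWC -> CHW) as GAN models expect."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows][:, cols]                       # nearest-neighbour resize
    scaled = resized.astype(np.float32) / 127.5 - 1.0  # [0,255] -> [-1,1]
    return scaled.transpose(2, 0, 1)                   # HWC -> CHW

sketch = np.zeros((300, 400, 3), dtype=np.uint8)  # dummy all-black "sketch"
x = preprocess(sketch)
# x has shape (3, 256, 256); black pixels map to -1.0
```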
### Step 2: Sketch-to-Face Conversion
- **Model**: `pix2pix` (Generator network).
- **Process**: The model takes the sketch tensor and generates a corresponding "real" face in RGB color.
- **Output**: A 256x256 realistic face image.
### Step 3: Face Aging
- **Model**: `HRFAE` (High-Resolution Face Age Editing).
- **Input**: The face from Step 2 + a Target Age (e.g., 65).
- **Process**:
1. The face is encoded into latent space.
2. The model calculates a "trajectory" to move the face from its estimated current age to the target age.
3. It generates a new image based on this new age position.
- **Output**: The same face, but with aging effects (wrinkles, skin texture changes, grey hair) appropriate for the target age.
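The "trajectory" idea in sub-step 2 can be pictured as linear movement in latent space. A toy NumPy sketch (the `age_axis` direction and the linear model are illustrative assumptions, not HRFAE's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.standard_normal(256)       # encoded face
age_axis = rng.standard_normal(256)
age_axis /= np.linalg.norm(age_axis)    # unit direction for "age"

def age_trajectory(latent, current_age, target_age, axis, step=0.05):
    """Move the latent code along the age axis in proportion to the age gap."""
    return latent + step * (target_age - current_age) * axis

aged = age_trajectory(latent, current_age=30, target_age=65, axis=age_axis)

# Moving back by the same gap recovers (approximately) the original code,
# which is why identity survives the round trip.
restored = age_trajectory(aged, current_age=65, target_age=30, axis=age_axis)
assert np.allclose(restored, latent)
```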
---
## 4. Component Details
### `pipeline.py` (The Brain)
This file is the central nervous system of the project. It handles:
* **Loading Models**: It loads the heavy AI models into memory only once (singleton pattern) to save resources.
* **Data Flow**: Passes data from the sketch model to the aging model.
* **Helpers**: Contains helper functions like `sketch_to_face()`, `age_face()`, and `sketch_to_aged_face()`.
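The load-once behaviour is a simple lazy cache. A hedged sketch of the pattern (the actual `pipeline.py` may differ; `load_pix2pix` and `load_hrfae` here are dummy stand-ins for the real checkpoint loaders):

```python
_MODELS = {}  # module-level cache: each model is loaded at most once

def get_model(name, loader):
    """Return the cached model, loading it on first request (singleton pattern)."""
    if name not in _MODELS:
        _MODELS[name] = loader()
    return _MODELS[name]

# Dummy loaders standing in for the heavy checkpoint loads.
def load_pix2pix():
    return object()

def load_hrfae():
    return object()

g1 = get_model("pix2pix", load_pix2pix)
g2 = get_model("pix2pix", load_pix2pix)
assert g1 is g2  # the second call reuses the cached instance
```

Caching this way matters because loading GAN checkpoints into GPU memory on every request would dominate inference time.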
### `app.py` (The Interface)
Built with **Gradio**, this provides the web UI.
* **Tabs**: Separates functionality into logical sections (Full Pipeline, Testing individual models).
* **Interaction**: Handles button clicks, sliders, and image uploads.
### `run_pipeline.py` (The Automation)
A Command Line Interface (CLI) tool.
* **Batch Processing**: Can process an entire folder of sketches at once.
* **Sequences**: Can generate a "timelapse" of a single sketch from age 20 to 70.
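A CLI like `run_pipeline.py` typically wraps the pipeline in `argparse`. A minimal sketch (flag names are illustrative, not necessarily the script's actual interface):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch-to-aged-face batch runner")
    p.add_argument("--input-dir", required=True,
                   help="folder of sketch images to process")
    p.add_argument("--output-dir", default="out",
                   help="where generated images are written")
    p.add_argument("--age", type=int, default=65,
                   help="target age (20-70)")
    p.add_argument("--sequence", action="store_true",
                   help="render a timelapse from age 20 to 70 instead")
    return p

args = build_parser().parse_args(["--input-dir", "sketches", "--age", "40"])
```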
---
## 5. Model Architectures Deep Dive
### Sketch-to-Face (Pix2Pix)
* **Architecture**: U-Net Generator + PatchGAN Discriminator.
* **U-Net**: A network shape that looks like a 'U'. It downsamples the image to capture context (the "what") and then upsamples it to generate precise details (the "where"), using "skip connections" to preserve fine details from the input sketch.
### Face Aging (HRFAE)
* **Base**: An encoder-decoder face network with a separate age-modulation branch.
* **Mechanism**: An encoder maps the input image to a latent feature map. A modulation network, conditioned on the target age, rescales the feature channels that carry age cues, while the channels carrying identity pass through largely unchanged. The decoder then renders the face at the new age.
---
## 6. Directory Structure Explained
* **`/scetch2face/`**: Contains the Pix2Pix model code and checkpoints.
* `imagetranslatormodule/`: The core library for sketch-to-face image translation.
* **`/FaceAgingGAN/`**: Contains the HRFAE model code.
* `HRFAE/`: The specific implementation for high-res aging.
* **`pipeline.py`**: The bridge connecting the two folders above.
* **`app.py`**: The web application entry point.
## 7. Limitations
1. **Resolution**: The pipeline currently operates optimally at 256x256 resolution.
2. **Age Range**: The aging model is trained on ages 20-70. Target ages outside this range may produce unrealistic results.
3. **Sketch Quality**: The better the input sketch (clear lines, closed shapes), the better the output face. "Messy" sketches may yield artifacts.
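Given limitation 2, callers may want to clamp a requested age to the supported range before inference. This is a defensive-coding suggestion, not something the project necessarily does:

```python
MIN_AGE, MAX_AGE = 20, 70  # range the aging model was trained on

def clamp_age(age: int) -> int:
    """Restrict a requested target age to the model's supported range."""
    return max(MIN_AGE, min(MAX_AGE, age))

assert clamp_age(15) == 20   # too young -> clamped up
assert clamp_age(90) == 70   # too old -> clamped down
assert clamp_age(42) == 42   # in range -> unchanged
```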