# CodeFormer Face Restoration - Project Documentation
## 1. Introduction
**CodeFormer** is a robust blind face restoration algorithm designed to restore old, degraded, or AI-generated face images. It utilizes a **Codebook Lookup Transformer** (VQGAN-based) to predict high-quality facial features even from severe degradation, ensuring that the restored faces look natural and faithful to the original identity.
This project wraps the core CodeFormer research code into a deployable, user-friendly **Flask Web Application**, containerized with **Docker** for easy deployment on platforms like Hugging Face Spaces.
### Key Features
* **Blind Face Restoration:** Restores faces from low-quality inputs without knowing the specific degradation details.
* **Background Enhancement:** Uses **Real-ESRGAN** to upscale and enhance the non-face background regions of the image.
* **Face Alignment & Paste-back:** Automatically detects faces, aligns them for processing, and seamlessly blends them back into the original image.
* **Adjustable Fidelity:** Users can balance between restoration quality (hallucinating details) and identity fidelity (keeping the original look).
---
## 2. System Architecture
The application is built on a Python/PyTorch backend served via Flask.
### 2.1 Technology Stack
* **Framework:** Flask (Python Web Server)
* **Deep Learning:** PyTorch, TorchVision
* **Image Processing:** OpenCV, NumPy, Pillow
* **Core Libraries:** `basicsr` (Basic Super Resolution framework), `facelib` (face detection/alignment utilities)
* **Frontend:** HTML5, Bootstrap 5, Jinja2 Templates
* **Containerization:** Docker (CUDA-enabled)
### 2.2 Directory Structure
```
CodeFormer/
├── app.py              # Main Flask application entry point
├── Dockerfile          # Container configuration
├── requirements.txt    # Python dependencies
├── basicsr/            # Core AI framework (super-resolution tools)
├── facelib/            # Face detection and alignment utilities
├── templates/          # HTML frontend
│   ├── index.html      # Upload interface
│   └── result.html     # Results display
├── static/             # Static assets (css, js, uploads)
│   ├── uploads/        # Temporary storage for input images
│   └── results/        # Temporary storage for processed output
└── weights/            # Pre-trained model weights (downloaded on startup)
    ├── CodeFormer/     # CodeFormer model (.pth)
    ├── facelib/        # Detection (RetinaFace) and parsing models
    └── realesrgan/     # Background upscaler (Real-ESRGAN)
```
### 2.3 Logic Flow
1. **Input:** User uploads an image via the Web UI.
2. **Pre-processing (`app.py`):**
* Image is saved to `static/uploads`.
* Parameters (fidelity, upscale factor) are parsed.
3. **Inference Pipeline:**
* **Detection:** `facelib` detects faces in the image using RetinaFace.
* **Alignment:** Faces are cropped and aligned to a standard 512x512 resolution.
* **Restoration:** The **CodeFormer** model processes the aligned faces.
* **Upscaling (Optional):** The background is upscaled using **Real-ESRGAN**.
* **Paste-back:** Restored faces are warped back to their original positions and blended.
4. **Output:** The final image is saved to `static/results` and displayed to the user.
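The paste-back step above boils down to an alpha blend between the restored face and the (optionally upscaled) background. The sketch below illustrates that blend on plain nested lists of grayscale values; the function name `paste_back` is illustrative, not the actual API in `app.py`, and the real pipeline operates per color channel on warped image arrays.

```python
def paste_back(background, face, mask):
    """Blend a restored face into the background using a soft mask.

    All arguments are 2-D lists of grayscale pixel values in [0, 255];
    `mask` holds per-pixel alpha in [0.0, 1.0] (1.0 = fully restored face).
    This mirrors the blend the pipeline performs after warping the face
    back to its original position.
    """
    out = []
    for bg_row, face_row, mask_row in zip(background, face, mask):
        out.append([
            round(a * f + (1.0 - a) * b)
            for b, f, a in zip(bg_row, face_row, mask_row)
        ])
    return out

# Tiny 1x3 example: left pixel keeps the background, right pixel the face.
bg   = [[10, 10, 10]]
face = [[200, 200, 200]]
mask = [[0.0, 0.5, 1.0]]
print(paste_back(bg, face, mask))  # [[10, 105, 200]]
```

Because the mask is soft (feathered at the face boundary), the transition between restored and original pixels is seamless rather than a hard edge.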
---
## 3. Installation & Deployment
### 3.1 Docker Deployment (Recommended)
The project is optimized for Docker.
**Prerequisites:** Docker, NVIDIA GPU (optional, but recommended).
1. **Build the Image:**
```bash
docker build -t codeformer-app .
```
2. **Run the Container:**
```bash
# Run on port 7860 (Standard for HF Spaces)
docker run -it -p 7860:7860 codeformer-app
```
*Note: To use GPU, add the `--gpus all` flag to the run command.*
### 3.2 Hugging Face Spaces Deployment
This repository is configured for direct deployment to Hugging Face.
1. Create a **Docker** Space on Hugging Face.
2. Push this entire repository to the Space's Git remote.
```bash
git remote add hf git@hf.co:spaces/USERNAME/SPACE_NAME
git push hf main
```
3. The Space will build (approx. 5-10 mins) and launch automatically.
### 3.3 Local Development
1. **Install Environment:**
```bash
conda create -n codeformer python=3.8
conda activate codeformer
pip install -r requirements.txt
```
2. **Install Basicsr:**
```bash
python basicsr/setup.py install
```
3. **Run App:**
```bash
python app.py
```
---
## 4. User Guide (Web Interface)
### 4.1 Interface Controls
* **Input Image:** Supports standard formats (JPG, PNG, WEBP). Drag and drop supported.
* **Fidelity Weight (w):**
* **Range:** 0.0 to 1.0.
* **0.0 (Better Quality):** The model "hallucinates" more detail. Results look sharp and clean, but the restored face may drift from the person's original identity.
* **1.0 (Better Identity):** The model sticks closely to the original features. Results are faithful to the input photo but may be blurrier or retain more artifacts.
* **Recommended:** 0.5 is a balanced default.
* **Upscale Factor:**
* Scales the final output resolution (1x, 2x, or 4x).
* *Note: Higher scaling requires more VRAM.*
* **Enhance Background:**
* If checked, runs Real-ESRGAN on the non-face areas.
* *Recommendation:* Keep checked for full-photo restoration. Uncheck if you only care about the face or are running on limited hardware.
* **Upsample Face:**
* If checked, the restored face is also upsampled to match the background resolution.
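Loosely, the fidelity weight `w` acts as an interpolation knob between the model's codebook prediction (quality) and the features of the input image (identity). The toy sketch below conveys only that intuition; the actual model fuses learned feature maps inside the network, not a scalar lerp, and `fuse` is a hypothetical name.

```python
def fuse(codebook_feat, encoder_feat, w):
    """Toy illustration of the fidelity weight w in [0, 1].

    w = 0.0 -> rely fully on the codebook prediction (sharper, may drift);
    w = 1.0 -> rely fully on features of the input (faithful, may stay blurry).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("fidelity weight must be in [0, 1]")
    return [(1.0 - w) * c + w * e for c, e in zip(codebook_feat, encoder_feat)]

print(fuse([1.0, 0.0], [0.0, 1.0], 0.5))  # [0.5, 0.5]
```

This is why the recommended default of 0.5 gives a balanced result: it sits midway between the two extremes.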
### 4.2 Viewing Results
The result page features an interactive **Before/After Slider**. Drag the handle left and right to compare the pixels of the original versus the restored image directly.
---
## 5. Technical Details
### 5.1 Model Weights
The application automatically checks for and downloads the following weights to the `weights/` directory on startup:
| Model | Path | Description |
| :--- | :--- | :--- |
| **CodeFormer** | `weights/CodeFormer/codeformer.pth` | Main restoration model. |
| **RetinaFace** | `weights/facelib/detection_Resnet50_Final.pth` | Face detection. |
| **ParseNet** | `weights/facelib/parsing_parsenet.pth` | Face parsing (segmentation). |
| **Real-ESRGAN** | `weights/realesrgan/RealESRGAN_x2plus.pth` | Background upscaler (x2). |
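The check-and-download behavior on startup can be sketched as a simple download-if-missing helper. The function name and URL below are placeholders; the actual application may use `basicsr`'s own download utilities rather than raw `urllib`.

```python
import urllib.request
from pathlib import Path

def ensure_weight(path: str, url: str) -> Path:
    """Download a weight file only if it is not already present.

    `url` is a placeholder here; the real app resolves its own release URLs.
    """
    dest = Path(path)
    if dest.exists():
        return dest  # already downloaded on a previous startup
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest

# Example (placeholder URL; nothing is fetched if the file already exists):
# ensure_weight("weights/CodeFormer/codeformer.pth",
#               "https://example.com/codeformer.pth")
```

Skipping the download when the file exists is what makes container restarts fast after the first launch.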
### 5.2 Performance Notes
* **Memory:** The full pipeline (CodeFormer + Real-ESRGAN) requires significant RAM/VRAM. On CPU-only environments (like basic HF Spaces), processing a single image may take 30-60 seconds.
* **Git LFS:** Image assets in this repository are tracked with Git LFS to keep the repo size manageable.
---
## 6. Credits & References
* **Original Paper:** [Towards Robust Blind Face Restoration with Codebook Lookup Transformer (NeurIPS 2022)](https://arxiv.org/abs/2206.11253)
* **Authors:** Shangchen Zhou, Kelvin C.K. Chan, Chongyi Li, Chen Change Loy (S-Lab, Nanyang Technological University).
* **Original Repository:** [sczhou/CodeFormer](https://github.com/sczhou/CodeFormer)