# CodeFormer Face Restoration - Project Documentation


## 1. Introduction


**CodeFormer** is a robust blind face restoration algorithm designed to restore old, degraded, or AI-generated face images. It utilizes a **Codebook Lookup Transformer** (VQGAN-based) to predict high-quality facial features even from severe degradation, ensuring that the restored faces look natural and faithful to the original identity.


This project wraps the core CodeFormer research code into a deployable, user-friendly **Flask Web Application**, containerized with **Docker** for easy deployment on platforms like Hugging Face Spaces.


### Key Features
* **Blind Face Restoration:** Restores faces from low-quality inputs without knowing the specific degradation details.
* **Background Enhancement:** Uses **Real-ESRGAN** to upscale and enhance the non-face background regions of the image.
* **Face Alignment & Paste-back:** Automatically detects faces, aligns them for processing, and seamlessly blends them back into the original image.
* **Adjustable Fidelity:** Users can balance between restoration quality (hallucinating details) and identity fidelity (keeping the original look).


---

## 2. System Architecture


The application is built on a Python/PyTorch backend served via Flask.


### 2.1 Technology Stack
* **Framework:** Flask (Python Web Server)
* **Deep Learning:** PyTorch, TorchVision
* **Image Processing:** OpenCV, NumPy, Pillow
* **Core Libraries:** `basicsr` (BasicSR, an open-source image and video restoration toolbox), `facelib` (face detection and alignment utilities)
* **Frontend:** HTML5, Bootstrap 5, Jinja2 Templates
* **Containerization:** Docker (CUDA-enabled)
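
The stack above corresponds to a small set of pip dependencies. Purely as an illustration, a minimal `requirements.txt` for this stack might look like the following; the authoritative, pinned list is the `requirements.txt` in the repository root:

```text
flask
torch
torchvision
opencv-python
numpy
Pillow
```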


### 2.2 Directory Structure
```
CodeFormer/
├── app.py             # Main Flask application entry point
├── Dockerfile         # Container configuration
├── requirements.txt   # Python dependencies
├── basicsr/           # Core AI framework (Super-Resolution tools)
├── facelib/           # Face detection and alignment utilities
├── templates/         # HTML Frontend
│   ├── index.html     # Upload interface
│   └── result.html    # Results display
├── static/            # Static assets (css, js, uploads)
│   ├── uploads/       # Temporary storage for input images
│   └── results/       # Temporary storage for processed output
└── weights/           # Pre-trained model weights (downloaded on startup)
    ├── CodeFormer/    # CodeFormer model (.pth)
    ├── facelib/       # Detection (RetinaFace) and Parsing models
    └── realesrgan/    # Background upscaler (Real-ESRGAN)
```


### 2.3 Logic Flow
1. **Input:** User uploads an image via the Web UI.
2. **Pre-processing (`app.py`):**
    * Image is saved to `static/uploads`.
    * Parameters (fidelity, upscale factor) are parsed.
3. **Inference Pipeline:**
    * **Detection:** `facelib` detects faces in the image using RetinaFace.
    * **Alignment:** Faces are cropped and aligned to a standard 512x512 resolution.
    * **Restoration:** The **CodeFormer** model processes the aligned faces.
    * **Upscaling (Optional):** The background is upscaled using **Real-ESRGAN**.
    * **Paste-back:** Restored faces are warped back to their original positions and blended.
4. **Output:** The final image is saved to `static/results` and displayed to the user.
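
The flow above can be sketched as plain-Python orchestration. This is an illustrative skeleton only: every stage is a stub standing in for the real `facelib`, CodeFormer, and Real-ESRGAN calls, and all function names here are hypothetical; the actual implementation lives in `app.py`.

```python
# Schematic sketch of the section 2.3 inference pipeline.
# All stages are stubs; only the orchestration and data flow are shown.

def detect_faces(image):
    # Stub for RetinaFace detection: returns a list of face records.
    return [{"box": (10, 10, 110, 110)}]

def align_face(image, face):
    # Stub for cropping/warping a face to the 512x512 template.
    return {"crop": "aligned 512x512 face", "box": face["box"]}

def restore_face(aligned, fidelity):
    # Stub for the CodeFormer forward pass at fidelity weight w.
    return {"restored": True, "w": fidelity, "box": aligned["box"]}

def upscale_background(image, factor):
    # Stub for Real-ESRGAN background upscaling.
    return {"image": image, "scale": factor}

def paste_back(background, faces):
    # Stub for inverse-warping restored faces onto the background.
    return {"background": background, "faces": faces}

def run_pipeline(image, fidelity=0.5, upscale=2, enhance_bg=True):
    faces = [align_face(image, f) for f in detect_faces(image)]
    restored = [restore_face(f, fidelity) for f in faces]
    bg = (upscale_background(image, upscale) if enhance_bg
          else {"image": image, "scale": 1})
    return paste_back(bg, restored)

result = run_pipeline("input.png", fidelity=0.5, upscale=2)
```

The key design point is that detection/alignment happens per face while background enhancement happens once per image, and the two are merged only at the paste-back step.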


---


## 3. Installation & Deployment


### 3.1 Docker Deployment (Recommended)
The project is optimized for Docker.


**Prerequisites:** Docker, NVIDIA GPU (optional, but recommended).


1. **Build the Image:**
   ```bash
   docker build -t codeformer-app .
   ```

2. **Run the Container:**
   ```bash
   # Run on port 7860 (Standard for HF Spaces)
   docker run -it -p 7860:7860 codeformer-app
   ```
   *Note: To use a GPU, add the `--gpus all` flag to the run command (this requires the NVIDIA Container Toolkit on the host).*

### 3.2 Hugging Face Spaces Deployment
This repository is configured for direct deployment to Hugging Face.


1. Create a **Docker** Space on Hugging Face.
2. Push this entire repository to the Space's Git remote:
   ```bash
   git remote add hf git@hf.co:spaces/USERNAME/SPACE_NAME
   git push hf main
   ```
3. The Space will build (approx. 5-10 minutes) and launch automatically.

### 3.3 Local Development
1. **Install Environment:**
   ```bash
   conda create -n codeformer python=3.8
   conda activate codeformer
   pip install -r requirements.txt
   ```
2. **Install basicsr:**
   ```bash
   python basicsr/setup.py install
   ```
3. **Run App:**
   ```bash
   python app.py
   ```

---


## 4. User Guide (Web Interface)


### 4.1 Interface Controls


* **Input Image:** Supports standard formats (JPG, PNG, WEBP). Drag-and-drop is supported.
* **Fidelity Weight (w):**
    * **Range:** 0.0 to 1.0.
    * **0.0 (Better Quality):** The model "hallucinates" more details. Results look very sharp and high-quality but may slightly alter the person's identity (look less like the original).
    * **1.0 (Better Identity):** The model sticks strictly to the original features. Results are faithful to the original photo but might be blurrier or contain more artifacts.
    * **Recommended:** 0.5 is a balanced default.
* **Upscale Factor:**
    * Scales the final output resolution (1x, 2x, or 4x).
    * *Note: Higher scaling requires more VRAM.*
* **Enhance Background:**
    * If checked, runs Real-ESRGAN on the non-face areas.
    * *Recommendation:* Keep checked for full-photo restoration. Uncheck if you only care about the face or are running on limited hardware.
* **Upsample Face:**
    * If checked, the restored face is also upsampled to match the background resolution.
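
On the server side, these controls arrive as web-form fields that should be validated before inference. A minimal sketch of that validation, assuming hypothetical field names and defaults (the real parsing lives in `app.py`):

```python
# Illustrative validation of the section 4.1 form parameters.
# Field names ("fidelity", "upscale", ...) are hypothetical.

def parse_params(form):
    """Clamp and normalize user-supplied restoration parameters."""
    try:
        fidelity = float(form.get("fidelity", 0.5))
    except (TypeError, ValueError):
        fidelity = 0.5
    fidelity = min(max(fidelity, 0.0), 1.0)  # w must lie in [0, 1]

    try:
        upscale = int(form.get("upscale", 2))
    except (TypeError, ValueError):
        upscale = 2
    if upscale not in (1, 2, 4):  # the UI only offers 1x / 2x / 4x
        upscale = 2

    return {
        "fidelity": fidelity,
        "upscale": upscale,
        # HTML checkboxes are simply absent from the form when unchecked.
        "enhance_background": "enhance_background" in form,
        "upsample_face": "upsample_face" in form,
    }

params = parse_params({"fidelity": "1.7", "upscale": "4",
                       "enhance_background": "on"})
```

Clamping rather than rejecting out-of-range values keeps the endpoint forgiving of hand-crafted requests while guaranteeing the model only ever sees a valid fidelity weight.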


### 4.2 Viewing Results
The result page features an interactive **Before/After Slider**. Drag the handle left and right to directly compare the original and restored images.


---

## 5. Technical Details


### 5.1 Model Weights
The application automatically checks for and downloads the following weights to the `weights/` directory on startup:


| Model | Path | Description |
| :--- | :--- | :--- |
| **CodeFormer** | `weights/CodeFormer/codeformer.pth` | Main restoration model. |
| **RetinaFace** | `weights/facelib/detection_Resnet50_Final.pth` | Face detection. |
| **ParseNet** | `weights/facelib/parsing_parsenet.pth` | Face parsing (segmentation). |
| **Real-ESRGAN** | `weights/realesrgan/RealESRGAN_x2plus.pth` | Background upscaler (x2). |
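
The check-then-download startup behaviour can be sketched as below. This is an illustrative stand-in, not the project's actual code (which may use `basicsr` download helpers); the `ensure_weights` name and the injected `fetch` callable are assumptions made so the logic is testable without network access.

```python
# Illustrative "download weights if missing" startup check (section 5.1).
import tempfile
from pathlib import Path

WEIGHT_FILES = {
    "CodeFormer": "weights/CodeFormer/codeformer.pth",
    "RetinaFace": "weights/facelib/detection_Resnet50_Final.pth",
    "ParseNet": "weights/facelib/parsing_parsenet.pth",
    "Real-ESRGAN": "weights/realesrgan/RealESRGAN_x2plus.pth",
}

def ensure_weights(root, fetch):
    """Create any missing weight files under root via fetch(name, path)."""
    downloaded = []
    for name, rel_path in WEIGHT_FILES.items():
        path = Path(root) / rel_path
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            fetch(name, path)  # e.g. stream the .pth from its release URL
            downloaded.append(name)
    return downloaded

# Demo with a fake fetcher that just creates empty placeholder files.
demo_root = tempfile.mkdtemp()
fetched = ensure_weights(demo_root, lambda name, path: path.touch())
```

Injecting the fetcher keeps the existence-check logic separate from the actual HTTP download, which also makes the second call a cheap no-op once all files exist.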


### 5.2 Performance Notes
* **Memory:** The full pipeline (CodeFormer + Real-ESRGAN) requires significant RAM/VRAM. On CPU-only environments (like basic HF Spaces), processing a single image may take 30-60 seconds.
* **Git LFS:** Image assets in this repository are tracked with Git LFS to keep the repo size manageable.

---


## 6. Credits & References


* **Original Paper:** [Towards Robust Blind Face Restoration with Codebook Lookup Transformer (NeurIPS 2022)](https://arxiv.org/abs/2206.11253)
* **Authors:** Shangchen Zhou, Kelvin C.K. Chan, Chongyi Li, Chen Change Loy (S-Lab, Nanyang Technological University).
* **Original Repository:** [sczhou/CodeFormer](https://github.com/sczhou/CodeFormer)