# CodeFormer Face Restoration - Project Documentation
## 1. Introduction
**CodeFormer** is a robust blind face restoration algorithm designed to restore old, degraded, or AI-generated face images. It utilizes a **Codebook Lookup Transformer** (VQGAN-based) to predict high-quality facial features even from severe degradation, ensuring that the restored faces look natural and faithful to the original identity.
This project wraps the core CodeFormer research code into a deployable, user-friendly **Flask Web Application**, containerized with **Docker** for easy deployment on platforms like Hugging Face Spaces.
### Key Features
* **Blind Face Restoration:** Restores faces from low-quality inputs without knowing the specific degradation details.
* **Background Enhancement:** Uses **Real-ESRGAN** to upscale and enhance the non-face background regions of the image.
* **Face Alignment & Paste-back:** Automatically detects faces, aligns them for processing, and seamlessly blends them back into the original image.
* **Adjustable Fidelity:** Users can balance between restoration quality (hallucinating details) and identity fidelity (keeping the original look).
---
## 2. System Architecture
The application is built on a Python/PyTorch backend served via Flask.
### 2.1 Technology Stack
* **Framework:** Flask (Python Web Server)
* **Deep Learning:** PyTorch, TorchVision
* **Image Processing:** OpenCV, NumPy, Pillow
* **Core Libraries:** `basicsr` (Basic Super Resolution framework), `facelib` (face detection/alignment utilities)
* **Frontend:** HTML5, Bootstrap 5, Jinja2 Templates
* **Containerization:** Docker (CUDA-enabled)
### 2.2 Directory Structure
```
CodeFormer/
├── app.py              # Main Flask application entry point
├── Dockerfile          # Container configuration
├── requirements.txt    # Python dependencies
├── basicsr/            # Core AI framework (super-resolution tools)
├── facelib/            # Face detection and alignment utilities
├── templates/          # HTML frontend
│   ├── index.html      # Upload interface
│   └── result.html     # Results display
├── static/             # Static assets (css, js, uploads)
│   ├── uploads/        # Temporary storage for input images
│   └── results/        # Temporary storage for processed output
└── weights/            # Pre-trained model weights (downloaded on startup)
    ├── CodeFormer/     # CodeFormer model (.pth)
    ├── facelib/        # Detection (RetinaFace) and parsing models
    └── realesrgan/     # Background upscaler (Real-ESRGAN)
```
### 2.3 Logic Flow
1. **Input:** User uploads an image via the Web UI.
2. **Pre-processing (`app.py`):**
* Image is saved to `static/uploads`.
* Parameters (fidelity, upscale factor) are parsed.
3. **Inference Pipeline:**
* **Detection:** `facelib` detects faces in the image using RetinaFace.
* **Alignment:** Faces are cropped and aligned to a standard 512x512 resolution.
* **Restoration:** The **CodeFormer** model processes the aligned faces.
* **Upscaling (Optional):** The background is upscaled using **Real-ESRGAN**.
* **Paste-back:** Restored faces are warped back to their original positions and blended.
4. **Output:** The final image is saved to `static/results` and displayed to the user.
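The paste-back step above boils down to an alpha blend between the restored face and the (optionally upscaled) background. The sketch below illustrates that blend on plain nested lists of grayscale values; the function name `paste_back` is illustrative, not the actual API in `app.py`, and the real pipeline operates per color channel on warped image arrays.

```python
def paste_back(background, face, mask):
    """Blend a restored face into the background using a soft mask.

    All arguments are 2-D lists of grayscale pixel values in [0, 255];
    `mask` holds per-pixel alpha in [0.0, 1.0] (1.0 = fully restored face).
    This mirrors the blend the pipeline performs after warping the face
    back to its original position.
    """
    out = []
    for bg_row, face_row, mask_row in zip(background, face, mask):
        out.append([
            round(a * f + (1.0 - a) * b)
            for b, f, a in zip(bg_row, face_row, mask_row)
        ])
    return out

# Tiny 1x3 example: left pixel keeps the background, right pixel the face.
bg   = [[10, 10, 10]]
face = [[200, 200, 200]]
mask = [[0.0, 0.5, 1.0]]
print(paste_back(bg, face, mask))  # [[10, 105, 200]]
```

Because the mask is soft (feathered at the face boundary), the transition between restored and original pixels is seamless rather than a hard edge.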
---
## 3. Installation & Deployment
### 3.1 Docker Deployment (Recommended)
The project is optimized for Docker.
**Prerequisites:** Docker, NVIDIA GPU (optional, but recommended).
1. **Build the Image:**
```bash
docker build -t codeformer-app .
```
2. **Run the Container:**
```bash
# Run on port 7860 (Standard for HF Spaces)
docker run -it -p 7860:7860 codeformer-app
```
*Note: To use GPU, add the `--gpus all` flag to the run command.*
### 3.2 Hugging Face Spaces Deployment
This repository is configured for direct deployment to Hugging Face.
1. Create a **Docker** Space on Hugging Face.
2. Push this entire repository to the Space's Git remote.
```bash
git remote add hf git@hf.co:spaces/USERNAME/SPACE_NAME
git push hf main
```
3. The Space will build (approx. 5-10 mins) and launch automatically.
### 3.3 Local Development
1. **Install Environment:**
```bash
conda create -n codeformer python=3.8
conda activate codeformer
pip install -r requirements.txt
```
2. **Install Basicsr:**
```bash
python basicsr/setup.py install
```
3. **Run App:**
```bash
python app.py
```
---
## 4. User Guide (Web Interface)
### 4.1 Interface Controls
* **Input Image:** Supports standard formats (JPG, PNG, WEBP). Drag and drop supported.
* **Fidelity Weight (w):**
* **Range:** 0.0 to 1.0.
* **0.0 (Better Quality):** The model "hallucinates" more detail. Results look sharp and clean, but the restored face may drift from the person's original identity.
* **1.0 (Better Identity):** The model sticks closely to the original features. Results are faithful to the input photo but may be blurrier or retain more artifacts.
* **Recommended:** 0.5 is a balanced default.
* **Upscale Factor:**
* Scales the final output resolution (1x, 2x, or 4x).
* *Note: Higher scaling requires more VRAM.*
* **Enhance Background:**
* If checked, runs Real-ESRGAN on the non-face areas.
* *Recommendation:* Keep checked for full-photo restoration. Uncheck if you only care about the face or are running on limited hardware.
* **Upsample Face:**
* If checked, the restored face is also upsampled to match the background resolution.
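Loosely, the fidelity weight `w` acts as an interpolation knob between the model's codebook prediction (quality) and the features of the input image (identity). The toy sketch below conveys only that intuition; the actual model fuses learned feature maps inside the network, not a scalar lerp, and `fuse` is a hypothetical name.

```python
def fuse(codebook_feat, encoder_feat, w):
    """Toy illustration of the fidelity weight w in [0, 1].

    w = 0.0 -> rely fully on the codebook prediction (sharper, may drift);
    w = 1.0 -> rely fully on features of the input (faithful, may stay blurry).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("fidelity weight must be in [0, 1]")
    return [(1.0 - w) * c + w * e for c, e in zip(codebook_feat, encoder_feat)]

print(fuse([1.0, 0.0], [0.0, 1.0], 0.5))  # [0.5, 0.5]
```

This is why the recommended default of 0.5 gives a balanced result: it sits midway between the two extremes.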
### 4.2 Viewing Results
The result page features an interactive **Before/After Slider**. Drag the handle left and right to compare the pixels of the original versus the restored image directly.
---
## 5. Technical Details
### 5.1 Model Weights
The application automatically checks for and downloads the following weights to the `weights/` directory on startup:
| Model | Path | Description |
| :--- | :--- | :--- |
| **CodeFormer** | `weights/CodeFormer/codeformer.pth` | Main restoration model. |
| **RetinaFace** | `weights/facelib/detection_Resnet50_Final.pth` | Face detection. |
| **ParseNet** | `weights/facelib/parsing_parsenet.pth` | Face parsing (segmentation). |
| **Real-ESRGAN** | `weights/realesrgan/RealESRGAN_x2plus.pth` | Background upscaler (x2). |
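The check-and-download behavior on startup can be sketched as a simple download-if-missing helper. The function name and URL below are placeholders; the actual application may use `basicsr`'s own download utilities rather than raw `urllib`.

```python
import urllib.request
from pathlib import Path

def ensure_weight(path: str, url: str) -> Path:
    """Download a weight file only if it is not already present.

    `url` is a placeholder here; the real app resolves its own release URLs.
    """
    dest = Path(path)
    if dest.exists():
        return dest  # already downloaded on a previous startup
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest

# Example (placeholder URL; nothing is fetched if the file already exists):
# ensure_weight("weights/CodeFormer/codeformer.pth",
#               "https://example.com/codeformer.pth")
```

Skipping the download when the file exists is what makes container restarts fast after the first launch.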
### 5.2 Performance Notes
* **Memory:** The full pipeline (CodeFormer + Real-ESRGAN) requires significant RAM/VRAM. On CPU-only environments (like basic HF Spaces), processing a single image may take 30-60 seconds.
* **Git LFS:** Image assets in this repository are tracked with Git LFS to keep the repo size manageable.
---
## 6. Credits & References
* **Original Paper:** [Towards Robust Blind Face Restoration with Codebook Lookup Transformer (NeurIPS 2022)](https://arxiv.org/abs/2206.11253)
* **Authors:** Shangchen Zhou, Kelvin C.K. Chan, Chongyi Li, Chen Change Loy (S-Lab, Nanyang Technological University).
* **Original Repository:** [sczhou/CodeFormer](https://github.com/sczhou/CodeFormer)