# Smile Changer API Documentation
## Overview
Smile Changer is a facial attribute editing application built on StyleFeatureEditor. It edits a face image along a chosen attribute: smile, age, beard, hairstyle, hair color, glasses, or makeup.
## Table of Contents
1. [API Endpoints](#api-endpoints)
2. [Core Functions](#core-functions)
3. [Attribute Mapping](#attribute-mapping)
4. [Configuration](#configuration)
5. [Error Handling](#error-handling)
6. [Usage Examples](#usage-examples)
7. [Model Architecture](#model-architecture)
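8. [Dependencies](#dependencies)
9. [Performance Considerations](#performance-considerations)
10. [Troubleshooting](#troubleshooting)
11. [License and Credits](#license-and-credits)
12. [Support](#support)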
## API Endpoints
### Main Application Interface
The application is built using Gradio and provides a web-based interface with the following components:
#### Input Parameters
| Parameter | Type | Description | Default | Range |
|-----------|------|-------------|---------|-------|
| `image` | PIL.Image | Input face image | - | Any valid image format |
| `attribute` | str | Attribute to edit | "Smile" | See [Attribute Mapping](#attribute-mapping) |
| `strength` | float | Edit intensity | 5.0 | Varies by attribute |
| `align_face` | bool | Enable face alignment | False | True/False |
| `use_bg_mask` | bool | Use background masking | False | True/False |
| `custom_text_edit` | str | Custom text prompt | "" | StyleCLIP format |
#### Output
| Parameter | Type | Description |
|-----------|------|-------------|
| `edited_image` | PIL.Image | Edited face image |
## Core Functions
### `run_edit(image, attribute, strength, align_face, use_bg_mask, custom_text_edit)`
Main editing function that processes the input image and applies the specified attribute modification.
**Parameters:**
- `image` (PIL.Image): Input face image
- `attribute` (str): Attribute name from ATTRIBUTE_MAP
- `strength` (float): Edit intensity (automatically clipped to valid range)
- `align_face` (bool): Whether to align face before editing
- `use_bg_mask` (bool): Whether to use background masking
- `custom_text_edit` (str): Custom text prompt for StyleCLIP edits
**Returns:**
- `PIL.Image`: Edited image
**Process Flow:**
1. Load and initialize the SimpleRunner
2. Determine editing parameters from attribute selection
3. Apply strength clipping to valid range
4. Process image through the editing pipeline
5. Return edited result
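
Step 2 of the flow can be pictured with the minimal sketch below. It assumes that a non-empty `custom_text_edit` overrides the mapped edit only when a text-based attribute is selected, which is how the custom text example in this document is annotated. `ATTRIBUTE_MAP_SKETCH` and `resolve_edit_name` are hypothetical names, not taken from the application code.

```python
from typing import Dict

# Hypothetical excerpt of the display-name -> internal-name mapping.
ATTRIBUTE_MAP_SKETCH: Dict[str, str] = {
    "Smile": "fs_smiling",
    "Beard": "trimmed_beard",
    "Orange hair (text)": "styleclip_global_a face_a face with orange hair_0.18",
}


def resolve_edit_name(attribute: str, custom_text_edit: str = "") -> str:
    """Pick the internal edit name for a UI attribute (assumed behavior)."""
    if attribute not in ATTRIBUTE_MAP_SKETCH:
        raise KeyError(f"Unknown attribute: {attribute!r}")
    # Assumption: text-based attributes are the ones labelled "(text)".
    is_text_based = attribute.endswith("(text)")
    if is_text_based and custom_text_edit.strip():
        return custom_text_edit.strip()
    return ATTRIBUTE_MAP_SKETCH[attribute]
```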
### `get_runner() -> SimpleRunner`
Singleton function that initializes and returns the SimpleRunner instance.
**Returns:**
- `SimpleRunner`: Configured runner instance
**Features:**
- Lazy initialization
- Automatic model weight downloading
- Error handling and logging
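
Lazy initialization of a single shared instance is commonly written as below. This is an illustrative sketch, not the application's implementation: `LazySingleton` is a hypothetical helper, and the factory passed to it stands in for whatever constructs the real `SimpleRunner`.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build an expensive object on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: take the lock only while the instance is unset.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance
```

With this shape, the 30-60 second initialization cost is paid on the first call to `get()` only.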
### `ensure_weights()`
Downloads required model weights from Hugging Face if not present locally.
**Required Files:**
- `sfe_editor_light.pt` - Main editor model
- `stylegan2-ffhq-config-f.pt` - StyleGAN2 generator
- `e4e_ffhq_encode.pt` - Encoder model
- `shape_predictor_68_face_landmarks.dat` - Face landmark predictor
- Additional supporting models
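
The check-then-download behavior might look like the sketch below. The file names come from the list above; `find_missing`, `ensure_weights_sketch`, and the `download` callback are hypothetical, and the actual Hugging Face repository and download call are deliberately left to the caller because this document does not name them.

```python
from pathlib import Path
from typing import Callable, List, Sequence, Union

REQUIRED_WEIGHTS = (
    "sfe_editor_light.pt",
    "stylegan2-ffhq-config-f.pt",
    "e4e_ffhq_encode.pt",
    "shape_predictor_68_face_landmarks.dat",
)


def find_missing(weights_dir: Union[str, Path],
                 required: Sequence[str] = REQUIRED_WEIGHTS) -> List[str]:
    """Return the required file names that are not present in weights_dir."""
    root = Path(weights_dir)
    return [name for name in required if not (root / name).is_file()]


def ensure_weights_sketch(weights_dir: Union[str, Path],
                          download: Callable[[str, Path], None],
                          required: Sequence[str] = REQUIRED_WEIGHTS) -> List[str]:
    """Fetch only the missing files; returns the names that were fetched."""
    root = Path(weights_dir)
    root.mkdir(parents=True, exist_ok=True)
    missing = find_missing(root, required)
    for name in missing:
        download(name, root / name)  # caller-supplied download routine
    return missing
```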
## Attribute Mapping
The application supports the following facial attributes:
### Face Semantics
| Attribute | Internal Name | Range | Description |
|-----------|---------------|-------|-------------|
| Smile | `fs_smiling` | -10.0 to 10.0 | Positive adds smile, negative removes |
| Age | `age` | -10.0 to 10.0 | Positive makes older, negative makes younger |
| Female features | `gender` | -10.0 to 7.0 | Positive adds femininity |
### Facial Hair
| Attribute | Internal Name | Range | Description |
|-----------|---------------|-------|-------------|
| Beard | `trimmed_beard` | -30.0 to 30.0 | **Negative values ADD beard** |
| Mustache/Goatee | `goatee` | -7.0 to 7.0 | **Negative values ADD goatee** |
### Accessories & Cosmetics
| Attribute | Internal Name | Range | Description |
|-----------|---------------|-------|-------------|
| Glasses | `fs_glasses` | -20.0 to 30.0 | Positive adds glasses, negative removes |
| Makeup | `fs_makeup` | -10.0 to 15.0 | Positive adds makeup, negative removes |
### Hair Style
| Attribute | Internal Name | Range | Description |
|-----------|---------------|-------|-------------|
| Curly hair | `curly_hair` | 0.0 to 0.12 | Adds curly hair texture |
| Afro | `afro` | 0.0 to 0.14 | Adds afro hairstyle |
### Hair Color (Text-based)
| Attribute | Internal Name | Range | Description |
|-----------|---------------|-------|-------------|
| Orange hair (text) | `styleclip_global_a face_a face with orange hair_0.18` | 0.0 to 0.2 | Changes hair to orange |
| Blonde hair (text) | `styleclip_global_a face_a face with blonde hair_0.18` | 0.0 to 0.2 | Changes hair to blonde |
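
The ranges in the tables above can be collected into a lookup and used for clipping, roughly as follows. `ATTRIBUTE_RANGES` and `clip_strength` are hypothetical names chosen for this sketch; the application's own `ATTRIBUTE_MAP` may be structured differently.

```python
from typing import Dict, Tuple

# (minimum, maximum) strength per attribute, copied from the tables above.
ATTRIBUTE_RANGES: Dict[str, Tuple[float, float]] = {
    "Smile": (-10.0, 10.0),
    "Age": (-10.0, 10.0),
    "Female features": (-10.0, 7.0),
    "Beard": (-30.0, 30.0),
    "Mustache/Goatee": (-7.0, 7.0),
    "Glasses": (-20.0, 30.0),
    "Makeup": (-10.0, 15.0),
    "Curly hair": (0.0, 0.12),
    "Afro": (0.0, 0.14),
    "Orange hair (text)": (0.0, 0.2),
    "Blonde hair (text)": (0.0, 0.2),
}


def clip_strength(attribute: str, strength: float) -> float:
    """Clamp strength into the documented range for the attribute."""
    low, high = ATTRIBUTE_RANGES[attribute]
    return max(low, min(high, strength))
```

Note how different the scales are: a strength of 5.0 is a moderate smile edit but far outside the 0.0 to 0.12 range for curly hair.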
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `CUDA_VISIBLE_DEVICES` | GPU device selection | "" (CPU) |
| `TORCH_CUDA_ARCH_LIST` | CUDA architecture | "8.0" |
| `HF_TOKEN` | Hugging Face token | - |
| `HUGGINGFACE_TOKEN` | Alternative HF token | - |
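
A minimal sketch of reading the token from either variable is shown below. The precedence (`HF_TOKEN` first, then `HUGGINGFACE_TOKEN`) is an assumption based on the table calling the second one an alternative; `resolve_hf_token` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token, or None if neither variable is set."""
    # Assumed precedence: HF_TOKEN wins over HUGGINGFACE_TOKEN.
    for name in ("HF_TOKEN", "HUGGINGFACE_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```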
### Model Configuration
The application uses the following configuration files:
- `configs/simple_inference.yaml` - Main inference configuration
- `pretrained_models/` - Directory containing all model weights
## Error Handling
### Common Error Scenarios
1. **Missing Model Weights**
   - Automatic download from Hugging Face
   - Fallback to CPU if GPU unavailable
2. **Face Detection Failures**
   - Multiple detection thresholds attempted
   - Graceful degradation without alignment
3. **Mask Extraction Failures**
   - Continues without background masking
   - Logs warnings for debugging
4. **Alignment Failures**
   - Falls back to unaligned processing
   - Preserves original image orientation
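
The fallback pattern shared by the mask and alignment scenarios above can be sketched generically. `try_step` is a hypothetical helper, not part of the application; it runs an optional step and returns a fallback value, with a logged warning, if the step raises.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("smile_changer.sketch")


def try_step(name: str, step: Callable[[], T], fallback: T) -> T:
    """Run an optional step; on any exception, log a warning and use fallback."""
    try:
        return step()
    except Exception:
        # exc_info=True attaches the stack trace to the warning for debugging.
        logger.warning("%s failed; continuing without it", name, exc_info=True)
        return fallback
```

For example, `try_step("alignment", lambda: align(image), fallback=image)` would keep the unaligned image when a hypothetical `align` function fails.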
### Logging
The application uses Python's `logging` module at INFO level by default and records:
- Model initialization status
- Edit process progress
- Error details and stack traces
- File download and verification
## Usage Examples
### Basic Smile Enhancement
```python
from PIL import Image
from app import run_edit

# Load input image
image = Image.open("input.jpg")

# Apply smile enhancement
edited = run_edit(
    image=image,
    attribute="Smile",
    strength=5.0,
    align_face=False,
    use_bg_mask=False,
    custom_text_edit=""
)

# Save result
edited.save("output.jpg")
```
### Custom Text-based Editing
```python
# Add a hat using a custom text prompt.
# Reuses `image` and `run_edit` from the basic example.
edited = run_edit(
    image=image,
    attribute="Orange hair (text)",  # A text-based attribute must be selected
    strength=0.18,
    align_face=True,
    use_bg_mask=True,
    custom_text_edit="styleclip_global_a face_a face with a hat_0.18"
)
```
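
Judging from the strings in this document, a custom prompt has the shape `styleclip_global_<neutral text>_<target text>_<number>`. The sketch below assembles such a string. `build_styleclip_edit` and its parameter names are hypothetical labels, and this document does not state what the trailing number controls.

```python
def build_styleclip_edit(neutral: str, target: str, number: float) -> str:
    """Assemble a prompt string in the format used by the examples here."""
    for part in (neutral, target):
        # Underscores separate the fields, so they would make the string ambiguous.
        if "_" in part:
            raise ValueError(f"Prompt text must not contain '_': {part!r}")
    return f"styleclip_global_{neutral}_{target}_{number}"
```

Calling `build_styleclip_edit("a face", "a face with a hat", 0.18)` reproduces the string passed as `custom_text_edit` in the example above.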
### Beard Addition
```python
# Add beard (use negative values)
edited = run_edit(
    image=image,
    attribute="Beard",
    strength=-15.0,  # Negative value adds beard
    align_face=False,
    use_bg_mask=False,
    custom_text_edit=""
)
```
## Model Architecture
### Core Components
1. **SimpleRunner**: Main interface for image editing
2. **FSEInferenceRunner**: Handles model inference and editing
3. **LatentEditor**: Manages different editing directions
4. **StyleGAN2**: Generator for high-quality image synthesis
5. **E4E Encoder**: Encodes images to latent space
### Editing Methods
1. **InterfaceGAN Directions**: Age, smile, gender
2. **StyleSpace Directions**: Gender, facial features
3. **StyleCLIP Global Mapper**: Text-based editing
4. **DeltaEdit**: Advanced attribute manipulation
### Processing Pipeline
1. **Input Preprocessing**: Image normalization and resizing
2. **Face Alignment**: Optional landmark-based alignment
3. **Background Masking**: Optional face segmentation
4. **Latent Encoding**: Convert image to latent representation
5. **Attribute Editing**: Apply desired modifications
6. **Image Synthesis**: Generate edited result
7. **Post-processing**: Optional unalignment and blending
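
To make the optional stages concrete, the sketch below lists which stages would run for a given pair of flags. It assumes that post-processing (unalignment and blending) is only needed when alignment or masking was enabled; the stage names and `active_stages` are hypothetical.

```python
from typing import List


def active_stages(align_face: bool, use_bg_mask: bool) -> List[str]:
    """Return the pipeline stages, in order, for the given options."""
    stages = ["input_preprocessing"]
    if align_face:
        stages.append("face_alignment")
    if use_bg_mask:
        stages.append("background_masking")
    stages += ["latent_encoding", "attribute_editing", "image_synthesis"]
    # Assumption: nothing to undo or blend when both options are off.
    if align_face or use_bg_mask:
        stages.append("post_processing")
    return stages
```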
## Dependencies
### Core Dependencies
```
gradio==4.44.0
torch
torchvision
Pillow>=9.5
numpy>=1.23
opencv-python-headless==4.10.0.84
```
### AI/ML Dependencies
```
omegaconf==2.1.2
einops==0.7.0
timm==1.0.3
clip @ git+https://github.com/openai/CLIP.git
```
### Utility Dependencies
```
scipy==1.10.1
networkx==3.3
fsspec==2024.3.1
gdown==4.7.1
wandb==0.15.2
pandas==2.2.2
ninja>=1.11
```
### System Dependencies
```
dlib-binary
spaces>=0.28.3
setuptools>=68
wheel>=0.41
```
## Performance Considerations
### Memory Usage
- Model weights: ~2GB total
- GPU memory: ~4GB recommended
- CPU fallback available
### Processing Time
- Initialization: 30-60 seconds
- Per edit: 5-15 seconds (GPU), 30-60 seconds (CPU)
- Face alignment: +2-5 seconds
- Background masking: +3-8 seconds
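
The figures above combine additively. As a worked example, this hypothetical helper returns the expected per-edit range in seconds, excluding initialization:

```python
from typing import Tuple


def estimate_edit_seconds(use_gpu: bool, align_face: bool,
                          use_bg_mask: bool) -> Tuple[int, int]:
    """Return (low, high) per-edit time in seconds from the figures above."""
    low, high = (5, 15) if use_gpu else (30, 60)
    if align_face:
        low, high = low + 2, high + 5
    if use_bg_mask:
        low, high = low + 3, high + 8
    return low, high
```

A GPU edit with both alignment and masking enabled comes to 5+2+3 = 10 seconds at best and 15+5+8 = 28 seconds at worst.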
### Optimization Tips
1. Use GPU when available
2. Disable alignment for faster processing
3. Use background masking only when needed
4. Batch multiple edits when possible
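
One way to read tip 4 is to run several edits in the same process so that initialization is paid once. The sketch below applies each edit independently to the same input image. `batch_edits` is a hypothetical helper; `edit_fn` is expected to accept the keyword arguments of `run_edit` as documented above.

```python
from typing import Callable, Dict, Iterable, Tuple


def batch_edits(edit_fn: Callable[..., object],
                image: object,
                edits: Iterable[Tuple[str, float]],
                align_face: bool = False,
                use_bg_mask: bool = False) -> Dict[str, object]:
    """Apply each (attribute, strength) edit to the same image."""
    results: Dict[str, object] = {}
    for attribute, strength in edits:
        results[attribute] = edit_fn(
            image=image,
            attribute=attribute,
            strength=strength,
            align_face=align_face,
            use_bg_mask=use_bg_mask,
            custom_text_edit="",
        )
    return results
```

A call such as `batch_edits(run_edit, image, [("Smile", 5.0), ("Beard", -15.0)])` would return one edited image per attribute.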
## Troubleshooting
### Common Issues
1. **"No module named 'piq'"**
   - Install missing dependencies: `pip install piq`
2. **CUDA initialization errors**
   - Set `CUDA_VISIBLE_DEVICES=""` for CPU-only mode
   - Check GPU compatibility
3. **Face detection failures**
   - Ensure clear, well-lit face images
   - Try different alignment settings
   - Check image resolution (minimum 256x256)
4. **Model download failures**
   - Verify Hugging Face token
   - Check internet connectivity
   - Ensure sufficient disk space
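
The resolution check from item 3 can be done before calling the editor. This sketch assumes "minimum 256x256" means both sides must be at least 256 pixels; `meets_min_resolution` is a hypothetical helper that takes a `(width, height)` tuple such as the `size` attribute of a PIL image.

```python
from typing import Tuple

MIN_SIDE = 256  # from the troubleshooting note above


def meets_min_resolution(size: Tuple[int, int], min_side: int = MIN_SIDE) -> bool:
    """Return True if both width and height are at least min_side pixels."""
    width, height = size
    return width >= min_side and height >= min_side
```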
### Debug Mode
Enable detailed logging by setting:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## License and Credits
This application is based on the StyleFeatureEditor research project. Please refer to the original repository for licensing information and citations.
## Support
For issues and questions:
1. Check the troubleshooting section
2. Review error logs
3. Verify input image quality
4. Test with different attribute combinations