	# Smile Changer API Documentation

	## Overview

	Smile Changer is a facial attribute editing application built on StyleFeatureEditor. It uses AI-powered image editing to modify facial attributes such as smile, age, beard, hair style and color, glasses, and makeup.

	## Table of Contents

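	1. [API Endpoints](#api-endpoints)
	2. [Core Functions](#core-functions)
	3. [Attribute Mapping](#attribute-mapping)
	4. [Configuration](#configuration)
	5. [Error Handling](#error-handling)
	6. [Usage Examples](#usage-examples)
	7. [Model Architecture](#model-architecture)
	8. [Dependencies](#dependencies)
	9. [Performance Considerations](#performance-considerations)
	10. [Troubleshooting](#troubleshooting)
	11. [License and Credits](#license-and-credits)
	12. [Support](#support)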

	## API Endpoints

	### Main Application Interface

	The application is built using Gradio and provides a web-based interface with the following components:

	#### Input Parameters

	| Parameter | Type | Description | Default | Range |
	|-----------|------|-------------|---------|-------|
	| `image` | PIL.Image | Input face image | - | Any valid image format |
	| `attribute` | str | Attribute to edit | "Smile" | See [Attribute Mapping](#attribute-mapping) |
	| `strength` | float | Edit intensity | 5.0 | Varies by attribute |
	| `align_face` | bool | Enable face alignment | False | True/False |
	| `use_bg_mask` | bool | Use background masking | False | True/False |
	| `custom_text_edit` | str | Custom text prompt | "" | StyleCLIP format |

	#### Output

	| Parameter | Type | Description |
	|-----------|------|-------------|
	| `edited_image` | PIL.Image | Edited face image |
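
The non-image inputs in the table can be mirrored in a small request object. The sketch below is illustrative only: `EditRequest` and `as_kwargs` are hypothetical names that do not appear in the application, and the defaults are copied from the table above. The `image` argument is left out because it has no default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRequest:
    """Hypothetical container mirroring the input parameter table."""

    attribute: str = "Smile"
    strength: float = 5.0
    align_face: bool = False
    use_bg_mask: bool = False
    custom_text_edit: str = ""

    def as_kwargs(self) -> dict:
        """Return the fields as keyword arguments with normalized types."""
        return {
            "attribute": self.attribute,
            "strength": float(self.strength),
            "align_face": bool(self.align_face),
            "use_bg_mask": bool(self.use_bg_mask),
            "custom_text_edit": self.custom_text_edit,
        }


# An integer strength is accepted and normalized to float.
request = EditRequest(attribute="Age", strength=-3)
print(request.as_kwargs())
```

The resulting dictionary could be passed alongside an image, for example `run_edit(image=image, **request.as_kwargs())`.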

	## Core Functions

	### `run_edit(image, attribute, strength, align_face, use_bg_mask, custom_text_edit)`

	Main editing function that processes the input image and applies the specified attribute modification.

	Parameters:
	- `image` (PIL.Image): Input face image
	- `attribute` (str): Attribute name from ATTRIBUTE_MAP
	- `strength` (float): Edit intensity (automatically clipped to valid range)
	- `align_face` (bool): Whether to align face before editing
	- `use_bg_mask` (bool): Whether to use background masking
	- `custom_text_edit` (str): Custom text prompt for StyleCLIP edits

	Returns:
	- `PIL.Image`: Edited image

	Process Flow:
	1. Load and initialize the SimpleRunner
	2. Determine editing parameters from attribute selection
	3. Apply strength clipping to valid range
	4. Process image through the editing pipeline
	5. Return edited result
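
The process flow above can be sketched as follows. This is not the application's code: `run_edit_sketch`, `clip_strength`, and `fake_edit` are hypothetical names, the map holds only three rows of the documented ranges, and the real runner call is replaced by a stub so the example runs without model weights.

```python
from typing import Any, Callable, Dict, Tuple

# Excerpt of the documented ranges: display name -> (internal name, min, max).
ATTRIBUTE_MAP: Dict[str, Tuple[str, float, float]] = {
    "Smile": ("fs_smiling", -10.0, 10.0),
    "Beard": ("trimmed_beard", -30.0, 30.0),
    "Curly hair": ("curly_hair", 0.0, 0.12),
}


def clip_strength(attribute: str, strength: float) -> float:
    """Clamp strength to the documented range for the attribute."""
    _, low, high = ATTRIBUTE_MAP[attribute]
    return max(low, min(high, float(strength)))


def run_edit_sketch(
    image: Any,
    attribute: str,
    strength: float,
    edit_fn: Callable[[Any, str, float], Any],
) -> Any:
    """Resolve the attribute, clip the strength, then delegate to edit_fn.

    edit_fn is a hypothetical stand-in for the real runner call.
    """
    if attribute not in ATTRIBUTE_MAP:
        raise ValueError(f"Unknown attribute: {attribute!r}")
    internal_name = ATTRIBUTE_MAP[attribute][0]
    return edit_fn(image, internal_name, clip_strength(attribute, strength))


def fake_edit(image: Any, name: str, power: float) -> tuple:
    """Stub that echoes its arguments instead of running a model."""
    return (image, name, power)


# A strength of 25.0 exceeds the Smile range and is clipped to 10.0.
result = run_edit_sketch("input.jpg", "Smile", 25.0, fake_edit)
print(result)
```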

	### `get_runner() -> SimpleRunner`

	Singleton function that initializes and returns the SimpleRunner instance.

	Returns:
	- `SimpleRunner`: Configured runner instance

	Features:
	- Lazy initialization
	- Automatic model weight downloading
	- Error handling and logging
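
A lazy singleton of this kind could look like the sketch below. It is an assumption about the pattern, not a copy of `get_runner()`: `StubRunner` stands in for `SimpleRunner`, and the lock is added here to keep initialization safe if several requests arrive at once.

```python
import threading
from typing import Dict, Optional


class StubRunner:
    """Hypothetical stand-in for SimpleRunner; counts how often it is built."""

    instances = 0

    def __init__(self) -> None:
        StubRunner.instances += 1


# Holder for the shared instance; nothing is built until first use.
_state: Dict[str, Optional[StubRunner]] = {"runner": None}
_lock = threading.Lock()


def get_runner_sketch() -> StubRunner:
    """Create the runner on first call and reuse it afterwards."""
    if _state["runner"] is None:
        with _lock:
            # Re-check inside the lock so only one thread constructs it.
            if _state["runner"] is None:
                _state["runner"] = StubRunner()
    return _state["runner"]


first = get_runner_sketch()
second = get_runner_sketch()
print(first is second, StubRunner.instances)
```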

	### `ensure_weights()`

	Downloads required model weights from Hugging Face if not present locally.

	Required Files:
	- `sfe_editor_light.pt` - Main editor model
	- `stylegan2-ffhq-config-f.pt` - StyleGAN2 generator
	- `e4e_ffhq_encode.pt` - Encoder model
	- `shape_predictor_68_face_landmarks.dat` - Face landmark predictor
	- Additional supporting models
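
The check-then-download behavior can be sketched as below. The file names come from the list above; everything else is an assumption. In particular, `ensure_weights_sketch` and `missing_weights` are hypothetical names, and the `download` callable is injected so the example runs offline. A real implementation might wrap `huggingface_hub.hf_hub_download` there.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List

REQUIRED_FILES = (
    "sfe_editor_light.pt",
    "stylegan2-ffhq-config-f.pt",
    "e4e_ffhq_encode.pt",
    "shape_predictor_68_face_landmarks.dat",
)


def missing_weights(weights_dir: Path, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """List required files that are not yet present in weights_dir."""
    return [name for name in required if not (weights_dir / name).is_file()]


def ensure_weights_sketch(weights_dir: Path, download: Callable[[str, Path], None]) -> List[str]:
    """Fetch only the missing files and return the names that were fetched."""
    weights_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in missing_weights(weights_dir):
        download(name, weights_dir / name)
        fetched.append(name)
    return fetched


def fake_download(name: str, target: Path) -> None:
    """Stub downloader that creates an empty file instead of using the network."""
    target.write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pretrained_models"
    first_pass = ensure_weights_sketch(root, fake_download)
    second_pass = ensure_weights_sketch(root, fake_download)

print(len(first_pass), len(second_pass))
```

The second pass fetches nothing because every required file is already on disk.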

	## Attribute Mapping

	The application supports the following facial attributes:

	### Face Semantics

	| Attribute | Internal Name | Range | Description |
	|-----------|---------------|-------|-------------|
	| Smile | `fs_smiling` | -10.0 to 10.0 | Positive adds smile, negative removes |
	| Age | `age` | -10.0 to 10.0 | Positive makes older, negative makes younger |
	| Female features | `gender` | -10.0 to 7.0 | Positive adds femininity |

	### Facial Hair

	| Attribute | Internal Name | Range | Description |
	|-----------|---------------|-------|-------------|
	| Beard | `trimmed_beard` | -30.0 to 30.0 | Negative values ADD beard |
	| Mustache/Goatee | `goatee` | -7.0 to 7.0 | Negative values ADD goatee |

	### Accessories & Cosmetics

	| Attribute | Internal Name | Range | Description |
	|-----------|---------------|-------|-------------|
	| Glasses | `fs_glasses` | -20.0 to 30.0 | Positive adds glasses, negative removes |
	| Makeup | `fs_makeup` | -10.0 to 15.0 | Positive adds makeup, negative removes |

	### Hair Style

	| Attribute | Internal Name | Range | Description |
	|-----------|---------------|-------|-------------|
	| Curly hair | `curly_hair` | 0.0 to 0.12 | Adds curly hair texture |
	| Afro | `afro` | 0.0 to 0.14 | Adds afro hairstyle |

	### Hair Color (Text-based)

	| Attribute | Internal Name | Range | Description |
	|-----------|---------------|-------|-------------|
	| Orange hair (text) | `styleclip_global_a face_a face with orange hair_0.18` | 0.0 to 0.2 | Changes hair to orange |
	| Blonde hair (text) | `styleclip_global_a face_a face with blonde hair_0.18` | 0.0 to 0.2 | Changes hair to blonde |
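
The text-based internal names appear to pack three underscore-separated fields after the `styleclip_global_` prefix. The sketch below parses and builds such names under that assumption: the fields are read as a neutral prompt, a target prompt, and a trailing number, and the label `threshold` for that number is a guess. `TextEdit`, `parse_text_edit`, and `build_text_edit` are hypothetical helpers, and prompts are assumed to contain no underscores.

```python
from typing import NamedTuple

PREFIX = "styleclip_global_"


class TextEdit(NamedTuple):
    neutral: str
    target: str
    threshold: float  # Assumed meaning of the trailing number.


def parse_text_edit(name: str) -> TextEdit:
    """Split a styleclip_global_<neutral>_<target>_<number> name into fields."""
    if not name.startswith(PREFIX):
        raise ValueError(f"Expected prefix {PREFIX!r}: {name!r}")
    # The number is the last field; the first underscore separates the prompts.
    body, _, number = name[len(PREFIX):].rpartition("_")
    neutral, sep, target = body.partition("_")
    if not sep or not neutral or not target:
        raise ValueError(f"Expected two prompts in {name!r}")
    return TextEdit(neutral, target, float(number))


def build_text_edit(neutral: str, target: str, threshold: float) -> str:
    """Assemble a name in the same layout as the table entries."""
    return f"{PREFIX}{neutral}_{target}_{threshold}"


parsed = parse_text_edit("styleclip_global_a face_a face with orange hair_0.18")
print(parsed)
print(build_text_edit("a face", "a face with a hat", 0.18))
```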

	## Configuration

	### Environment Variables

	| Variable | Description | Default |
	|----------|-------------|---------|
	| `CUDA_VISIBLE_DEVICES` | GPU device selection | "" (CPU) |
	| `TORCH_CUDA_ARCH_LIST` | CUDA architecture | "8.0" |
	| `HF_TOKEN` | Hugging Face token | - |
	| `HUGGINGFACE_TOKEN` | Alternative HF token | - |
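
One way to read these variables is sketched below. The defaults are taken from the table; the lookup order (`HF_TOKEN` first, then `HUGGINGFACE_TOKEN`) is an assumption based on the second being described as an alternative. `with_defaults`, `resolve_hf_token`, and `cpu_only` are hypothetical helper names.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as listed in the table above.
DEFAULTS = {"CUDA_VISIBLE_DEVICES": "", "TORCH_CUDA_ARCH_LIST": "8.0"}


def with_defaults(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env where unset variables take the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update(env)
    return merged


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token (assumed order), or None if neither is set."""
    for key in ("HF_TOKEN", "HUGGINGFACE_TOKEN"):
        value = env.get(key, "").strip()
        if value:
            return value
    return None


def cpu_only(env: Mapping[str, str]) -> bool:
    """An empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA."""
    return with_defaults(env)["CUDA_VISIBLE_DEVICES"] == ""


print(cpu_only({}), cpu_only({"CUDA_VISIBLE_DEVICES": "0"}))
print(resolve_hf_token({"HUGGINGFACE_TOKEN": "example-token"}))
```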

	### Model Configuration

	The application uses the following configuration files:
	- `configs/simple_inference.yaml` - Main inference configuration
	- `pretrained_models/` - Directory containing all model weights

	## Error Handling

	### Common Error Scenarios

	1. Missing Model Weights
	   - Automatic download from Hugging Face
	   - Fallback to CPU if GPU unavailable

	2. Face Detection Failures
	   - Multiple detection thresholds attempted
	   - Graceful degradation without alignment

	3. Mask Extraction Failures
	   - Continues without background masking
	   - Logs warnings for debugging

	4. Alignment Failures
	   - Falls back to unaligned processing
	   - Preserves original image orientation
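
The fallback behavior for face detection and alignment can be sketched as below. This is a pattern illustration under assumptions, not the application's detector: the `detect` callable, the threshold values, and the names `align_with_fallback` and `prepare` are all hypothetical.

```python
import logging
from typing import Any, Callable, Optional, Sequence, Tuple

logger = logging.getLogger("smile_changer.sketch")

# Placeholder thresholds; the real values are not documented here.
THRESHOLDS = (0.0, -0.5, -1.0)


def align_with_fallback(
    image: Any,
    detect: Callable[[Any, float], Optional[Any]],
    thresholds: Sequence[float] = THRESHOLDS,
) -> Optional[Any]:
    """Try each threshold in turn; return None if no attempt succeeds."""
    for threshold in thresholds:
        try:
            aligned = detect(image, threshold)
        except Exception as error:
            logger.warning("Alignment failed at threshold %s: %s", threshold, error)
            continue
        if aligned is not None:
            return aligned
    logger.warning("No face found; continuing without alignment")
    return None


def prepare(image: Any, detect: Callable[[Any, float], Optional[Any]]) -> Tuple[Any, bool]:
    """Return (image to edit, whether alignment was applied)."""
    aligned = align_with_fallback(image, detect)
    return (aligned, True) if aligned is not None else (image, False)


def picky_detect(image: Any, threshold: float) -> Optional[str]:
    """Stub detector that only succeeds at looser thresholds."""
    return f"aligned:{image}" if threshold <= -0.5 else None


def failing_detect(image: Any, threshold: float) -> Optional[str]:
    """Stub detector that always raises."""
    raise RuntimeError("detector crashed")


print(prepare("img", picky_detect))
print(prepare("img", failing_detect))
```

When every attempt fails, the original image is returned unchanged, which matches the unaligned fallback described above.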

	### Logging

	The application uses Python's `logging` module at INFO level by default and records:
	- Model initialization status
	- Edit process progress
	- Error details and stack traces
	- File download and verification

	## Usage Examples

	### Basic Smile Enhancement

	```python
	from PIL import Image
	from app import run_edit

	# Load input image
	image = Image.open("input.jpg")

	# Apply smile enhancement
	edited = run_edit(
	    image=image,
	    attribute="Smile",
	    strength=5.0,
	    align_face=False,
	    use_bg_mask=False,
	    custom_text_edit="",
	)

	# Save result
	edited.save("output.jpg")
	```

	### Custom Text-based Editing

	```python
	from PIL import Image
	from app import run_edit

	image = Image.open("input.jpg")

	# Add a hat using a custom text prompt
	edited = run_edit(
	    image=image,
	    attribute="Orange hair (text)",  # Must be a text-based attribute
	    strength=0.18,
	    align_face=True,
	    use_bg_mask=True,
	    custom_text_edit="styleclip_global_a face_a face with a hat_0.18",
	)
	```

	### Beard Addition

	```python
	from PIL import Image
	from app import run_edit

	image = Image.open("input.jpg")

	# Add a beard (use negative values)
	edited = run_edit(
	    image=image,
	    attribute="Beard",
	    strength=-15.0,  # Negative value adds beard
	    align_face=False,
	    use_bg_mask=False,
	    custom_text_edit="",
	)
	```

	## Model Architecture

	### Core Components

	1. SimpleRunner: Main interface for image editing
	2. FSEInferenceRunner: Handles model inference and editing
	3. LatentEditor: Manages different editing directions
	4. StyleGAN2: Generator for high-quality image synthesis
	5. E4E Encoder: Encodes images to latent space

	### Editing Methods

	1. InterfaceGAN Directions: Age, smile, gender
	2. StyleSpace Directions: Gender, facial features
	3. StyleCLIP Global Mapper: Text-based editing
	4. DeltaEdit: Advanced attribute manipulation

	### Processing Pipeline

	1. Input Preprocessing: Image normalization and resizing
	2. Face Alignment: Optional landmark-based alignment
	3. Background Masking: Optional face segmentation
	4. Latent Encoding: Convert image to latent representation
	5. Attribute Editing: Apply desired modifications
	6. Image Synthesis: Generate edited result
	7. Post-processing: Optional unalignment and blending
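
The optional stages in the pipeline can be made concrete with a small planner. The sketch below only computes which stages would run for a given pair of flags. The stage labels and the function name `plan_stages` are hypothetical, and it assumes post-processing is needed only when alignment or masking was enabled.

```python
from typing import List


def plan_stages(align_face: bool, use_bg_mask: bool) -> List[str]:
    """Return the ordered stage names for the given options."""
    stages = ["preprocess"]
    if align_face:
        stages.append("align")
    if use_bg_mask:
        stages.append("mask")
    stages += ["encode", "edit", "synthesize"]
    # Assumption: unalignment and blending only apply after align or mask.
    if align_face or use_bg_mask:
        stages.append("postprocess")
    return stages


print(plan_stages(align_face=False, use_bg_mask=False))
print(plan_stages(align_face=True, use_bg_mask=True))
```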

	## Dependencies

	### Core Dependencies

	```
	gradio==4.44.0
	torch
	torchvision
	Pillow>=9.5
	numpy>=1.23
	opencv-python-headless==4.10.0.84
	```

	### AI/ML Dependencies

	```
	omegaconf==2.1.2
	einops==0.7.0
	timm==1.0.3
	clip @ git+https://github.com/openai/CLIP.git
	```

	### Utility Dependencies

	```
	scipy==1.10.1
	networkx==3.3
	fsspec==2024.3.1
	gdown==4.7.1
	wandb==0.15.2
	pandas==2.2.2
	ninja>=1.11
	```

	### System Dependencies

	```
	dlib-binary
	spaces>=0.28.3
	setuptools>=68
	wheel>=0.41
	```

	## Performance Considerations

	### Memory Usage
	- Model weights: ~2GB total
	- GPU memory: ~4GB recommended
	- CPU fallback available

	### Processing Time
	- Initialization: 30-60 seconds
	- Per edit: 5-15 seconds (GPU), 30-60 seconds (CPU)
	- Face alignment: +2-5 seconds
	- Background masking: +3-8 seconds

	### Optimization Tips
	1. Use GPU when available
	2. Disable alignment for faster processing
	3. Use background masking only when needed
	4. Batch multiple edits when possible
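
Batching, as suggested in the last tip, could mean loading the image once and applying several edits to it. The sketch below shows that loop with a stub in place of the real edit call; `batch_edits` and `fake_edit` are hypothetical names. Each edit is applied to the original image independently rather than chained.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def batch_edits(
    image: Any,
    edits: Iterable[Tuple[str, float]],
    edit_fn: Callable[[Any, str, float], Any],
) -> Dict[str, Any]:
    """Apply each (attribute, strength) pair to the same loaded image."""
    return {attribute: edit_fn(image, attribute, strength) for attribute, strength in edits}


def fake_edit(image: Any, attribute: str, strength: float) -> str:
    """Stub that describes the edit instead of running a model."""
    return f"{image}:{attribute}:{strength}"


results = batch_edits("face", [("Smile", 5.0), ("Age", -3.0)], fake_edit)
print(results)
```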

	## Troubleshooting

	### Common Issues

	1. "No module named 'piq'"
	   - Install missing dependencies: `pip install piq`

	2. CUDA initialization errors
	   - Set `CUDA_VISIBLE_DEVICES=""` for CPU-only mode
	   - Check GPU compatibility

	3. Face detection failures
	   - Ensure clear, well-lit face images
	   - Try different alignment settings
	   - Check image resolution (minimum 256x256)

	4. Model download failures
	   - Verify Hugging Face token
	   - Check internet connectivity
	   - Ensure sufficient disk space
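
Two of the checks above, image resolution and disk space, are easy to automate before starting an edit. The sketch below is a hypothetical preflight helper, not part of the application. It uses the 256x256 minimum from the list and treats the roughly 2 GB of model weights mentioned under Performance Considerations as the space requirement.

```python
import shutil
from pathlib import Path
from typing import List

MIN_SIDE = 256  # Minimum resolution from the troubleshooting list.
MODEL_BYTES = 2 * 1024**3  # Roughly 2 GB of model weights.


def preflight(width: int, height: int, free_bytes: int) -> List[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if min(width, height) < MIN_SIDE:
        problems.append(f"Image is {width}x{height}; minimum is {MIN_SIDE}x{MIN_SIDE}")
    if free_bytes < MODEL_BYTES:
        problems.append("Not enough free disk space for the model weights")
    return problems


# Free space on the current drive, queried with the standard library.
free = shutil.disk_usage(Path.cwd()).free
print(preflight(512, 512, free))
```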

	### Debug Mode

	Enable detailed logging by setting:
	```python
	import logging
	logging.basicConfig(level=logging.DEBUG)
	```

	## License and Credits

	This application is based on the StyleFeatureEditor research project. Please refer to the original repository for licensing information and citations.

	## Support

	For issues and questions:
	1. Check the troubleshooting section
	2. Review error logs
	3. Verify input image quality
	4. Test with different attribute combinations