Smile Changer API Documentation

Overview

Smile Changer is a facial attribute editing application built on StyleFeatureEditor. It lets users modify facial attributes such as smile, age, beard, hair style and color, glasses, and makeup through AI-powered image editing.

Table of Contents

  1. API Endpoints
  2. Core Functions
  3. Attribute Mapping
  4. Configuration
  5. Error Handling
  6. Usage Examples
  7. Model Architecture
  8. Dependencies
  9. Performance Considerations
  10. Troubleshooting
  11. License and Credits
  12. Support

API Endpoints

Main Application Interface

The application is built using Gradio and provides a web-based interface with the following components:

Input Parameters

| Parameter | Type | Description | Default | Range |
|---|---|---|---|---|
| image | PIL.Image | Input face image | - | Any valid image format |
| attribute | str | Attribute to edit | "Smile" | See Attribute Mapping |
| strength | float | Edit intensity | 5.0 | Varies by attribute |
| align_face | bool | Enable face alignment | False | True/False |
| use_bg_mask | bool | Use background masking | False | True/False |
| custom_text_edit | str | Custom text prompt | "" | StyleCLIP format |

Output

| Parameter | Type | Description |
|---|---|---|
| edited_image | PIL.Image | Edited face image |

Core Functions

run_edit(image, attribute, strength, align_face, use_bg_mask, custom_text_edit)

Main editing function that processes the input image and applies the specified attribute modification.

Parameters:

  • image (PIL.Image): Input face image
  • attribute (str): Attribute name from ATTRIBUTE_MAP
  • strength (float): Edit intensity (automatically clipped to the selected attribute's valid range)
  • align_face (bool): Whether to align face before editing
  • use_bg_mask (bool): Whether to use background masking
  • custom_text_edit (str): Custom text prompt for StyleCLIP edits

Returns:

  • PIL.Image: Edited image

Process Flow:

  1. Load and initialize the SimpleRunner
  2. Determine editing parameters from attribute selection
  3. Apply strength clipping to valid range
  4. Process image through the editing pipeline
  5. Return edited result
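
The process flow above can be sketched as follows. This is a minimal illustration, not the code in the application: `FakeRunner`, its `edit` method, and the two-entry `RANGES` excerpt are hypothetical stand-ins so the example runs without model weights, and the `custom_text_edit` branch is left out.

```python
# Hypothetical two-entry excerpt: UI label -> (internal name, minimum, maximum).
RANGES = {
    "Smile": ("fs_smiling", -10.0, 10.0),
    "Beard": ("trimmed_beard", -30.0, 30.0),
}


class FakeRunner:
    """Stand-in for SimpleRunner; records the call instead of running a model."""

    def edit(self, image, editing_name, power, align, use_mask):
        return {
            "image": image,
            "editing_name": editing_name,
            "power": power,
            "align": align,
            "use_mask": use_mask,
        }


def run_edit_sketch(runner, image, attribute, strength,
                    align_face=False, use_bg_mask=False):
    # Step 2: resolve the UI label to an internal editing name and range.
    editing_name, low, high = RANGES[attribute]
    # Step 3: clip the requested strength into the valid range.
    power = max(low, min(high, float(strength)))
    # Steps 4-5: hand off to the runner and return its result.
    return runner.edit(image, editing_name, power, align_face, use_bg_mask)


# Step 1 (loading the runner) is replaced here by constructing the stand-in.
result = run_edit_sketch(FakeRunner(), "input.jpg", "Smile", 25.0)
print(result["editing_name"], result["power"])
```

A requested strength of 25.0 is outside the Smile range, so the sketch passes 10.0 to the runner.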

get_runner() -> SimpleRunner

Singleton function that initializes and returns the SimpleRunner instance.

Returns:

  • SimpleRunner: Configured runner instance

Features:

  • Lazy initialization
  • Automatic model weight downloading
  • Error handling and logging
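
One common way to get lazy, one-time initialization with logging is a module-level cache guarded by a lock. The sketch below assumes that pattern; `PlaceholderRunner`, the lock, and the `_sketch` function name are hypothetical, and the real `get_runner()` may be structured differently.

```python
import logging
import threading

logger = logging.getLogger(__name__)

_runner = None
_lock = threading.Lock()


class PlaceholderRunner:
    """Hypothetical stand-in for SimpleRunner so the sketch runs without weights."""

    def __init__(self, config_path):
        self.config_path = config_path


def get_runner_sketch(config_path="configs/simple_inference.yaml"):
    """Create the runner on first use and return the same instance afterwards."""
    global _runner
    if _runner is None:
        with _lock:
            if _runner is None:  # re-check after acquiring the lock
                logger.info("Initializing runner from %s", config_path)
                try:
                    _runner = PlaceholderRunner(config_path)
                except Exception:
                    # Log the stack trace, then let the caller see the failure.
                    logger.exception("Runner initialization failed")
                    raise
    return _runner


first = get_runner_sketch()
second = get_runner_sketch()
print(first is second)
```

The second call skips construction entirely, which is what keeps the 30-60 second initialization cost to a single occurrence per process.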

ensure_weights()

Downloads required model weights from Hugging Face if not present locally.

Required Files:

  • sfe_editor_light.pt - Main editor model
  • stylegan2-ffhq-config-f.pt - StyleGAN2 generator
  • e4e_ffhq_encode.pt - Encoder model
  • shape_predictor_68_face_landmarks.dat - Face landmark predictor
  • Additional supporting models
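
The check-then-download behaviour can be sketched with the standard library alone. In this illustration the downloader is passed in as a callable; `fake_download` is a hypothetical placeholder for whatever Hugging Face download call the application uses, and only the four files named above are listed because the additional supporting models are not specified here.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = [
    "sfe_editor_light.pt",
    "stylegan2-ffhq-config-f.pt",
    "e4e_ffhq_encode.pt",
    "shape_predictor_68_face_landmarks.dat",
]


def ensure_weights_sketch(weights_dir, download):
    """Call download(name, target_path) for each required file that is missing."""
    weights_dir = Path(weights_dir)
    weights_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in REQUIRED_FILES:
        target = weights_dir / name
        if not target.exists():
            download(name, target)
            fetched.append(name)
    return fetched


def fake_download(name, target):
    # Hypothetical downloader: writes an empty file instead of fetching weights.
    target.write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    # Pretend one file is already present locally.
    (Path(tmp) / "e4e_ffhq_encode.pt").write_bytes(b"")
    first_pass = ensure_weights_sketch(tmp, fake_download)
    second_pass = ensure_weights_sketch(tmp, fake_download)

print(first_pass)
print(second_pass)
```

The second pass fetches nothing, which is the property that makes it safe to call the function on every startup.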

Attribute Mapping

The application supports the following facial attributes:

Face Semantics

| Attribute | Internal Name | Range | Description |
|---|---|---|---|
| Smile | `fs_smiling` | -10.0 to 10.0 | Positive adds smile, negative removes |
| Age | `age` | -10.0 to 10.0 | Positive makes older, negative makes younger |
| Female features | `gender` | -10.0 to 7.0 | Positive adds femininity |

Facial Hair

| Attribute | Internal Name | Range | Description |
|---|---|---|---|
| Beard | `trimmed_beard` | -30.0 to 30.0 | Negative values ADD beard |
| Mustache/Goatee | `goatee` | -7.0 to 7.0 | Negative values ADD goatee |

Accessories & Cosmetics

| Attribute | Internal Name | Range | Description |
|---|---|---|---|
| Glasses | `fs_glasses` | -20.0 to 30.0 | Positive adds glasses, negative removes |
| Makeup | `fs_makeup` | -10.0 to 15.0 | Positive adds makeup, negative removes |

Hair Style

| Attribute | Internal Name | Range | Description |
|---|---|---|---|
| Curly hair | `curly_hair` | 0.0 to 0.12 | Adds curly hair texture |
| Afro | `afro` | 0.0 to 0.14 | Adds afro hairstyle |

Hair Color (Text-based)

| Attribute | Internal Name | Range | Description |
|---|---|---|---|
| Orange hair (text) | `styleclip_global_a face_a face with orange hair_0.18` | 0.0 to 0.2 | Changes hair to orange |
| Blonde hair (text) | `styleclip_global_a face_a face with blonde hair_0.18` | 0.0 to 0.2 | Changes hair to blonde |
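
Because the sign that adds an attribute differs between rows (Beard and Mustache/Goatee are inverted), it can help to encode the tables as data. The sketch below is an assumption about how such a lookup could be written, not the application's actual `ATTRIBUTE_MAP`; the `ATTRIBUTES` dictionary, the `add_sign` column, and `signed_strength` are hypothetical names, and the text-based hair colors are omitted.

```python
# UI label -> (internal name, minimum, maximum, sign of the strength that ADDS it)
ATTRIBUTES = {
    "Smile": ("fs_smiling", -10.0, 10.0, +1),
    "Age": ("age", -10.0, 10.0, +1),
    "Female features": ("gender", -10.0, 7.0, +1),
    "Beard": ("trimmed_beard", -30.0, 30.0, -1),
    "Mustache/Goatee": ("goatee", -7.0, 7.0, -1),
    "Glasses": ("fs_glasses", -20.0, 30.0, +1),
    "Makeup": ("fs_makeup", -10.0, 15.0, +1),
    "Curly hair": ("curly_hair", 0.0, 0.12, +1),
    "Afro": ("afro", 0.0, 0.14, +1),
}


def signed_strength(attribute, magnitude, add=True):
    """Return a strength that adds (or removes) the attribute, clipped to its range."""
    _, low, high, add_sign = ATTRIBUTES[attribute]
    sign = add_sign if add else -add_sign
    return max(low, min(high, sign * abs(float(magnitude))))


print(signed_strength("Beard", 15))                   # negative value adds a beard
print(signed_strength("Glasses", 50))                 # clipped to the upper bound
print(signed_strength("Curly hair", 0.1, add=False))  # range starts at 0.0
```

For the hair style rows the range starts at 0.0, so a request to remove the attribute clips to 0.0 (no edit) rather than producing a negative strength.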

Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| CUDA_VISIBLE_DEVICES | GPU device selection | "" (CPU) |
| TORCH_CUDA_ARCH_LIST | CUDA architecture | "8.0" |
| HF_TOKEN | Hugging Face token | - |
| HUGGINGFACE_TOKEN | Alternative HF token | - |
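
A small sketch of how these variables might be read. The precedence of `HF_TOKEN` over `HUGGINGFACE_TOKEN` is an assumption (the table only says the second is an alternative), and both helper names are hypothetical.

```python
import os


def resolve_hf_token(env=os.environ):
    """Return HF_TOKEN if set, else HUGGINGFACE_TOKEN, else None (assumed order)."""
    return env.get("HF_TOKEN") or env.get("HUGGINGFACE_TOKEN") or None


def wants_cpu_only(env=os.environ):
    """Apply the documented default: an empty CUDA_VISIBLE_DEVICES means CPU."""
    return env.get("CUDA_VISIBLE_DEVICES", "") == ""


# Plain dictionaries stand in for the process environment in this demo.
print(resolve_hf_token({"HUGGINGFACE_TOKEN": "token-b"}))
print(wants_cpu_only({"CUDA_VISIBLE_DEVICES": "0"}))
```

Passing the environment as a parameter keeps the helpers easy to exercise without mutating `os.environ`.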

Model Configuration

The application uses the following configuration files:

  • configs/simple_inference.yaml - Main inference configuration
  • pretrained_models/ - Directory containing all model weights

Error Handling

Common Error Scenarios

  1. Missing Model Weights

    • Automatic download from Hugging Face
    • Fallback to CPU if GPU unavailable
  2. Face Detection Failures

    • Multiple detection thresholds attempted
    • Graceful degradation without alignment
  3. Mask Extraction Failures

    • Continues without background masking
    • Logs warnings for debugging
  4. Alignment Failures

    • Falls back to unaligned processing
    • Preserves original image orientation
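
The fallback behaviour for alignment and masking follows one pattern: attempt the optional step, and on failure log a warning and continue with the unmodified input. A minimal sketch of that pattern, with the hypothetical names `with_fallback` and `failing_align`:

```python
import logging

logger = logging.getLogger("smile_changer.sketch")


def with_fallback(step_name, step, value):
    """Run step(value); on failure log a warning and return value unchanged."""
    try:
        return step(value), True
    except Exception as exc:
        logger.warning("%s failed (%s); continuing without it", step_name, exc)
        return value, False


def failing_align(image):
    # Hypothetical alignment step that cannot find a face.
    raise RuntimeError("no face detected")


image, aligned = with_fallback("alignment", failing_align, "original-image")
print(image, aligned)
```

The returned flag lets later stages know whether to skip the matching post-processing, such as unalignment.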

Logging

The application uses Python's logging module at INFO level by default and records:

  • Model initialization status
  • Edit process progress
  • Error details and stack traces
  • File download and verification

Usage Examples

Basic Smile Enhancement

from PIL import Image
from app import run_edit

# Load input image
image = Image.open("input.jpg")

# Apply smile enhancement
edited = run_edit(
    image=image,
    attribute="Smile",
    strength=5.0,
    align_face=False,
    use_bg_mask=False,
    custom_text_edit=""
)

# Save result
edited.save("output.jpg")

Custom Text-based Editing

# Add a hat using a custom text prompt
# (reuses `image` loaded in the basic example above)
edited = run_edit(
    image=image,
    attribute="Orange hair (text)",  # Must be text-based attribute
    strength=0.18,
    align_face=True,
    use_bg_mask=True,
    custom_text_edit="styleclip_global_a face_a face with a hat_0.18"
)
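
Judging from the text-based entries in this document, the custom prompt string appears to follow the pattern `styleclip_global_<neutral prompt>_<target prompt>_<value>`. The helper below builds strings of that shape; the pattern is inferred from those examples, and the function name, its defaults, and the underscore check are assumptions rather than part of the application.

```python
def styleclip_edit_name(target_prompt, value=0.18, neutral_prompt="a face"):
    """Build an edit name shaped like the text-based entries in this document."""
    for prompt in (neutral_prompt, target_prompt):
        # Underscores look like field separators, so reject them inside prompts.
        if "_" in prompt:
            raise ValueError("prompts must not contain underscores")
    return f"styleclip_global_{neutral_prompt}_{target_prompt}_{value}"


print(styleclip_edit_name("a face with a hat"))
```

The printed string matches the `custom_text_edit` value used in the example above.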

Beard Addition

# Add a beard (use negative values; reuses `image` from the basic example)
edited = run_edit(
    image=image,
    attribute="Beard",
    strength=-15.0,  # Negative value adds beard
    align_face=False,
    use_bg_mask=False,
    custom_text_edit=""
)

Model Architecture

Core Components

  1. SimpleRunner: Main interface for image editing
  2. FSEInferenceRunner: Handles model inference and editing
  3. LatentEditor: Manages different editing directions
  4. StyleGAN2: Generator for high-quality image synthesis
  5. E4E Encoder: Encodes images to latent space

Editing Methods

  1. InterfaceGAN Directions: Age, smile, gender
  2. StyleSpace Directions: Gender, facial features
  3. StyleCLIP Global Mapper: Text-based editing
  4. DeltaEdit: Advanced attribute manipulation

Processing Pipeline

  1. Input Preprocessing: Image normalization and resizing
  2. Face Alignment: Optional landmark-based alignment
  3. Background Masking: Optional face segmentation
  4. Latent Encoding: Convert image to latent representation
  5. Attribute Editing: Apply desired modifications
  6. Image Synthesis: Generate edited result
  7. Post-processing: Optional unalignment and blending
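
To make the optional stages concrete, the sketch below lists which of the seven stages would run for a given pair of flags. It assumes that post-processing is only needed when alignment or masking was applied; that condition and the function name are assumptions, not confirmed behaviour.

```python
def planned_stages(align_face=False, use_bg_mask=False):
    """Return the pipeline stages that would run for the given flags."""
    stages = ["input preprocessing"]
    if align_face:
        stages.append("face alignment")
    if use_bg_mask:
        stages.append("background masking")
    stages += ["latent encoding", "attribute editing", "image synthesis"]
    # Assumed: unalignment and blending only apply if an optional stage ran.
    if align_face or use_bg_mask:
        stages.append("post-processing")
    return stages


print(planned_stages())
print(planned_stages(align_face=True, use_bg_mask=True))
```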

Dependencies

Core Dependencies

gradio==4.44.0
torch
torchvision
Pillow>=9.5
numpy>=1.23
opencv-python-headless==4.10.0.84

AI/ML Dependencies

omegaconf==2.1.2
einops==0.7.0
timm==1.0.3
clip @ git+https://github.com/openai/CLIP.git

Utility Dependencies

scipy==1.10.1
networkx==3.3
fsspec==2024.3.1
gdown==4.7.1
wandb==0.15.2
pandas==2.2.2
ninja>=1.11

System Dependencies

dlib-binary
spaces>=0.28.3
setuptools>=68
wheel>=0.41

Performance Considerations

Memory Usage

  • Model weights: ~2GB total
  • GPU memory: ~4GB recommended
  • CPU fallback available

Processing Time

  • Initialization: 30-60 seconds
  • Per edit: 5-15 seconds (GPU), 30-60 seconds (CPU)
  • Face alignment: +2-5 seconds
  • Background masking: +3-8 seconds
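
The figures above can be combined into a rough per-edit estimate. This sketch simply adds the listed ranges and assumes the alignment and masking overheads are the same on GPU and CPU, which the figures do not state explicitly; the names are hypothetical.

```python
PER_EDIT_SECONDS = {"gpu": (5, 15), "cpu": (30, 60)}
ALIGN_SECONDS = (2, 5)
MASK_SECONDS = (3, 8)


def estimate_seconds(device, align_face=False, use_bg_mask=False):
    """Return a (low, high) estimate in seconds for a single edit."""
    low, high = PER_EDIT_SECONDS[device]
    if align_face:
        low, high = low + ALIGN_SECONDS[0], high + ALIGN_SECONDS[1]
    if use_bg_mask:
        low, high = low + MASK_SECONDS[0], high + MASK_SECONDS[1]
    return low, high


print(estimate_seconds("gpu", align_face=True, use_bg_mask=True))
print(estimate_seconds("cpu"))
```

With both options enabled on a GPU, the estimate is 10 to 28 seconds per edit, excluding the one-time initialization.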

Optimization Tips

  1. Use GPU when available
  2. Disable alignment for faster processing
  3. Use background masking only when needed
  4. Batch multiple edits when possible

Troubleshooting

Common Issues

  1. "No module named 'piq'"

    • Install missing dependencies: pip install piq
  2. CUDA initialization errors

    • Set CUDA_VISIBLE_DEVICES="" for CPU-only mode
    • Check GPU compatibility
  3. Face detection failures

    • Ensure clear, well-lit face images
    • Try different alignment settings
    • Check image resolution (minimum 256x256)
  4. Model download failures

    • Verify Hugging Face token
    • Check internet connectivity
    • Ensure sufficient disk space

Debug Mode

Enable detailed logging by setting:

import logging
logging.basicConfig(level=logging.DEBUG)

License and Credits

This application is based on the StyleFeatureEditor research project. Please refer to the original repository for licensing information and citations.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review error logs
  3. Verify input image quality
  4. Test with different attribute combinations