FaceFusion UI - Complete Feature Guide & Tips

This comprehensive guide explains every section and option in the FaceFusion UI to help you achieve the best results.


📋 Table of Contents

  1. Main Workflow
  2. Input Section
  3. Processors
  4. Face Detection & Selection
  5. Face Masking
  6. Output Settings
  7. Execution Settings
  8. Memory Management
  9. Tips for Best Results
  10. Workflow Examples
  11. Summary Table

Main Workflow

Basic Steps for Face Swapping

  1. Upload Source → The face you want to apply
  2. Upload Target → The image/video to modify
  3. Select Processors → face_swapper + face_enhancer for best quality
  4. Configure Settings → Adjust quality and options
  5. Preview → Check a frame before processing
  6. Start Processing → Generate final output

Input Section

SOURCE

Purpose: Upload the face image or audio file you want to apply to the target.

Supported Files:

  • Images: For face swapping (JPG, PNG, etc.)
  • Audio: For lip syncing (MP3, WAV, etc.)

Tips:

  • Use high-quality, well-lit images for best face swap results
  • The source face should be frontal or at a similar angle to the target face
  • Clear facial features produce better swaps

TARGET

Purpose: Upload the base image or video that will be modified.

Supported Files:

  • Images: Single image face swap
  • Videos: Video face swap/lip sync

Tips:

  • Higher resolution = better quality but slower processing
  • Good lighting on faces improves detection and swap quality
  • Videos with stable faces work better than highly dynamic scenes

OUTPUT PATH

Purpose: Specify where the processed result will be saved.

Tips:

  • Use descriptive filenames to organize your outputs
  • By default, output is saved to a temporary directory; specify a custom path for permanent storage

Processors

PROCESSORS SELECTION

Select one or more AI processors to apply to your content:

face_swapper ⭐ (Recommended)

  • Swaps faces from source to target
  • Best Models: inswapper_128, blendswap_256
  • Pixel Boost: Use 1024x1024 for maximum quality
  • Higher resolution = better detail but slower processing

face_enhancer ⭐ (Recommended)

  • Improves face quality and details after swapping
  • Best Models: gfpgan_1.4, restoreformer_plus_plus
  • Blend: 80-100 for strong enhancement
  • Weight: Adjust for different model variants
  • Use together with face_swapper for professional results
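One way to read the Blend slider is as a mix between the swapped face and its enhanced version. The sketch below assumes a simple linear mix per pixel channel; the function name `blend_pixel` is hypothetical and is not taken from the FaceFusion code.

```python
def blend_pixel(original: float, enhanced: float, blend: int) -> float:
    """Mix one channel value; blend is 0-100, as on the UI slider."""
    if not 0 <= blend <= 100:
        raise ValueError("blend must be between 0 and 100")
    alpha = blend / 100  # assumption: 100 means fully enhanced, 0 means untouched
    return original * (1 - alpha) + enhanced * alpha


# A channel value of 100 that the enhancer would raise to 200
for blend in (0, 50, 80, 100):
    print(blend, blend_pixel(100, 200, blend))
```

Under this reading, lowering the blend keeps more of the swapped face's original texture, which is why a lower value can help when the result looks over-processed.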

lip_syncer

  • Synchronizes lips to audio file
  • Requirements: Source audio file must be uploaded
  • Best Model: wav2lip_gan_96 for quality
  • Weight: 1.0 for full sync, lower to blend with original

age_modifier

  • Makes faces younger or older
  • Direction: Negative = younger, Positive = older
  • Range: -100 (very young) to +100 (very old)

expression_restorer

  • Restores target's original facial expressions
  • Factor: 100 = full target expression, 0 = source expression
  • Useful to maintain natural emotions after face swap

frame_enhancer

  • Upscales entire frame (not just face)
  • Models: real_esrgan_x4 (4x upscale), ultra_sharp_x4 (sharper)
  • Use for low-resolution videos
  • Very slow - use only when needed

frame_colorizer

  • Colorizes black & white videos/images
  • Multiple artistic styles available

face_editor

  • Manually adjust facial features
  • Control eyes, mouth, head rotation, expressions
  • Advanced feature for fine-tuning

face_debugger

  • Shows detection boxes, landmarks, scores
  • Useful for troubleshooting detection issues

Face Detection & Selection

FACE DETECTOR

Purpose: Detects faces in images/videos for processing.

Face Detector Model

  • yolo_face: Recommended - best accuracy and speed
  • retinaface: Good alternative

Face Detector Size

  • 640x640: Balanced speed and accuracy (recommended)
  • 320x320: Faster but may miss faces
  • 1280x1280: Best accuracy but slower

Face Detector Angles

  • Enable to detect rotated/tilted faces
  • More angles = better detection but slower
  • Use when faces aren't upright

Face Detector Score

  • Confidence threshold (0-1)
  • 0.5: Standard - good balance
  • Higher = stricter detection, fewer false positives
  • Lower = detect more faces but more false positives
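The effect of the score threshold can be illustrated with a small filter. This is a minimal sketch with made-up detections; the `Detection` type and `filter_detections` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple[int, int, int, int]  # x1, y1, x2, y2
    score: float                    # detector confidence, 0-1


def filter_detections(detections: list[Detection], min_score: float = 0.5) -> list[Detection]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.score >= min_score]


candidates = [
    Detection((10, 10, 90, 90), 0.92),      # clear, well-lit face
    Detection((200, 40, 260, 100), 0.41),   # small or partly hidden face
    Detection((300, 300, 320, 320), 0.12),  # likely a false positive
]

print(len(filter_detections(candidates, 0.5)))  # 1
print(len(filter_detections(candidates, 0.3)))  # 2
```

Lowering the threshold from 0.5 to 0.3 recovers the partly hidden face here, at the cost of moving closer to the false positive.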

FACE LANDMARKER

Purpose: Detects facial landmarks (eyes, nose, mouth) for accurate alignment.

Face Landmarker Model

  • Detects 5 or 68 facial points
  • Essential for proper face alignment and swapping

Face Landmarker Score

  • Confidence threshold (0-1)
  • 0.5: Generally works well
  • Higher = more accurate landmark detection required

FACE SELECTOR MODE

Purpose: Choose which faces to process in the target.

Modes:

  • One: Process first detected face only
  • Many: Process all detected faces
  • Reference: Track specific face across video frames (best for videos)
  • Age/Gender/Race filters: Narrow the selection to specific demographics (applied in addition to the mode)

Reference Face Distance

  • Similarity threshold for reference tracking
  • Lower = stricter matching (same person)
  • Higher = more lenient matching
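Reference matching can be pictured as comparing face embeddings against a distance limit. The sketch below assumes cosine distance between embedding vectors and uses toy 3-dimensional vectors; the function names and the 0.6 limit are illustrative assumptions, not values confirmed by this guide.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0 for identical direction, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def is_same_face(reference: list[float], candidate: list[float], max_distance: float = 0.6) -> bool:
    """Accept the candidate when it is closer to the reference than the limit."""
    return cosine_distance(reference, candidate) < max_distance


reference = [1.0, 0.0, 0.0]
similar = [0.9, 0.1, 0.0]    # nearly the same direction as the reference
different = [0.0, 1.0, 0.0]  # orthogonal to the reference

print(is_same_face(reference, similar))    # True
print(is_same_face(reference, different))  # False
```

A smaller `max_distance` rejects more candidates, which matches the "lower = stricter" rule above.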

Tips:

  • Use Reference mode for videos with multiple people
  • Use One for single-person content
  • Use filters to target specific faces in multi-person scenes

Face Masking

PURPOSE

Control which parts of the face are swapped and how they blend.

Face Mask Types

Box

  • Simple rectangular mask around face
  • Blur: Controls edge softness (0.3-0.5 recommended)
  • Padding: Offsets the mask edge on each side (top, right, bottom, left), given in percent
  • Fast and simple
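To show how blur and padding interact, here is a small pure-Python sketch of a box mask. It makes several assumptions that are not confirmed by this guide: padding is a percentage that insets the mask from each side, blur is a fraction of half the crop size, and the soft edge is a linear ramp rather than a true Gaussian blur. The function name `box_mask` is hypothetical.

```python
def box_mask(size: int, blur: float, padding: tuple[int, int, int, int]) -> list[list[float]]:
    """Build a size x size mask with values from 0 (keep target) to 1 (use swap).

    padding is (top, right, bottom, left) in percent of the crop size;
    blur is a 0-1 fraction that sets the width of the soft edge.
    """
    top, right, bottom, left = (int(size * p / 100) for p in padding)
    feather = max(int(size * 0.5 * blur), 1)  # assumed soft-edge width in pixels

    def ramp(distance_from_edge: int) -> float:
        # 0 at the mask border, rising linearly to 1 over `feather` pixels
        return max(0.0, min(1.0, distance_from_edge / feather))

    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            dy = min(y - top, (size - 1 - bottom) - y)
            dx = min(x - left, (size - 1 - right) - x)
            row.append(ramp(min(dx, dy)))
        mask.append(row)
    return mask


# 8x8 mask, blur 0.5, no padding: print the middle column from top to bottom
for row in box_mask(8, 0.5, (0, 0, 0, 0)):
    print(row[4])
```

A larger blur widens the ramp, so the swapped face fades into the target more gradually; padding moves the start of that ramp away from the crop border.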

Occlusion

  • Avoids occluded areas (glasses, hands, hair)
  • Uses face occluder model
  • More natural when face is partially covered

Region

  • Masks specific facial regions
  • Uses face parser model
  • Select regions: eyes, nose, mouth, skin, etc.

Area

  • Masks by facial areas
  • Combine multiple for custom masking

Tips:

  • Combine mask types for best results
  • Increase blur for smoother blending
  • Adjust padding if face edges are visible

Output Settings

IMAGE OUTPUT

Output Image Quality (0-100)

  • JPEG compression quality
  • 90-95: Recommended for high quality
  • 100: Maximum quality (larger file)
  • 70-80: Good quality, smaller file

Output Image Resolution

  • Can upscale or downscale from original
  • Match the target's original resolution for best quality
  • Upscaling beyond 2x may look artificial

VIDEO OUTPUT

Output Video Encoder

  • libx264: Widely compatible, good quality
  • libx265/hevc: Better compression, smaller files
  • h264_nvenc: GPU-accelerated (NVIDIA only)
  • copy: Preserve original encoding

Output Video Preset

  • ultrafast: Quick but large file
  • fast/medium: Balanced
  • slow/slower: Better compression at the same quality, slower encoding (recommended)
  • veryslow: Best compression, very slow encoding

Output Video Quality (0-100)

  • 90-95: Recommended for professional results
  • 80-85: Good quality, reasonable file size
  • Higher = better visual quality, larger files
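For x264-style encoders, a 0-100 quality slider has to be translated into the encoder's own scale, where CRF runs from 0 (best) to 51 (worst). The sketch below assumes a simple linear mapping; the exact formula the UI uses is not stated in this guide, and the function names are hypothetical. The `-c:v`, `-preset` and `-crf` options are standard ffmpeg options for libx264 and libx265.

```python
def quality_to_crf(quality: int, worst_crf: int = 51) -> int:
    """Map a 0-100 quality value to a CRF value (lower CRF = higher quality)."""
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    return round(worst_crf - quality * worst_crf / 100)  # assumed linear mapping


def ffmpeg_video_args(quality: int, encoder: str = "libx264", preset: str = "slow") -> list[str]:
    """Build encoder arguments; -crf applies to libx264/libx265, not to NVENC encoders."""
    return ["-c:v", encoder, "-preset", preset, "-crf", str(quality_to_crf(quality))]


print(quality_to_crf(80))     # 10
print(quality_to_crf(90))     # 5
print(ffmpeg_video_args(90))
```

Under this mapping, the step from 80 to 90 halves the CRF value, which is why file sizes grow quickly near the top of the range.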

Output Video Resolution

  • Can upscale or downscale
  • Higher resolution requires more processing time
  • Match original for best quality/performance ratio

Output Video FPS

  • 24: Cinematic look
  • 30: Standard video
  • 60: Smooth motion
  • Match original video FPS for best results

AUDIO OUTPUT (for videos)

Output Audio Encoder

  • aac: Widely compatible, good quality (recommended)
  • libmp3lame: MP3 format
  • copy: Preserve original audio

Output Audio Quality (0-100)

  • 80-90: High quality, suitable for most uses
  • 100: Highest quality the encoder offers (still lossy with aac or libmp3lame)
  • Higher = better sound, larger file

Output Audio Volume (0-200%)

  • 100: Original volume
  • <100: Quieter
  • >100: Louder (may cause distortion)
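The volume percentage corresponds to a linear gain factor, and expressing it in decibels shows why values above 100 risk clipping. This is a minimal sketch; the function names are hypothetical, and whether the UI passes the value to ffmpeg in exactly this way is an assumption. The `-filter:a volume=...` option itself is a standard ffmpeg audio filter.

```python
import math


def volume_filter(volume_percent: int) -> list[str]:
    """Build ffmpeg arguments that scale audio by the given percentage."""
    if not 0 <= volume_percent <= 200:
        raise ValueError("volume must be between 0 and 200")
    factor = volume_percent / 100
    return ["-filter:a", f"volume={factor:g}"]


def gain_db(volume_percent: int) -> float:
    """Convert a volume percentage to gain in decibels."""
    if volume_percent <= 0:
        return float("-inf")  # silence
    return 20 * math.log10(volume_percent / 100)


print(volume_filter(150))         # ['-filter:a', 'volume=1.5']
print(round(gain_db(200), 2))     # 6.02
```

Doubling the volume adds about 6 dB, so audio that already peaks near full scale will distort at 200.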

Execution Settings

EXECUTION PROVIDERS

Purpose: Choose hardware acceleration for processing.

Options:

  • CUDAExecutionProvider: NVIDIA GPU acceleration (fastest)
  • CoreMLExecutionProvider: Apple Silicon acceleration
  • CPUExecutionProvider: CPU only (slowest but always available)

Tips:

  • Use a GPU provider when available; it is typically far faster than CPU (roughly 10-50x, depending on hardware and model)
  • CPU is very slow but works on any system
  • Some models require specific providers
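The provider names above match those used by ONNX Runtime, so the sketch below assumes that library is the backend. It picks the first preferred provider that is present and falls back to CPU. `onnxruntime.get_available_providers()` is a real ONNX Runtime call; `pick_provider` and `detect_available` are hypothetical helpers.

```python
PREFERRED = ["CUDAExecutionProvider", "CoreMLExecutionProvider", "CPUExecutionProvider"]


def pick_provider(available: list[str]) -> str:
    """Return the first preferred provider that is available, else CPU."""
    for name in PREFERRED:
        if name in available:
            return name
    return "CPUExecutionProvider"


def detect_available() -> list[str]:
    """Ask ONNX Runtime which providers exist; assume CPU only if it is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return ["CPUExecutionProvider"]
    return onnxruntime.get_available_providers()


print(pick_provider(detect_available()))
```

If the printed provider is `CPUExecutionProvider` on a machine with a supported GPU, the GPU build of the runtime or its drivers are probably missing.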

EXECUTION THREAD COUNT

Purpose: Number of parallel processing threads.

Recommendations:

  • Set to your CPU core count for optimal performance
  • Higher = faster but uses more CPU/GPU
  • Lower if system becomes unresponsive
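To find a starting value, you can query the logical core count from Python. This is a minimal sketch; `suggested_thread_count` is a hypothetical helper, and the `reserve` parameter is an illustrative way to leave cores free for the rest of the system.

```python
import os


def suggested_thread_count(reserve: int = 0) -> int:
    """Return the logical core count minus `reserve`, never less than 1."""
    cores = os.cpu_count() or 1  # cpu_count() can return None on some platforms
    return max(1, cores - reserve)


print(suggested_thread_count())           # all logical cores
print(suggested_thread_count(reserve=2))  # leave two cores for the OS and UI
```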

EXECUTION QUEUE COUNT

Purpose: Frames each thread processes before returning.

Recommendations:

  • 1-2: Recommended for most cases
  • Higher = better GPU utilization but more VRAM needed
  • Lower = less memory usage

Memory Management

VIDEO MEMORY STRATEGY

Purpose: Balance processing speed vs VRAM usage.

Options:

  • Strict: Low memory usage, slower processing
  • Moderate: Balanced (recommended)
  • Tolerant: Faster but uses more VRAM

Tips:

  • Use Strict if you get out-of-memory errors
  • Use Tolerant if you have high-end GPU (12GB+ VRAM)

SYSTEM MEMORY LIMIT

Purpose: Limit RAM usage during processing.

  • 0: No limit
  • Set value (in GB) to prevent system crashes
  • Useful for systems with limited RAM

Tips for Best Results

🌟 Quality Settings (Best Quality)

Processors: face_swapper + face_enhancer
Face Swapper Model: inswapper_128
Pixel Boost: 1024x1024
Face Enhancer Model: gfpgan_1.4
Face Enhancer Blend: 80-100
Output Image/Video Quality: 90-95
Video Preset: slow or slower

⚡ Speed Settings (Faster Processing)

Processors: face_swapper only
Face Swapper Model: inswapper_128
Pixel Boost: 512x512 or 768x768
Skip face_enhancer
Output Quality: 80-85
Video Preset: medium or fast
Execution Threads: Max CPU cores

🎯 Troubleshooting

Face Not Detected

  • Check face detector score (try lowering to 0.3)
  • Enable more detector angles
  • Increase detector size to 1280x1280
  • Ensure face is visible and well-lit

Poor Swap Quality

  • Increase pixel boost to 1024x1024
  • Add face_enhancer processor
  • Use higher output quality (90-95)
  • Ensure source and target faces are at similar angles

Out of Memory Error

  • Lower pixel boost to 512x512 or 768x768
  • Set video memory strategy to "strict"
  • Reduce execution queue count to 1
  • Lower output resolution
  • Process shorter video segments using trim frame
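When splitting a long video into shorter runs, it helps to work out the trim ranges in advance. The sketch below computes half-open `(start, end)` frame ranges; `split_frames` is a hypothetical helper, and whether the UI treats the trim end frame as inclusive is not stated in this guide, so check the boundaries against a short test run.

```python
def split_frames(total_frames: int, segment_length: int) -> list[tuple[int, int]]:
    """Return (start, end) ranges covering all frames; end is exclusive."""
    if total_frames < 0 or segment_length <= 0:
        raise ValueError("total_frames must be >= 0 and segment_length > 0")
    return [
        (start, min(start + segment_length, total_frames))
        for start in range(0, total_frames, segment_length)
    ]


print(split_frames(1000, 300))
# [(0, 300), (300, 600), (600, 900), (900, 1000)]
```

Each range can be processed as its own job and the resulting clips joined afterwards.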

Slow Processing

  • Use GPU execution provider (CUDA/CoreML)
  • Reduce pixel boost
  • Skip face_enhancer for faster processing
  • Tune execution thread count: start at your CPU core count and lower it only if the system becomes unresponsive
  • Use faster video preset (medium/fast)

Unnatural Blending

  • Increase face mask blur (0.4-0.6)
  • Adjust face mask padding
  • Enable occlusion mask type
  • Lower face enhancer blend

Workflow Examples

Example 1: High-Quality Photo Face Swap

  1. Upload high-resolution source face image
  2. Upload target photo
  3. Select: face_swapper + face_enhancer
  4. Settings:
    • Face Swapper: inswapper_128, 1024x1024
    • Face Enhancer: gfpgan_1.4, blend 90
    • Output Quality: 95
  5. Preview result
  6. Process

Example 2: Video Face Swap (Multiple People)

  1. Upload source face
  2. Upload target video
  3. Select: face_swapper + face_enhancer
  4. Face Selector: Reference mode
  5. Click reference face in gallery
  6. Settings:
    • Pixel boost: 1024x1024
    • Video quality: 90
    • Preset: slow
  7. Use trim frame to process test segment first
  8. Process full video

Example 3: Lip Sync Video

  1. Upload source audio (speech/song)
  2. Upload target video
  3. Select: lip_syncer + face_swapper (optional)
  4. Settings:
    • Lip Syncer: wav2lip_gan_96
    • Weight: 1.0
  5. Process

Summary Table

| Feature | Recommended Setting | Purpose |
|---|---|---|
| Face Swapper Model | inswapper_128 | Best quality swapping |
| Pixel Boost | 1024x1024 | Maximum detail |
| Face Enhancer | gfpgan_1.4, blend 80 | Improve quality |
| Output Quality | 90-95 | Near-lossless |
| Video Preset | slow/slower | Best compression |
| Execution Provider | CUDA/CoreML | GPU acceleration |
| Face Selector | Reference (videos) | Track specific person |
| Face Mask Blur | 0.3-0.5 | Natural blending |

Last Updated: October 6, 2025

For more information, visit the official FaceFusion documentation.