title: Coach Pro Lip Sync Tool
emoji: 🎤
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
hardware: t4-small
suggested_hardware: t4-small
suggested_storage: small
duplicate: true
tags:
  - lip-sync
  - ai
  - video
  - audio
  - microphone
  - tts
  - gradio
  - ipad
  - mobile

🎬 Advanced Lip Sync Tool with Microphone Support

🌟 Features

🎤 Microphone Integration

  • Real-time Recording: Direct microphone access in browser
  • Live Streaming: Real-time audio processing
  • Cross-platform: Works on iPad, iPhone, Desktop
  • Multiple Sources: Upload files OR record live

🤖 AI Models Supported

  • Wav2Lip: High accuracy lip synchronization
  • MuseTalk: Real-time processing (30fps+)
  • SadTalker: Natural expressions and emotions

📱 iPad Optimized

  • ✅ Touch-friendly interface
  • ✅ Safari/Chrome compatible
  • ✅ Mobile-responsive design
  • ✅ No local installation required

🎡 Audio Features

  • Text-to-Speech: Convert text to natural speech
  • Multi-language: English, اردو (Urdu), हिंदी (Hindi) support
  • Voice Selection: Male, Female, Natural voices
  • Format Support: WAV, MP3, M4A

🚀 Quick Start

1. Basic Lip Sync

  • Upload your video file
  • Record audio with microphone OR upload audio file
  • Choose AI model (Wav2Lip recommended for beginners)
  • Click "Generate Lip Sync"

2. Text-to-Speech

  • Upload your video
  • Type text in English or اردو (Urdu)
  • Select voice type and language
  • Generate speech + lip sync automatically

3. Live Recording

  • Use real-time microphone monitoring
  • Test audio levels and quality
  • Perfect for testing before main processing
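For readers who want to check levels outside the browser, the level test above can be approximated offline. This is a minimal sketch, not the Space's own code: it assumes signed 16-bit mono PCM samples, and `rms_dbfs` is a hypothetical helper name. It reports the RMS level in dBFS, where 0 dBFS is full scale.

```python
import math
from array import array


def rms_dbfs(samples: array) -> float:
    """Return the RMS level of signed 16-bit PCM samples in dBFS."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    rms = math.sqrt(mean_square)
    return 20 * math.log10(rms / 32768.0)


# Hypothetical test signal: a 440 Hz sine at half of full scale, one second at 16 kHz.
tone = array(
    "h",
    (int(16384 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(16000)),
)
print(f"{rms_dbfs(tone):.1f} dBFS")
```

A recording that sits very close to 0 dBFS is likely clipping, and one far below it is probably too quiet; the exact thresholds are a matter of taste and are not specified by this Space.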

📋 Supported Formats

📹 Video Input

  • MP4, AVI, MOV, WebM
  • Max size: 2GB
  • Max duration: 10 minutes
  • Recommended: 720p for faster processing
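The video limits above can be checked before uploading. The sketch below is illustrative only: `check_video` and the constant names are hypothetical, "2GB" is read as 2 GiB, and the caller is assumed to already know the file size and duration.

```python
from pathlib import Path

ALLOWED_VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".webm"}
MAX_VIDEO_BYTES = 2 * 1024**3   # "2GB" interpreted as 2 GiB (assumption)
MAX_VIDEO_SECONDS = 10 * 60     # 10 minutes


def check_video(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the video is within limits."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_VIDEO_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("file larger than 2 GB")
    if duration_seconds > MAX_VIDEO_SECONDS:
        problems.append("longer than 10 minutes")
    return problems


print(check_video("holiday.mov", size_bytes=350_000_000, duration_seconds=95))
```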

🎡 Audio Input

  • WAV (recommended), MP3, M4A
  • Sample rates: 16kHz - 48kHz
  • Max recording: 30 seconds
  • Min quality: 16-bit
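A WAV file can be checked against these audio limits with Python's standard `wave` module. This is a sketch under assumptions: `check_wav` is a hypothetical helper, and it applies the 30-second recording limit to any file it is given, which may be stricter than what the Space enforces for uploads.

```python
import io
import wave


def check_wav(source) -> list[str]:
    """Check a WAV file (path or binary file object) against the limits listed above."""
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate
    if not 16_000 <= rate <= 48_000:
        problems.append(f"sample rate {rate} Hz is outside 16-48 kHz")
    if bits < 16:
        problems.append(f"{bits}-bit audio is below the 16-bit minimum")
    if seconds > 30:
        problems.append(f"{seconds:.1f} s exceeds the 30 s recording limit")
    return problems


# Build a one-second, 16-bit, 16 kHz silent clip in memory and check it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16_000)
    out.writeframes(b"\x00\x00" * 16_000)
buffer.seek(0)
print(check_wav(buffer))
```

MP3 and M4A files are not covered here because the standard library cannot read them.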

🛠️ Technical Specs

🔧 Processing

  • Hardware: GPU T4 acceleration
  • Speed: 30-120 seconds of processing per minute of video
  • Quality: Up to 1080p output
  • Concurrent: Multiple users supported
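The speed figure above translates into a rough waiting time. The helper below is a hypothetical sketch that simply scales the stated 30-120 seconds per minute of video; actual times depend on the model, resolution, and current load.

```python
def estimate_processing_seconds(video_seconds: float) -> tuple[float, float]:
    """Return (best_case, worst_case) processing time in seconds."""
    video_minutes = video_seconds / 60
    return 30 * video_minutes, 120 * video_minutes


low, high = estimate_processing_seconds(5 * 60)  # a 5-minute clip
print(f"Estimated processing time: {low / 60:.1f} to {high / 60:.1f} minutes")
```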

📱 Mobile Compatibility

  • iOS Safari: ✅ Full support
  • Chrome Mobile: ✅ Full support
  • iPad: ✅ Optimized interface
  • Android: ✅ Compatible

🎯 Model Comparison

| Feature | Wav2Lip | MuseTalk | SadTalker |
|---|---|---|---|
| Accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Speed | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Expressions | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Real-time | ❌ | ✅ | ❌ |
| Any Identity | ✅ | ✅ | ✅ |
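The comparison above can be turned into a simple lookup when choosing a model programmatically. This is only an illustration: `RATINGS` copies the star counts from the comparison, and `pick_model` is a hypothetical helper, not part of the app.

```python
# Star ratings copied from the model comparison above.
RATINGS = {
    "Wav2Lip": {"accuracy": 5, "speed": 3, "quality": 5, "expressions": 2},
    "MuseTalk": {"accuracy": 4, "speed": 5, "quality": 4, "expressions": 3},
    "SadTalker": {"accuracy": 4, "speed": 2, "quality": 4, "expressions": 5},
}


def pick_model(priority: str) -> str:
    """Return the model with the highest star rating for the given criterion."""
    if priority not in {"accuracy", "speed", "quality", "expressions"}:
        raise ValueError(f"unknown priority: {priority}")
    # On a tie, max() keeps the first model in RATINGS order.
    return max(RATINGS, key=lambda name: RATINGS[name][priority])


print(pick_model("speed"))
```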

🎨 Use Cases

πŸ“Ί Content Creation

  • YouTube videos
  • Social media content
  • Educational materials
  • Marketing videos

🎬 Professional

  • Film dubbing
  • Language localization
  • Voice-over replacement
  • Commercial production

📱 Personal

  • Family videos
  • Meme creation
  • Fun projects
  • Learning tools

🔧 iPad Setup Guide

📱 Step 1: Browser Setup

1. Open Safari or Chrome
2. Navigate to this Space
3. Allow microphone permissions
4. Test with Live Recording tab

🎤 Step 2: Microphone Test

1. Go to "Live Recording" tab
2. Tap microphone icon
3. Speak normally
4. Check audio levels

🎬 Step 3: Processing

1. Upload video (MP4 recommended)
2. Record or upload audio
3. Choose quality (720p for speed)
4. Select AI model
5. Process and download

🐛 Troubleshooting

🎤 Microphone Issues

  • Check browser permissions
  • Try refreshing the page
  • Use headphones to avoid feedback
  • Ensure quiet environment

⚡ Performance

  • Use 720p for faster processing
  • Keep videos under 5 minutes for faster processing (the hard limit is 10 minutes)
  • Close other browser tabs
  • Use WiFi for best experience
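One way to follow the 720p tip is to downscale the video locally before uploading. The sketch below assumes `ffmpeg` is installed and on your PATH; the function names are hypothetical, and the Space may also rescale video internally.

```python
import subprocess


def build_downscale_command(src: str, dst: str, height: int = 720) -> list[str]:
    """Build an ffmpeg command that scales a video to the given height."""
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"scale=-2:{height}",  # -2 keeps the aspect ratio with an even width
        dst,
    ]


def downscale(src: str, dst: str, height: int = 720) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_downscale_command(src, dst, height), check=True)


print(" ".join(build_downscale_command("input.mp4", "input_720p.mp4")))
```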

📱 iPad Specific

  • Safari works better than Chrome
  • Enable microphone in Settings > Safari
  • Use landscape mode for better view
  • Keep device charged during processing

📚 API Usage

This Space provides API endpoints for developers:

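from gradio_client import Client, handle_file

# Replace [USERNAME]/[SPACE] with the ID of this Space.
client = Client("[USERNAME]/[SPACE]")

# Example API call. The endpoint name, argument order, and input formats are
# assumptions; confirm them with client.view_api() or the "Use via API" link.
result = client.predict(
    handle_file("input_video.mp4"),  # video file
    handle_file("input_audio.wav"),  # audio file
    "Wav2Lip",                       # AI model
    api_name="/predict",
)
print(result)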

🀝 Contributing

We welcome contributions! Areas for improvement:

  • Additional AI models
  • Better mobile optimization
  • More language support
  • Performance enhancements

📄 License

MIT License - Free for personal and commercial use

🙏 Credits

  • Wav2Lip: Original research by IIIT Hyderabad
  • MuseTalk: TMElyralab implementation
  • SadTalker: Xi'an Jiaotong University
  • Gradio: Interface framework
  • Hugging Face: Hosting platform

📞 Support

  • 💬 Discussions: Use the Community tab
  • 🐛 Bug Reports: Create an issue
  • 📧 Contact: [Your contact info]
  • 📖 Documentation: See tabs in the app

🌟 Star this Space if you find it useful!

Made with ❤️ for the AI community