metadata
title: Coach Pro Lip Sync Tool
emoji: 🎤
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
hardware: t4-small
suggested_hardware: t4-small
suggested_storage: small
duplicate: true
tags:
- lip-sync
- ai
- video
- audio
- microphone
- tts
- gradio
- ipad
- mobile
🎬 Advanced Lip Sync Tool with Microphone Support
Features
🎤 Microphone Integration
- Real-time Recording: Direct microphone access in browser
- Live Streaming: Real-time audio processing
- Cross-platform: Works on iPad, iPhone, Desktop
- Multiple Sources: Upload files OR record live
🤖 AI Models Supported
- Wav2Lip: High accuracy lip synchronization
- MuseTalk: Real-time processing (30fps+)
- SadTalker: Natural expressions and emotions
📱 iPad Optimized
- ✅ Touch-friendly interface
- ✅ Safari/Chrome compatible
- ✅ Mobile-responsive design
- ✅ No local installation required
🎵 Audio Features
- Text-to-Speech: Convert text to natural speech
- Multi-language: English, Urdu (اردو), and Hindi (हिंदी) support
- Voice Selection: Male, Female, Natural voices
- Format Support: WAV, MP3, M4A
Quick Start
1. Basic Lip Sync
- Upload your video file
- Record audio with microphone OR upload audio file
- Choose AI model (Wav2Lip recommended for beginners)
- Click "Generate Lip Sync"
2. Text-to-Speech
- Upload your video
- Type text in English or Urdu (اردو)
- Select voice type and language
- Generate speech + lip sync automatically
3. Live Recording
- Use real-time microphone monitoring
- Test audio levels and quality
- Perfect for testing before main processing
Supported Formats
📹 Video Input
- MP4, AVI, MOV, WebM
- Max size: 2GB
- Max duration: 10 minutes
- Recommended: 720p for faster processing
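The limits above can be checked locally before uploading. The following is a minimal sketch, not the Space's own validation code: the function name `check_video_file` is hypothetical, "2GB" is read as 2 GiB (an assumption), and the 10-minute duration limit is not checked because it would need a media probing tool.

```python
import os

# Limits copied from the "Video Input" list above.
ALLOWED_VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".webm"}
MAX_VIDEO_BYTES = 2 * 1024**3  # "2GB", interpreted here as 2 GiB (assumption)


def check_video_file(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_VIDEO_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_VIDEO_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_VIDEO_BYTES}-byte limit")
    return problems
```

Calling `check_video_file("clip.mp4")` on a small MP4 returns an empty list; any other result lists what to fix before uploading.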
🎵 Audio Input
- WAV (recommended), MP3, M4A
- Sample rates: 16 kHz to 48 kHz
- Max recording: 30 seconds
- Minimum bit depth: 16-bit
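For WAV files, these constraints can be verified with Python's standard `wave` module. This is a sketch under assumptions: `check_wav_file` is a hypothetical helper, and it applies the 30-second recording limit to any WAV file, which the list above does not explicitly state for uploads.

```python
import wave

# Limits copied from the "Audio Input" list above.
MIN_SAMPLE_RATE_HZ = 16_000
MAX_SAMPLE_RATE_HZ = 48_000
MIN_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
MAX_RECORDING_SECONDS = 30  # assumed here to apply to uploads as well


def check_wav_file(path):
    """Return a list of problems; an empty list means the WAV file fits the limits."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    problems = []
    if not MIN_SAMPLE_RATE_HZ <= rate <= MAX_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 16-48 kHz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width}-bit, below 16-bit")
    if duration > MAX_RECORDING_SECONDS:
        problems.append(f"duration {duration:.1f} s exceeds {MAX_RECORDING_SECONDS} s")
    return problems
```

MP3 and M4A files are not covered by this sketch, since the standard library cannot read them.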
🛠️ Technical Specs
🔧 Processing
- Hardware: NVIDIA T4 GPU acceleration
- Speed: 30-120 seconds of processing per minute of video
- Quality: Up to 1080p output
- Concurrent: Multiple users supported
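To turn the quoted speed range into a rough wait time, the arithmetic can be sketched as follows. The function name `estimate_processing_seconds` is hypothetical, and the result is only as reliable as the 30-120 seconds per minute figure above; actual times will vary with model, resolution, and load.

```python
# Range quoted above: 30-120 seconds of processing per minute of video.
FASTEST_SECONDS_PER_MINUTE = 30
SLOWEST_SECONDS_PER_MINUTE = 120


def estimate_processing_seconds(video_seconds):
    """Return (fastest, slowest) expected processing time in seconds."""
    if video_seconds < 0:
        raise ValueError("video_seconds must be non-negative")
    minutes = video_seconds / 60
    return (FASTEST_SECONDS_PER_MINUTE * minutes, SLOWEST_SECONDS_PER_MINUTE * minutes)
```

For a 5-minute video, `estimate_processing_seconds(300)` gives a range of 150 to 600 seconds, that is, 2.5 to 10 minutes.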
📱 Mobile Compatibility
- iOS Safari: ✅ Full support
- Chrome Mobile: ✅ Full support
- iPad: ✅ Optimized interface
- Android: ✅ Compatible
🎯 Model Comparison
| Feature | Wav2Lip | MuseTalk | SadTalker |
|---|---|---|---|
| Accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Speed | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Expressions | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Real-time | ❌ | ✅ | ❌ |
| Any Identity | ✅ | ✅ | ✅ |
🎨 Use Cases
📺 Content Creation
- YouTube videos
- Social media content
- Educational materials
- Marketing videos
🎬 Professional
- Film dubbing
- Language localization
- Voice-over replacement
- Commercial production
📱 Personal
- Family videos
- Meme creation
- Fun projects
- Learning tools
🔧 iPad Setup Guide
📱 Step 1: Browser Setup
1. Open Safari or Chrome
2. Navigate to this Space
3. Allow microphone permissions
4. Test with Live Recording tab
🎤 Step 2: Microphone Test
1. Go to "Live Recording" tab
2. Tap microphone icon
3. Speak normally
4. Check audio levels
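If you want to check levels on a recording outside the app, peak and RMS level in dBFS can be computed from raw samples. This is a sketch, not the app's own meter: `level_dbfs` is a hypothetical helper that assumes 16-bit signed mono PCM in the machine's native byte order (little-endian on most systems, which matches WAV data).

```python
import array
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def level_dbfs(pcm_bytes):
    """Return (peak_dbfs, rms_dbfs) for 16-bit signed PCM bytes."""
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    if not samples:
        raise ValueError("no samples to measure")
    peak = max(abs(s) for s in samples) / FULL_SCALE
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE

    def to_db(ratio):
        # Digital silence has no finite level in decibels.
        return 20 * math.log10(ratio) if ratio > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

A peak close to 0 dBFS suggests the input is clipping, while a very low RMS suggests the microphone is too quiet or too far away.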
🎬 Step 3: Processing
1. Upload video (MP4 recommended)
2. Record or upload audio
3. Choose quality (720p for speed)
4. Select AI model
5. Process and download
Troubleshooting
🎤 Microphone Issues
- Check browser permissions
- Try refreshing the page
- Use headphones to avoid feedback
- Ensure quiet environment
⚡ Performance
- Use 720p for faster processing
- Keep videos under 5 minutes for faster processing (the hard limit is 10 minutes)
- Close other browser tabs
- Use WiFi for best experience
📱 iPad Specific
- Safari works better than Chrome
- Enable microphone in Settings > Safari
- Use landscape mode for better view
- Keep device charged during processing
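API Usage
This Space provides API endpoints for developers:
import requests

# Hypothetical placeholders: replace with references to your own input files.
video_file = "input_video.mp4"
audio_file = "input_audio.wav"

# Example API call. Fill in [USERNAME] and [SPACE] before running.
response = requests.post(
    "https://huggingface.co/spaces/[USERNAME]/[SPACE]/api/predict",
    json={
        # Inputs in order: video, audio, model ("Wav2Lip", "MuseTalk", or "SadTalker")
        "data": [video_file, audio_file, "Wav2Lip"]
    },
    timeout=600,  # processing can take several minutes
)
response.raise_for_status()  # fail loudly on HTTP errors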
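The request body in the example above is a JSON object with a single `data` list holding the video, the audio, and the model name. A small helper can build that body and reject unknown model names before a request is sent. This is a sketch: `build_predict_payload` is a hypothetical name, and how the video and audio must be referenced (path, URL, or encoded data) depends on the Gradio version and is not specified here.

```python
# Model names taken from the "AI Models Supported" list.
SUPPORTED_MODELS = ("Wav2Lip", "MuseTalk", "SadTalker")


def build_predict_payload(video_ref, audio_ref, model="Wav2Lip"):
    """Return the JSON-serializable body in the shape used by the example call."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {SUPPORTED_MODELS}")
    # Order matters: video first, then audio, then the model name.
    return {"data": [video_ref, audio_ref, model]}
```

The result can be passed directly as the `json=` argument of `requests.post`.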
🤝 Contributing
We welcome contributions! Areas for improvement:
- Additional AI models
- Better mobile optimization
- More language support
- Performance enhancements
License
MIT License: free for personal and commercial use
Credits
- Wav2Lip: Original research by IIIT Hyderabad
- MuseTalk: TMElyralab implementation
- SadTalker: Xi'an Jiaotong University
- Gradio: Interface framework
- Hugging Face: Hosting platform
Support
- 💬 Discussions: Use the Community tab
- Bug Reports: Create an issue
- 📧 Contact: [Your contact info]
- Documentation: See tabs in the app
Star this Space if you find it useful!
Made with ❤️ for the AI community