# MeiGen-MultiTalk Demo

This is a demo of [MeiGen-MultiTalk](https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk), an audio-driven multi-person conversational video generation model.

## Features

- 💬 Generate videos of people talking from still images and audio
- 👥 Support for both single-person and multi-person conversations
- 🎯 High-quality lip synchronization
- 📺 Support for 480p and 720p resolution
- ⏱️ Generate videos up to 15 seconds long

## How to Use

1. Upload a reference image (a photo of the person or people who will be speaking)
2. Upload one or more audio files:
   - For a single person: upload one audio file
   - For a conversation: upload multiple audio files (one per person)
3. Enter a prompt describing the desired video
4. Adjust generation parameters if needed:
   - Resolution: video quality (480p or 720p)
   - Audio CFG: controls the strength of the audio's influence
   - Guidance Scale: controls adherence to the prompt
   - Random Seed: for reproducible results
   - Max Duration: video length in seconds
5. Click "Generate Video" and wait for the result

## Tips

- Use clear, front-facing photos for best results
- Ensure good audio quality without background noise
- Keep prompts clear and specific
- For multi-person videos, ensure the reference image shows all speakers clearly

## Limitations

- Generation can take several minutes
- Maximum video duration is 15 seconds
- Best results with clear, well-lit reference images
- Audio should be clear and without background noise

## Credits

This demo uses the MeiGen-MultiTalk model created by MeiGen-AI. If you use this in your work, please cite:

```bibtex
@article{kong2025let,
  title={Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation},
  author={Kong, Zhe and Gao, Feng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Cai, Xunliang and Chen, Guanying and Luo, Wenhan},
  journal={arXiv preprint arXiv:2505.22647},
  year={2025}
}
```
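
The usage steps and limits above can be sketched as a small pre-flight check before submitting a generation request. This is an illustrative helper only: the function name and parameters (`image_path`, `audio_paths`, `resolution`, `max_duration`) are hypothetical and do not correspond to the Space's actual API; the limits (480p/720p, 15-second cap) come from this README.

```python
# Hypothetical pre-flight validation mirroring the demo's documented limits.
# Names here are illustrative, not the Space's real API.

ALLOWED_RESOLUTIONS = {"480p", "720p"}  # resolutions the demo supports
MAX_DURATION_S = 15                     # the demo caps videos at 15 seconds


def validate_request(image_path, audio_paths, resolution, max_duration):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not image_path:
        problems.append("a reference image is required")
    if not audio_paths:
        problems.append("at least one audio file is required (one per speaker)")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if not 0 < max_duration <= MAX_DURATION_S:
        problems.append(f"max_duration must be in (0, {MAX_DURATION_S}] seconds")
    return problems


# Example: a single-speaker request within the documented limits passes,
# while missing audio, an unsupported resolution, and an over-long
# duration are each reported.
ok = validate_request("face.png", ["speech.wav"], "480p", 10)
bad = validate_request("face.png", [], "1080p", 20)
```

Checking inputs locally like this avoids waiting several minutes of generation time only to discover a request that the demo cannot fulfil.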