---
license: apache-2.0
language:
- en
- zh
tags:
- video generation
- conversational video generation
- talking human video generation
pipeline_tag: image-to-video
---

# MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation

<p align="left">
  <a href="https://meigen-ai.github.io/multi-talk/">
    <img
      src="https://img.shields.io/badge/MultiTalk-Website-0A66C2?logo=safari&logoColor=white" style="display: inline-block; vertical-align: middle;"
      alt="MultiTalk Website"
    />
  </a>
  <a href="https://arxiv.org/abs/2505.22647">
    <img
      src="https://img.shields.io/badge/MultiTalk-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
      alt="MultiTalk Paper on arXiv"
    />
  </a>
  <a href="https://github.com/MeiGen-AI/MultiTalk" target="_blank" style="margin: 2px;">
    <img
      src="https://img.shields.io/badge/MultiTalk-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
      alt="MultiTalk Codebase"
    />
  </a>
</p>

> We present **MultiTalk**, an open-source audio-driven multi-person conversational video generation model with state-of-the-art lip synchronization accuracy.
>
> Key features:
> - 💬 Realistic Conversations: supports single- and multi-person generation
> - 👥 Interactive Character Control: direct virtual humans via prompts
> - 🎤 Generalization: supports the generation of cartoon characters and singing
> - 📺 Resolution Flexibility: 480p and 720p output at arbitrary aspect ratios

This repository hosts the model weights for **MultiTalk**. For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/MeiGen-AI/MultiTalk).

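The weights in this repository can also be fetched programmatically. The snippet below is a minimal sketch, assuming the `huggingface_hub` package is installed (`pip install -U huggingface_hub`). The helper name `download_multitalk_weights` and the `weights/MeiGen-MultiTalk` target directory are hypothetical choices for illustration, not names required by the MultiTalk codebase.

```python
# Sketch: download the MultiTalk weights hosted in this repository.
# Assumption: `huggingface_hub` is installed; the local directory is a
# hypothetical example location, not one mandated by the inference code.
from pathlib import Path

REPO_ID = "MeiGen-AI/MeiGen-MultiTalk"
DEFAULT_LOCAL_DIR = Path("weights") / "MeiGen-MultiTalk"  # hypothetical location


def download_multitalk_weights(local_dir: Path = DEFAULT_LOCAL_DIR) -> Path:
    """Download every file in the model repository into `local_dir`."""
    # Imported inside the function so that importing this file does not
    # require huggingface_hub; only calling the function does.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches the whole repository and returns the
    # path of the folder it wrote to.
    path = snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
    return Path(path)
```

Calling `download_multitalk_weights()` needs network access and writes the repository files to the chosen directory. Check the GitHub repository for the directory layout the inference scripts actually expect before picking `local_dir`.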
## Method

<p align="left"><img src="https://github.com/MeiGen-AI/MultiTalk/blob/main/assets/pipe.png?raw=true" alt="MultiTalk method overview" width="80%"></p>

## Citation

If you find our work helpful, please cite us.

```bibtex
@article{kong2025let,
  title={Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation},
  author={Kong, Zhe and Gao, Feng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Cai, Xunliang and Chen, Guanying and Luo, Wenhan},
  journal={arXiv preprint arXiv:2505.22647},
  year={2025}
}
```

## License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the content you generate; you are free to use it, provided that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations.