---
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
---
<p align="center">
  <img src="assets/icon.png" height=25>
</p>

<h1 align='center'>EchoShot: Multi-Shot Portrait Video Generation</h1>
<p align="center">
    <strong><a href="https://scholar.google.com/citations?hl=en&user=zQnTBEoAAAAJ">Jiahao Wang</a><sup>1</sup></strong>

    <strong><a href="https://scholar.google.com/citations?user=73JaDUQAAAAJ&hl=en&oi=ao">Hualian Sheng</a><sup>2</sup></strong>

    <strong><a href="https://scholar.google.com/citations?user=LMVeRVAAAAAJ&hl=en&oi=ao">Sijia Cai</a><sup>2,&dagger;</sup></strong>

    <strong><a href="https://gr.xjtu.edu.cn/web/zhangwzh123/">Weizhan Zhang</a><sup>1,*</sup></strong><br>
    <strong><a href="https://gr.xjtu.edu.cn/web/yancaixia">Caixia Yan</a><sup>1</sup></strong>

    <strong><a href="">Yachuang Feng</a><sup>2</sup></strong>
    .
    <strong><a href="https://scholar.google.com/citations?user=VQp_ye4AAAAJ&hl=zh-CN&oi=ao">Bing Deng</a><sup>2</sup></strong>
    .
    <strong><a href="https://scholar.google.com/citations?user=T9AzhwcAAAAJ&hl=zh-CN&oi=ao">Jieping Ye</a><sup>2</sup></strong>
    <br>
    <br>
    <sup>1</sup>Xi'an Jiaotong University &nbsp;&nbsp;&nbsp;&nbsp;
    <sup>2</sup>Alibaba Cloud
    <br>
    <br>
        <a href="https://arxiv.org/abs/2506.15838"><img src='https://img.shields.io/badge/+-arXiv-red' alt='Paper PDF'></a>
        <a href="https://johnneywang.github.io/EchoShot-webpage/"><img src='https://img.shields.io/badge/+-Project_Page-blue' alt='Project Page'></a>
        <a href="https://github.com/JoHnneyWang/EchoShot"><img src='https://img.shields.io/badge/+-Github_Page-green' alt='Github Page'></a>
    <br>
</p>

## 📝 Intro
This is the official model release of EchoShot, which lets users generate **multiple video shots showing the same person, controlled by customized prompts**. It currently supports text-to-multishot portrait video generation. We hope you have fun with this demo!
<div align="center">
    <img src="assets/teasor.jpg" width="1200">
</div>


## 🔔 News
- July 15, 2025: 🔥 EchoShot-1.3B-preview is now available on [HuggingFace](https://huggingface.co/JonneyWang/EchoShot)!
- July 15, 2025: 🎉 Released the inference and training code.
- May 25, 2025: We propose [EchoShot](https://johnneywang.github.io/EchoShot-webpage/), a multi-shot portrait video generation model.


## 📖 Citation
If you are inspired by our work, please cite our paper.
```bibtex
@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}
```