Image-to-Video · Diffusers · Safetensors · English
lizhaoyang committed
Commit 5549bd5 · 1 parent: 82e6abb

Add README
Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +48 -0
  3. assets/figure1.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,51 @@
  ---
  license: apache-2.0
  ---
+
+ <h1 align="center">
+ BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
+ </h1>
+
+ <div align="center">
+
+ [![arXiv](https://img.shields.io/badge/arXiv%20paper-2510.00438-b31b1b.svg)](https://arxiv.org/pdf/2510.00438)&nbsp;
+ [![project page](https://img.shields.io/badge/Project_page-More_visualizations-green)](https://lzy-dot.github.io/BindWeave/)&nbsp;
+ <a href="https://huggingface.co/ByteDance/BindWeave"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=orange"></a>
+ </div>
+
+ <p align="center">
+ <a href="https://arxiv.org/abs/2510.00438"><strong>BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration</strong></a>
+ </p>
+
+ <div align="center">
+ <p>
+ <a href="https://scholar.google.com/citations?user=WelDcqkAAAAJ&hl=zh-CN">Zhaoyang Li</a><sup> 1,2</sup>,
+ <a href="https://openreview.net/profile?id=~Dongjun_Qian1">Dongjun Qian</a><sup> 2</sup>,
+ <a href="https://scholar.google.com/citations?user=Kp3XAToAAAAJ&hl=zh-CN">Kai Su</a><sup> 2*</sup>,
+ <a href="https://scholar.google.com/citations?user=G6xrfhYAAAAJ&hl=zh-CN">Qishuai Diao</a><sup> 2</sup>,
+ <a href="https://openreview.net/profile?id=~Xiangyang_Xia1">Xiangyang Xia</a><sup> 2</sup>,
+ <a href="https://openreview.net/profile?id=~Chang_Liu71">Chang Liu</a><sup> 2</sup>,
+ <a href="https://scholar.google.com/citations?user=rtO5VmQAAAAJ&hl=zh-CN">Wenfei Yang</a><sup> 1</sup>,
+ <a href="https://scholar.google.com/citations?user=9sCGe-gAAAAJ&hl=en">Tianzhu Zhang</a><sup> 1*</sup>,
+ <a href="https://shallowyuan.github.io/">Zehuan Yuan</a><sup> 2</sup>
+ </p>
+ <p>
+ <small>
+ <sup>1</sup>University of Science and Technology of China <sup>2</sup>ByteDance
+ <br>
+ <sup>*</sup>Corresponding Author
+ </small>
+ </p>
+ </div>
+
+ <p align="center">
+ <img src="assets/figure1.png" width="95%">
+ </p>
+
+ ## 📖 Overview
+ BindWeave is a unified subject-consistent video generation framework for single- and multi-subject prompts. It is built on an MLLM-DiT architecture that couples a pretrained multimodal large language model with a diffusion transformer.
+ It achieves cross-modal integration through entity grounding and representation alignment: the MLLM parses complex prompts and produces subject-aware hidden states that condition the DiT for high-fidelity generation.
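The conditioning pattern described in the overview — subject-aware hidden states from an MLLM driving a diffusion transformer — can be sketched as a toy cross-attention step. This is a conceptual illustration only, not BindWeave's actual code; all shapes, names, and weights below are illustrative assumptions.

```python
# Conceptual sketch (NOT the BindWeave implementation): DiT latent video
# tokens attend over MLLM hidden states via cross-attention, so every video
# token is conditioned on entity-grounded prompt features.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(video_tokens, mllm_hidden, d_head=32, seed=0):
    """video_tokens: (T, d) DiT latents; mllm_hidden: (S, d) subject-aware states."""
    rng = np.random.default_rng(seed)
    d = video_tokens.shape[-1]
    # Random stand-ins for learned projection weights.
    Wq = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_head)) / np.sqrt(d)
    q = video_tokens @ Wq          # queries come from the video stream
    k = mllm_hidden @ Wk           # keys/values come from the MLLM states
    v = mllm_hidden @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_head))  # (T, S) attention over subjects
    return attn @ v                # conditioned update, shape (T, d_head)

tokens = np.random.default_rng(1).standard_normal((6, 64))  # 6 latent video tokens
states = np.random.default_rng(2).standard_normal((4, 64))  # 4 subject-aware states
out = cross_attention(tokens, states)
print(out.shape)  # (6, 32)
```

In the real model the projections are learned and this update feeds back into the DiT blocks; the sketch only shows why the MLLM's hidden states, rather than a plain text embedding, can bind each video token to a specific grounded subject.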
assets/figure1.png ADDED

Git LFS Details

  • SHA256: 04307aad642913ebbc5ac1f20711db75a866ac8c918a01b45a2f88b4ecc57159
  • Pointer size: 132 Bytes
  • Size of remote file: 9.72 MB