Image-to-Video · Diffusers · Safetensors · English
lizhaoyang committed
Commit 5549bd5 · 1 parent: 82e6abb

Add README
Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +48 -0
  3. assets/figure1.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,51 @@
  ---
  license: apache-2.0
  ---
+
+ <h1 align="center">
+ BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
+ </h1>
+
+ <div align="center">
+
+ [![arXiv](https://img.shields.io/badge/arXiv%20paper-2510.00438-b31b1b.svg)](https://arxiv.org/pdf/2510.00438)&nbsp;
+ [![project page](https://img.shields.io/badge/Project_page-More_visualizations-green)](https://lzy-dot.github.io/BindWeave/)&nbsp;
+ <a href="https://huggingface.co/ByteDance/BindWeave"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=orange"></a>
+ </div>
+
+ <p align="center">
+ <a href="https://arxiv.org/abs/2510.00438"><strong>BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration</strong></a>
+ </p>
+
+ <div align="center">
+ <p>
+ <a href="https://scholar.google.com/citations?user=WelDcqkAAAAJ&hl=zh-CN">Zhaoyang Li</a><sup> 1,2</sup>,
+ <a href="https://openreview.net/profile?id=~Dongjun_Qian1">Dongjun Qian</a><sup> 2</sup>,
+ <a href="https://scholar.google.com/citations?user=Kp3XAToAAAAJ&hl=zh-CN">Kai Su</a><sup> 2*</sup>,
+ <a href="https://scholar.google.com/citations?user=G6xrfhYAAAAJ&hl=zh-CN">Qishuai Diao</a><sup> 2</sup>,
+ <a href="https://openreview.net/profile?id=~Xiangyang_Xia1">Xiangyang Xia</a><sup> 2</sup>,
+ <a href="https://openreview.net/profile?id=~Chang_Liu71">Chang Liu</a><sup> 2</sup>,
+ <a href="https://scholar.google.com/citations?user=rtO5VmQAAAAJ&hl=zh-CN">Wenfei Yang</a><sup> 1</sup>,
+ <a href="https://scholar.google.com/citations?user=9sCGe-gAAAAJ&hl=en">Tianzhu Zhang</a><sup> 1*</sup>,
+ <a href="https://shallowyuan.github.io/">Zehuan Yuan</a><sup> 2</sup>
+ </p>
+ <p>
+ <small>
+ <sup>1</sup>University of Science and Technology of China <sup>2</sup>ByteDance
+ <br>
+ <sup>*</sup>Corresponding Author
+ </small>
+ </p>
+ </div>
+
+ <p align="center">
+ <img src="assets/figure1.png" width="95%">
+ </p>
+
+ ## 📖 Overview
+ BindWeave is a unified subject-consistent video generation framework for single- and multi-subject prompts. It is built on an MLLM-DiT architecture that couples a pretrained multimodal large language model with a diffusion transformer.
+ It achieves cross-modal integration through entity grounding and representation alignment: the MLLM parses complex prompts and produces subject-aware hidden states that condition the DiT for high-fidelity generation.
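The conditioning pattern described in the overview — subject-aware hidden states from an MLLM driving a diffusion transformer — can be sketched as a toy cross-attention step. This is a conceptual illustration only, not BindWeave's actual code; all shapes, names, and weights below are illustrative assumptions.

```python
# Conceptual sketch (NOT the BindWeave implementation): DiT latent video
# tokens attend over MLLM hidden states via cross-attention, so every video
# token is conditioned on entity-grounded prompt features.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(video_tokens, mllm_hidden, d_head=32, seed=0):
    """video_tokens: (T, d) DiT latents; mllm_hidden: (S, d) subject-aware states."""
    rng = np.random.default_rng(seed)
    d = video_tokens.shape[-1]
    # Random stand-ins for learned projection weights.
    Wq = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_head)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_head)) / np.sqrt(d)
    q = video_tokens @ Wq          # queries come from the video stream
    k = mllm_hidden @ Wk           # keys/values come from the MLLM states
    v = mllm_hidden @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_head))  # (T, S) attention over subjects
    return attn @ v                # conditioned update, shape (T, d_head)

tokens = np.random.default_rng(1).standard_normal((6, 64))  # 6 latent video tokens
states = np.random.default_rng(2).standard_normal((4, 64))  # 4 subject-aware states
out = cross_attention(tokens, states)
print(out.shape)  # (6, 32)
```

In the real model the projections are learned and this update feeds back into the DiT blocks; the sketch only shows why the MLLM's hidden states, rather than a plain text embedding, can bind each video token to a specific grounded subject.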
assets/figure1.png ADDED

Git LFS Details

  • SHA256: 04307aad642913ebbc5ac1f20711db75a866ac8c918a01b45a2f88b4ecc57159
  • Pointer size: 132 Bytes
  • Size of remote file: 9.72 MB