Update #2 by Ockham98 - opened

README.md CHANGED
---
license: apache-2.0
language: en
library_name: pytorch
pipeline_tag: text-to-video
tags:
- video generation
- personalized-generation
- text-to-video
- diffusion
downloads: true
---

# LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

> Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods lack explicit mechanisms to ensure intra-group consistency. We propose **LumosX**, a framework that advances both data and model design to achieve state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation.

[arXiv](https://arxiv.org/abs/46f333f179)
[OpenReview](https://openreview.net/forum?id=r5o6PWgzav)
[Code](https://github.com/alibaba-damo-academy/Lumos-Custom/tree/main/LumosX)
[Project Page](https://jiazheng-xing.github.io/lumosx-home/)

---

### Authors

<div align="center">

[Jiazheng Xing](https://jiazheng-xing.github.io/)<sup>1,4,2,\*</sup>, Fei Du<sup>2,3,\*</sup>, [Hangjie Yuan](https://jacobyuan7.github.io/)<sup>2,3,1,\*</sup>, Pengwei Liu<sup>1,2</sup>, Hongbin Xu<sup>4</sup>, Hai Ci<sup>4</sup>, Ruigang Niu<sup>2,3</sup>, Weihua Chen<sup>2,3,†</sup>, Fan Wang<sup>2</sup>, Yong Liu<sup>1,†</sup>

<sup>1</sup>Zhejiang University, <sup>2</sup>DAMO Academy, Alibaba Group, <sup>3</sup>Hupan Lab, <sup>4</sup>National University of Singapore

<sup>\*</sup>Equal contributions · <sup>†</sup>Corresponding authors

Contact: jiazhengxing@zju.edu.cn, kugang.cwh@alibaba-inc.com, yongliu@iipc.zju.edu.cn

</div>

<details>
<summary><strong>Click to view Abstract</strong></summary>

> Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods lack explicit mechanisms to ensure intra-group consistency. Addressing this gap requires both explicit modeling strategies and face-attribute-aware data resources. We therefore propose **LumosX**, a framework that advances both data and model design.

> On the data side, a tailored collection pipeline orchestrates captions and visual cues from independent videos, while multimodal large language models (MLLMs) infer and assign subject-specific dependencies. These extracted relational priors impose a finer-grained structure that amplifies the expressive control of personalized video generation and enables the construction of a comprehensive benchmark.

> On the modeling side, Relational Self-Attention and Relational Cross-Attention intertwine position-aware embeddings with refined attention dynamics to inscribe explicit subject-attribute dependencies, enforcing disciplined intra-group cohesion and amplifying the separation between distinct subject clusters. Comprehensive evaluations on our benchmark demonstrate that LumosX achieves state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation.

</details>
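This model card does not ship standalone code for the relational attention modules. As a rough, hypothetical illustration of the general idea only — biasing attention scores so that tokens belonging to the same subject group (an identity and its attributes) attend to each other more strongly than to other groups — here is a minimal NumPy sketch. The function name, shapes, and the simple additive group bias are all assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relational_self_attention(tokens, group_ids, d=16, bias=4.0, seed=0):
    """Toy single-head self-attention with a subject-group score bias.

    tokens:    (n, f) token features
    group_ids: (n,)   subject-group id per token; tokens sharing an id
               (e.g. one identity's face token and its attribute tokens)
               get a positive score bias, pulling intra-group attention
               together and separating distinct subject clusters.
    NOTE: an illustrative stand-in, not LumosX's actual modules.
    """
    rng = np.random.default_rng(seed)
    f = tokens.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((f, d)) / np.sqrt(f) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    same_group = group_ids[:, None] == group_ids[None, :]
    scores = scores + bias * same_group   # explicit relational prior
    attn = softmax(scores, axis=-1)       # each row sums to 1
    return attn @ v, attn

# Four tokens, two subject groups: [face0, attr0, face1, attr1]
feats = np.random.default_rng(1).standard_normal((4, 16))
gids = np.array([0, 0, 1, 1])
out, attn = relational_self_attention(feats, gids)
```

With the group bias applied, most of each row's attention mass stays inside its own subject group, which is the intra-group cohesion effect the abstract describes.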
| 49 |
+
|
| 50 |
+
## ๐ News
|
| 51 |
+
|
| 52 |
+
**[2026/1/26]** Accepted by [ICLR 2026](https://iclr.cc/Conferences/2026) !
|
| 53 |
+
|
| 54 |
+
**[2026/3/21]** Code is available in [Lumos-Custom / LumosX](https://github.com/alibaba-damo-academy/Lumos-Custom/tree/main/LumosX) !
|
| 55 |
+
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
## ๐ Citation
|
| 59 |
+
|
| 60 |
+
If you find this work useful, please cite:
|
| 61 |
+
|
| 62 |
+
```bibtex
|
| 63 |
+
@inproceedings{xinglumosx,
|
| 64 |
+
title={LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation},
|
| 65 |
+
author={Xing, Jiazheng and Du, Fei and Yuan, Hangjie and Liu, Pengwei and Xu, Hongbin and Ci, Hai and Niu, Ruigang and Chen, Weihua and Wang, Fan and Liu, Yong},
|
| 66 |
+
booktitle={The Fourteenth International Conference on Learning Representations}
|
| 67 |
+
}
|
| 68 |
+
```
|

## Disclaimer

This is the official release channel for LumosX weights.