Files changed (1)
  1. README.md +68 -1
README.md CHANGED
@@ -1,6 +1,73 @@
 ---
 license: apache-2.0
-downloads: true
+language: en
+library_name: pytorch
+pipeline_tag: text-to-video
 tags:
 - video generation
+- personalized-generation
+- text-to-video
+- diffusion
+downloads: true
+---
+
+# LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
+
+> Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods lack explicit mechanisms to ensure intra-group consistency. We propose **LumosX**, a framework that advances both data and model design to achieve state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation.
+
+[![arXiv](https://img.shields.io/badge/arXiv-46f333f179-b31b1b.svg)](https://arxiv.org/abs/46f333f179)
+[![OpenReview](https://img.shields.io/badge/OpenReview-paper-blue)](https://openreview.net/forum?id=r5o6PWgzav)
+[![GitHub](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/alibaba-damo-academy/Lumos-Custom/tree/main/LumosX)
+[![Project Page](https://img.shields.io/badge/Project-Page-purple)](https://jiazheng-xing.github.io/lumosx-home/)
+
 ---
+
+### 💻 Authors
+
+<div align="center">
+
+[Jiazheng Xing](https://jiazheng-xing.github.io/)<sup>1,4,2,\*</sup>, Fei Du<sup>2,3,\*</sup>, [Hangjie Yuan](https://jacobyuan7.github.io/)<sup>2,3,1,\*</sup>, Pengwei Liu<sup>1,2</sup>, Hongbin Xu<sup>4</sup>, Hai Ci<sup>4</sup>, Ruigang Niu<sup>2,3</sup>, Weihua Chen<sup>2,3,†</sup>, Fan Wang<sup>2</sup>, Yong Liu<sup>1,†</sup>
+
+<sup>1</sup>Zhejiang University, <sup>2</sup>DAMO Academy, Alibaba Group, <sup>3</sup>Hupan Lab, <sup>4</sup>National University of Singapore
+
+<sup>\*</sup>Equal contributions &nbsp;·&nbsp; <sup>†</sup>Corresponding authors
+
+Contact: jiazhengxing@zju.edu.cn, kugang.cwh@alibaba-inc.com, yongliu@iipc.zju.edu.cn
+
+</div>
+
+<details>
+<summary><strong>📘 Click to view Abstract</strong></summary>
+
+> Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods lack explicit mechanisms to ensure intra-group consistency. Addressing this gap requires both explicit modeling strategies and face-attribute-aware data resources. We therefore propose **LumosX**, a framework that advances both data and model design.
+
+> On the data side, a tailored collection pipeline orchestrates captions and visual cues from independent videos, while multimodal large language models (MLLMs) infer and assign subject-specific dependencies. These extracted relational priors impose a finer-grained structure that amplifies the expressive control of personalized video generation and enables the construction of a comprehensive benchmark.
+
+> On the modeling side, Relational Self-Attention and Relational Cross-Attention intertwine position-aware embeddings with refined attention dynamics to inscribe explicit subject-attribute dependencies, enforcing disciplined intra-group cohesion and amplifying the separation between distinct subject clusters. Comprehensive evaluations on our benchmark demonstrate that LumosX achieves state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation.
+
+</details>
+
+## 📜 News
+
+**[2026/1/26]** Accepted by [ICLR 2026](https://iclr.cc/Conferences/2026)!
+
+**[2026/3/21]** Code is available in [Lumos-Custom / LumosX](https://github.com/alibaba-damo-academy/Lumos-Custom/tree/main/LumosX)!
+
+
+
+## 📎 Citation
+
+If you find this work useful, please cite:
+
+```bibtex
+@inproceedings{xinglumosx,
+  title={LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation},
+  author={Xing, Jiazheng and Du, Fei and Yuan, Hangjie and Liu, Pengwei and Xu, Hongbin and Ci, Hai and Niu, Ruigang and Chen, Weihua and Wang, Fan and Liu, Yong},
+  booktitle={The Fourteenth International Conference on Learning Representations}
+}
+```
+
+## 📣 Disclaimer
+
+This is the official release channel for LumosX weights.
+