|
|
--- |
|
|
base_model: |
|
|
- stable-diffusion-v1-5/stable-diffusion-v1-5 |
|
|
license: apache-2.0 |
|
|
pipeline_tag: text-to-image |
|
|
library_name: diffusers |
|
|
--- |
|
|
|
|
|
<meta name="google-site-verification" content="-XQC-POJtlDPD3i2KSOxbFkSBde_Uq9obAIh_4mxTkM" /> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<h2>DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability</h2> |
|
|
<h3>[ICCV 2025]</h3> |
|
|
|
|
|
[Xirui Hu](https://openreview.net/profile?id=~Xirui_Hu1), |
|
|
[Jiahao Wang](https://openreview.net/profile?id=~Jiahao_Wang14), |
|
|
[Hao Chen](https://openreview.net/profile?id=~Hao_chen100), |
|
|
[Weizhan Zhang](https://openreview.net/profile?id=~Weizhan_Zhang1), |
|
|
[Benqi Wang](https://openreview.net/profile?id=~Benqi_Wang2), |
|
|
[Yikun Li](https://openreview.net/profile?id=~Yikun_Li1), |
|
|
[Haishun Nan](https://openreview.net/profile?id=~Haishun_Nan1)
|
|
|
|
|
[](https://arxiv.org/abs/2503.06505) |
|
|
[](https://github.com/ByteCat-bot/DynamicID) |
|
|
</div> |
|
|
|
|
|
--- |
|
|
This is the official implementation of DynamicID, a framework that generates visually harmonious images featuring **multiple individuals**. Each person in the image can be specified through user-provided reference images, and, most notably, our method enables **independent control of each individual's facial expression** via text prompts. Hope you have fun with this demo!
|
|
|
|
|
--- |
|
|
|
|
|
## π Abstract |
|
|
|
|
|
Recent advancements in text-to-image generation have spurred interest in personalized human image generation. Although existing methods achieve high-fidelity identity preservation, they often struggle with **limited multi-ID usability** and **inadequate facial editability**. |
|
|
|
|
|
We present DynamicID, a tuning-free framework that inherently facilitates both single-ID and multi-ID personalized generation with high fidelity and flexible facial editability. Our key innovations include: |
|
|
|
|
|
- Semantic-Activated Attention (SAA), which employs query-level activation gating to minimize disruption to the original model when injecting ID features, and achieves multi-ID personalization without requiring multi-ID samples during training.
|
|
|
|
|
- Identity-Motion Reconfigurator (IMR), which applies feature-space manipulation to effectively disentangle and reconfigure facial motion and identity features, supporting flexible facial editing. |
|
|
|
|
|
- A task-decoupled training paradigm that reduces data dependency.
|
|
|
|
|
- A curated VariFace-10k facial dataset, comprising 10k unique individuals, each represented by 35 distinct facial images. |
|
|
|
|
|
Experimental results demonstrate that DynamicID outperforms state-of-the-art methods in identity fidelity, facial editability, and multi-ID personalization capability. |
|
|
|
|
|
## π‘ Method |
|
|
|
|
|
<div align="center"> |
|
|
<img src="assets/pipeline.jpg" width="1000">
|
|
</div> |
|
|
|
|
|
The proposed framework is architected around two core components: SAA and IMR. (a) In the anchoring stage, we jointly optimize the SAA and a face encoder to establish robust single-ID and multi-ID personalized generation capabilities. (b) Subsequently in the reconfiguration stage, we freeze these optimized components and leverage them to train the IMR for flexible and fine-grained facial editing. |
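The two-stage, task-decoupled paradigm described above can be sketched in a few lines of PyTorch. This is a minimal conceptual illustration only: the `nn.Linear` modules below are placeholders for the real face encoder, SAA, and IMR (whose actual architectures are defined in the official repository), and the loss is a dummy. What it shows accurately is the training schedule itself: jointly optimizing the SAA and face encoder in the anchoring stage, then freezing them and training only the IMR in the reconfiguration stage.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the real modules; names and shapes are illustrative only.
face_encoder = nn.Linear(512, 768)
saa = nn.Linear(768, 768)  # Semantic-Activated Attention (placeholder)
imr = nn.Linear(768, 768)  # Identity-Motion Reconfigurator (placeholder)

# (a) Anchoring stage: jointly optimize the SAA and the face encoder.
anchor_params = list(saa.parameters()) + list(face_encoder.parameters())
anchor_opt = torch.optim.AdamW(anchor_params, lr=1e-4)

# (b) Reconfiguration stage: freeze the anchored components,
# then train only the IMR on top of them.
for p in anchor_params:
    p.requires_grad_(False)
recon_opt = torch.optim.AdamW(imr.parameters(), lr=1e-4)

# One illustrative optimization step of the reconfiguration stage.
face_feat = face_encoder(torch.randn(1, 512))  # frozen
id_tokens = saa(face_feat)                     # frozen
edited = imr(id_tokens)                        # trainable
loss = edited.pow(2).mean()                    # dummy loss for illustration
loss.backward()
recon_opt.step()
```

Freezing the anchored components in stage (b) is what lets the IMR learn identity/motion reconfiguration without degrading the personalization capability established in stage (a).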
|
|
|
|
|
## π Checkpoint |
|
|
|
|
|
1. Download the pretrained Stable Diffusion v1.5 checkpoint from [Stable Diffusion v1.5 on Hugging Face](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5). |
|
|
|
|
|
2. Download our SAA-related and IMR-related checkpoints from [DynamicID Checkpoints on Hugging Face](https://huggingface.co/meteorite2023/DynamicID). |
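Both downloads in the steps above can be scripted with `snapshot_download` from the `huggingface_hub` library. This is a small convenience sketch; the repo IDs come from the links above, while the local directory layout is just an assumption of this example.

```python
from huggingface_hub import snapshot_download


def download_dynamicid_checkpoints(base_dir: str = "checkpoints"):
    """Fetch the SD v1.5 base weights and the DynamicID checkpoints."""
    # 1. Base Stable Diffusion v1.5 weights
    sd_path = snapshot_download(
        repo_id="stable-diffusion-v1-5/stable-diffusion-v1-5",
        local_dir=f"{base_dir}/stable-diffusion-v1-5",
    )
    # 2. DynamicID SAA- and IMR-related checkpoints
    dynamicid_path = snapshot_download(
        repo_id="meteorite2023/DynamicID",
        local_dir=f"{base_dir}/DynamicID",
    )
    return sd_path, dynamicid_path
```

Calling `download_dynamicid_checkpoints()` will place both snapshots under `checkpoints/`; pass a different `base_dir` to change the location.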
|
|
|
|
|
## β‘ Sample Usage (Diffusers) |
|
|
|
|
|
The official inference code is available in the [GitHub repository](https://github.com/ByteCat-bot/DynamicID), which provides detailed instructions for running the model. A typical usage with the `diffusers` library would involve loading the base Stable Diffusion pipeline and then integrating the DynamicID specific weights. |
|
|
|
|
|
```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base Stable Diffusion v1.5 pipeline.
# Ensure you have downloaded the base model locally or from the Hugging Face Hub
# (the original runwayml repository now lives under the stable-diffusion-v1-5 organization).
pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the DynamicID-specific weights (the SAA- and IMR-related checkpoints).
# The precise loading method is detailed in the official repository.
# For conceptual understanding, it might involve:
# pipeline.load_lora_weights("path/to/DynamicID/weights")
# or integrating custom UNet/attention layers as per the DynamicID implementation.

# Refer to the official GitHub repository for the exact loading and inference pipeline.
# You would then pass your text prompt and identity reference images to the pipeline.
# Example (conceptual):
# prompt = "a photo of [person1] with a big smile and [person2] looking thoughtful"
# generated_image = pipeline(
#     prompt=prompt,
#     identity_references=[id_image_1, id_image_2],  # placeholder for identity images
#     # add other parameters as specified in the DynamicID code
# ).images[0]
```
|
|
|
|
|
## π Gallery |
|
|
|
|
|
<div align="center"> |
|
|
<img src="assets/teaser.jpg" width="900">


<br><br><br>


<img src="assets/single.jpg" width="900">


<br><br><br>


<img src="assets/multi.jpg" width="900">
|
|
</div> |
|
|
|
|
|
## π ToDo List |
|
|
|
|
|
- [x] Release technical report |
|
|
- [x] Release **training and inference code** |
|
|
- [x] Release **Dynamic-sd** (based on *stable diffusion v1.5*) |
|
|
- [ ] Release **Dynamic-flux** (based on *Flux-dev*) |
|
|
- [ ] Release a Hugging Face Demo Space |
|
|
|
|
|
## π Citation |
|
|
If you find our work helpful or inspiring, please cite our paper.
|
|
```bibtex |
|
|
@inproceedings{dynamicid,
  title={DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability},
  author={Hu, Xirui and Wang, Jiahao and Chen, Hao and Zhang, Weizhan and Wang, Benqi and Li, Yikun and Nan, Haishun},
  booktitle={International Conference on Computer Vision},
  year={2025}
}
```