---
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
license: apache-2.0
pipeline_tag: text-to-image
library_name: diffusers
---
<meta name="google-site-verification" content="-XQC-POJtlDPD3i2KSOxbFkSBde_Uq9obAIh_4mxTkM" />
<div align="center">
<h2>DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability</h2>
<h3>[ICCV 2025]</h3>
[Xirui Hu](https://openreview.net/profile?id=~Xirui_Hu1),
[Jiahao Wang](https://openreview.net/profile?id=~Jiahao_Wang14),
[Hao Chen](https://openreview.net/profile?id=~Hao_chen100),
[Weizhan Zhang](https://openreview.net/profile?id=~Weizhan_Zhang1),
[Benqi Wang](https://openreview.net/profile?id=~Benqi_Wang2),
[Yikun Li](https://openreview.net/profile?id=~Yikun_Li1),
[Haishun Nan](https://openreview.net/profile?id=~Haishun_Nan1),
[arXiv](https://arxiv.org/abs/2503.06505)
[GitHub](https://github.com/ByteCat-bot/DynamicID)
</div>
---
This is the official implementation of DynamicID, a framework that generates visually harmonious images featuring **multiple individuals**. Each person in the image can be specified through user-provided reference images, and, most notably, our method enables **independent control of each individual's facial expression** via text prompts. Hope you have fun with this demo!
---
## Abstract
Recent advancements in text-to-image generation have spurred interest in personalized human image generation. Although existing methods achieve high-fidelity identity preservation, they often struggle with **limited multi-ID usability** and **inadequate facial editability**.
We present DynamicID, a tuning-free framework that inherently facilitates both single-ID and multi-ID personalized generation with high fidelity and flexible facial editability. Our key innovations include:
- Semantic-Activated Attention (SAA), which employs query-level activation gating to minimize disruption to the original model when injecting ID features and achieve multi-ID personalization without requiring multi-ID samples during training.
- Identity-Motion Reconfigurator (IMR), which applies feature-space manipulation to effectively disentangle and reconfigure facial motion and identity features, supporting flexible facial editing.
- A task-decoupled training paradigm that reduces data dependency.
- A curated VariFace-10k facial dataset, comprising 10k unique individuals, each represented by 35 distinct facial images.
Experimental results demonstrate that DynamicID outperforms state-of-the-art methods in identity fidelity, facial editability, and multi-ID personalization capability.
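As a rough intuition for SAA's query-level activation gating, the toy NumPy sketch below (hypothetical shapes and gating form, not the paper's actual implementation) shows each spatial query blending the base text cross-attention with attention over injected ID tokens, scaled by a per-query gate. Where the gate is zero, the original model's output is untouched, which is the sense in which disruption is minimized.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_id_attention(q, k_txt, v_txt, k_id, v_id, gate):
    """Toy query-level gating (illustration only): each query mixes the base
    text cross-attention with attention over injected ID tokens, weighted by a
    per-query gate in [0, 1]. Gate 0 leaves the original branch undisturbed."""
    d = q.shape[-1]
    base = softmax(q @ k_txt.T / np.sqrt(d)) @ v_txt  # original text branch
    idf = softmax(q @ k_id.T / np.sqrt(d)) @ v_id     # injected ID branch
    return base + gate[:, None] * idf

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))                              # 16 queries, dim 8
k_txt, v_txt = rng.normal(size=(77, 8)), rng.normal(size=(77, 8))
k_id, v_id = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out_off = gated_id_attention(q, k_txt, v_txt, k_id, v_id, np.zeros(16))
out_on = gated_id_attention(q, k_txt, v_txt, k_id, v_id, np.ones(16))
```

With the gate fully closed, `out_off` equals the plain text attention; opening it injects the ID features only for the activated queries.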
## Method
<div align="center">
<img src="assets/pipeline.jpg" width="1000">
</div>
The proposed framework is architected around two core components: SAA and IMR. (a) In the anchoring stage, we jointly optimize the SAA and a face encoder to establish robust single-ID and multi-ID personalized generation capabilities. (b) Subsequently, in the reconfiguration stage, we freeze these optimized components and leverage them to train the IMR for flexible and fine-grained facial editing.
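The IMR's role can be pictured, very loosely, as splitting a face embedding into an identity part and a motion (expression) part and recombining those parts across images. The fixed split point below is a hypothetical illustration, not the learned feature-space manipulation used in the paper.

```python
import numpy as np

ID_DIM = 12  # hypothetical split: first 12 dims = identity, rest = motion

def reconfigure(feat_id_src, feat_motion_src, id_dim=ID_DIM):
    """Keep the identity part of one embedding and graft on the motion
    (expression) part of another -- a stand-in for IMR's reconfiguration."""
    return np.concatenate([feat_id_src[:id_dim], feat_motion_src[id_dim:]])

rng = np.random.default_rng(1)
alice, bob = rng.normal(size=16), rng.normal(size=16)
# "Alice's face wearing Bob's expression", in this toy picture:
alice_with_bobs_expression = reconfigure(alice, bob)
```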
## Checkpoint
1. Download the pretrained Stable Diffusion v1.5 checkpoint from [Stable Diffusion v1.5 on Hugging Face](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5).
2. Download our SAA-related and IMR-related checkpoints from [DynamicID Checkpoints on Hugging Face](https://huggingface.co/meteorite2023/DynamicID).
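Both downloads can be scripted with `huggingface_hub` (the local directory names below are arbitrary choices, not paths the repository expects):

```python
from huggingface_hub import snapshot_download

BASE_REPO = "stable-diffusion-v1-5/stable-diffusion-v1-5"
DYNAMICID_REPO = "meteorite2023/DynamicID"

def download_checkpoints(root="checkpoints"):
    """Fetch the base SD v1.5 weights and the DynamicID (SAA/IMR) checkpoints
    into local folders. Folder names under `root` are arbitrary."""
    base_dir = snapshot_download(BASE_REPO, local_dir=f"{root}/sd15")
    dyn_dir = snapshot_download(DYNAMICID_REPO, local_dir=f"{root}/dynamicid")
    return base_dir, dyn_dir

if __name__ == "__main__":
    download_checkpoints()
```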
## Sample Usage (Diffusers)
The official inference code is available in the [GitHub repository](https://github.com/ByteCat-bot/DynamicID), which provides detailed instructions for running the model. Typical usage with the `diffusers` library involves loading the base Stable Diffusion pipeline and then integrating the DynamicID-specific weights.
```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base Stable Diffusion v1.5 pipeline (downloaded locally or from the Hub).
pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the DynamicID-specific weights (SAA and IMR checkpoints).
# The exact loading API is defined in the official repository; conceptually:
# pipeline.load_lora_weights("path/to/DynamicID/weights")
# or integrate the custom attention layers as per the DynamicID implementation.

# Conceptual inference call (see the official repository for the real signature):
# prompt = "a photo of [person1] with a big smile and [person2] looking thoughtful"
# image = pipeline(
#     prompt=prompt,
#     identity_references=[id_image_1, id_image_2],  # reference face images
# ).images[0]
```
## Gallery
<div align="center">
<img src="assets/teaser.jpg" width="900">
<br><br><br>
<img src="assets/single.jpg" width="900">
<br><br><br>
<img src="assets/multi.jpg" width="900">
</div>
## ToDo List
- [x] Release technical report
- [x] Release **training and inference code**
- [x] Release **Dynamic-sd** (based on *stable diffusion v1.5*)
- [ ] Release **Dynamic-flux** (based on *Flux-dev*)
- [ ] Release a Hugging Face Demo Space
## Citation
If you are inspired by our work, please cite our paper.
```bibtex
@inproceedings{dynamicid,
  title={DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability},
  author={Hu, Xirui and Wang, Jiahao and Chen, Hao and Zhang, Weizhan and Wang, Benqi and Li, Yikun and Nan, Haishun},
  booktitle={International Conference on Computer Vision},
  year={2025}
}
```