xilanhua12138 committed
Commit 18c2c92 · verified · 1 Parent(s): b9e607d

Update README.md

Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -5,4 +5,117 @@ language:
  base_model:
  - stabilityai/stable-diffusion-xl-base-1.0
  library_name: diffusers
- ---
+ ---
+
+ <div align="center">
+ <h1>RaCig: A RAG-based Character-Consistent Story Image Generation Model</h1>
+
+ <a href='https://huggingface.co/ZuluVision/RaCig'><img src='https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue'></a>
+ <a href='https://pan.baidu.com/s/1Vt2meAg5DkjUXktY_H6eNg?pwd=ympj'><img src='https://img.shields.io/badge/Baidu_Netdisk-Dataset-green?logo=baidu'></a>
+ </div>
+
+ ### 1. Multi-character image generation with rich motion
+ <div align="center">
+ <img src="assets/teaser.png" alt="Teaser Image" width="700"/>
+ </div>
+
+ ### 2. Model structure preview
+ <div align="center">
+ <img src="assets/model_structure.png" alt="Model Structure" width="700"/>
+ </div>
+
+
+ ## 📖 Overview
+
+ RaCig generates images from text prompts and reference images of the characters to depict (referred to as "Characters"). It combines several models and techniques:
+
+ * Text-to-image retrieval (using CLIP)
+ * IP-Adapter for incorporating reference image features (face and body/clothes)
+ * ControlNet for pose/skeleton guidance
+ * Action Direction DINO for action direction recognition
+ * A pipeline (`RaCigPipeline`) that orchestrates the generation process
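
The retrieval step in the list above can be illustrated with a minimal sketch. It assumes the usual CLIP-style setup, where the prompt and the candidate images are embedded in a shared space and ranked by cosine similarity. The toy 3-dimensional vectors, the file names, and the `retrieve` helper are all hypothetical stand-ins; this is not RaCig's actual retrieval code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, gallery, top_k=1):
    """Return the ids of the top_k gallery items most similar to query_vec."""
    ranked = sorted(
        gallery,
        key=lambda item_id: cosine(query_vec, gallery[item_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy vectors standing in for real CLIP text/image embeddings (assumption).
text_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "hug.png": [0.8, 0.2, 0.1],
    "run.png": [0.0, 0.1, 0.9],
    "sit.png": [0.1, 0.9, 0.1],
}

best = retrieve(text_embedding, image_embeddings, top_k=2)
print(best)
```

In a real setup the embeddings would come from a CLIP text encoder and image encoder; only the ranking logic is shown here.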
+
+ The pipeline can handle multiple Characters in a single scene, each defined by a name, a gender, and reference images (face and clothes).
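
As a rough illustration of the character definition described above, the sketch below models a Character as a small data class holding a name, a gender, and two reference image paths. The class, its field names, the `build_scene` helper, and the file paths are assumptions made for illustration; the real `RaCigPipeline` may expect a different structure.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Character:
    """Hypothetical container; field names are illustrative, not RaCig's API."""
    name: str
    gender: str
    face_image: Path     # reference image for the face
    clothes_image: Path  # reference image for body/clothes


def build_scene(prompt, characters):
    """Bundle a prompt with its characters, rejecting duplicate names."""
    names = [c.name for c in characters]
    if len(names) != len(set(names)):
        raise ValueError("character names must be unique within a scene")
    return {"prompt": prompt, "characters": list(characters)}


# Example scene with two characters (paths are placeholders).
scene = build_scene(
    "Alice and Bob shake hands in a park",
    [
        Character("Alice", "female", Path("refs/alice_face.png"), Path("refs/alice_clothes.png")),
        Character("Bob", "male", Path("refs/bob_face.png"), Path("refs/bob_clothes.png")),
    ],
)
print([c.name for c in scene["characters"]])
```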
+
+ ## 📦 Installation
+
+ 1. **Clone the repository:**
+ ```bash
+ git clone https://github.com/ZulutionAI/RaCig
+ cd RaCig
+ ```
+
+ 2. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. **Download necessary models and retrieval datasets:**
+
+ Models: https://huggingface.co/ZuluVision/RaCig
+
+ Place the downloaded checkpoints under `./models/` as follows:
+
+ ```
+ ./models/
+ ├── action_direction_dino/
+ │   └── checkpoint_best_regular.pth
+ ├── controlnet/
+ │   └── model.safetensors
+ ├── image_encoder/
+ │   ├── config.json
+ │   ├── model.safetensors
+ │   └── pytorch_model.bin
+ ├── ipa_weights/
+ │   ├── ip-adapter-plus-face_sdxl_vit-h.bin
+ │   └── ip-adapter-plus_sdxl_vit-h.bin
+ └── sdxl/
+     └── dreamshaper.safetensors
+ ```
+
+ Retrieval datasets: https://pan.baidu.com/s/1Vt2meAg5DkjUXktY_H6eNg?pwd=ympj
+
+ ```
+ ./data
+ ├── MSDBv2_v7
+ ├── Reelshot_retrieval
+ └── retrieve_info
+ ```
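
Before running inference, it may help to confirm that the downloads ended up in the expected places. The sketch below lists whatever is missing from the `./models/` and `./data` layouts shown above. The paths are copied from those trees; the `missing_paths` helper and its `root` argument are hypothetical and not part of the repository.

```python
from pathlib import Path

# File paths copied from the ./models/ tree in this README.
EXPECTED_FILES = [
    "models/action_direction_dino/checkpoint_best_regular.pth",
    "models/controlnet/model.safetensors",
    "models/image_encoder/config.json",
    "models/image_encoder/model.safetensors",
    "models/image_encoder/pytorch_model.bin",
    "models/ipa_weights/ip-adapter-plus-face_sdxl_vit-h.bin",
    "models/ipa_weights/ip-adapter-plus_sdxl_vit-h.bin",
    "models/sdxl/dreamshaper.safetensors",
]

# Entries from the ./data tree; the README does not say whether these are
# files or directories, so only their existence is checked.
EXPECTED_ENTRIES = [
    "data/MSDBv2_v7",
    "data/Reelshot_retrieval",
    "data/retrieve_info",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_ENTRIES if not (root / p).exists()]
    return missing


# Run from the repository root; prints one line per missing path.
for path in missing_paths():
    print(f"missing: {path}")
```

An empty output means every path named in the two trees was found.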
+ ## 💻 Usage
+ ### Inference
+ 1. **Run Inference:**
+ ```bash
+ python inference.py
+ ```
+ 2. Generated images, retrieved images, and skeleton visualizations are saved in the `output/` directory by default.
+
+ ### Gradio
+
+ ```bash
+ python run_gradio.py
+ ```
+
+
+ For more detailed instructions, see [Gradio Interface Instructions (EN)](docs/gradio_instruction_en.md) or [Gradio Interface Instructions (中文)](docs/gradio_instruction_cn.md).
+
+
+ ## 🛠️ Training
+
+ 1. We train only the ControlNet so that it handles the fused feature map better. After IP-Adapter information is injected, the fused feature map makes it hard for the ControlNet to constrain the pose, so we lightly fine-tune the ControlNet.
+
+ 2. We fine-tune it on the retrieval dataset, organized as shown in the Installation section.
+
+ ```bash
+ bash train.sh
+ ```
+
+ ## 🤝 Contributing
+
+
+
+ ## ❤️ Acknowledgements
+
+ This project builds on the following open-source projects and contributors:
+
+ * [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) - Image Prompt Adapter developed by Tencent AI Lab
+ * [xiaohu2015](https://github.com/xiaohu2015)