---
license: apache-2.0
language:
- en
library_name: diffusers
---
# BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

<!-- Provide a quick summary of what the model is/does. -->

Model card for BLIP-Diffusion, a text-to-image diffusion model that enables zero-shot subject-driven generation and control-guided zero-shot generation.

The abstract from the paper is:

*Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulties preserving the subject fidelity. To overcome these limitations, we introduce BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control which consumes inputs of subject images and text prompts. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation. We first pre-train the multimodal encoder following BLIP-2 to produce visual representation aligned with the text. Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generates new subject renditions. Compared with previous methods such as DreamBooth, our model enables zero-shot subject-driven generation, and efficient fine-tuning for customized subject with up to 20x speedup. We also demonstrate that BLIP-Diffusion can be flexibly combined with existing techniques such as ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications.*

The model was created by Dongxu Li, Junnan Li, and Steven C.H. Hoi.

## Model Sources

<!-- Provide the basic links for the model. -->

- **Original Repository:** https://github.com/salesforce/LAVIS/tree/main
- **Project Page:** https://dxli94.github.io/BLIP-Diffusion-website/

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

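Until an official snippet is added, the following is a minimal sketch of zero-shot subject-driven generation through the `BlipDiffusionPipeline` class in Diffusers. It assumes a CUDA GPU, that the weights are available under the checkpoint id `Salesforce/blipdiffusion` (substitute this repository's id if it differs), and that `dog.jpg` is a local image of your subject. The positional argument order (prompt, reference image, source subject category, target subject category) and the settings values follow the Diffusers docstring example and may change between releases; `generation_settings` and `generate` are hypothetical helpers, not part of the library.

```python
# Minimal sketch: zero-shot subject-driven generation with BLIP-Diffusion via Diffusers.
# ASSUMPTIONS: "Salesforce/blipdiffusion" and "dog.jpg" are placeholders; the call
# follows the Diffusers BlipDiffusionPipeline docstring example and may differ by version.


def generation_settings(steps: int = 25, guidance_scale: float = 7.5) -> dict:
    """Hypothetical helper: keyword arguments passed to the pipeline call."""
    return {
        "guidance_scale": guidance_scale,
        "num_inference_steps": steps,
        # Negative prompt steers sampling away from common artifacts.
        "neg_prompt": "over-exposure, under-exposure, saturated, duplicate, "
                      "out of frame, lowres, cropped, worst quality, low quality",
        "height": 512,
        "width": 512,
    }


def generate(image_path: str, prompt: str, cond_subject: str, tgt_subject: str,
             model_id: str = "Salesforce/blipdiffusion",  # placeholder checkpoint id
             out_path: str = "output.png"):
    """Hypothetical helper: render `tgt_subject` from a reference image, guided by `prompt`."""
    # Heavy imports are deferred so the settings helper works without torch/diffusers.
    import torch
    from diffusers.pipelines import BlipDiffusionPipeline
    from diffusers.utils import load_image

    pipe = BlipDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")  # assumes a CUDA device is available

    reference = load_image(image_path)  # accepts a local path or a URL
    images = pipe(
        prompt,        # text for the new rendition, e.g. "swimming underwater"
        reference,     # image showing the subject
        cond_subject,  # subject category in the reference image, e.g. "dog"
        tgt_subject,   # subject category to render, e.g. "dog"
        **generation_settings(),
    ).images
    images[0].save(out_path)
    return images[0]
```

For example, `generate("dog.jpg", "swimming underwater", "dog", "dog")` should write a new rendition of the subject to `output.png`. The Diffusers imports live inside `generate` so the call settings can be inspected or adjusted without downloading any weights.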

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

If you find this repository useful in your research, please cite:

```bibtex
@misc{li2023blipdiffusion,
  title={BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing},
  author={Dongxu Li and Junnan Li and Steven C. H. Hoi},
  year={2023},
  eprint={2305.14720},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```