AzhureRaven committed on
Commit e9ed169 · verified · 1 Parent(s): 750481d

Update README.md

Files changed (1)
  1. README.md +15 -5
README.md CHANGED
@@ -12,14 +12,23 @@ inference:
12
  guidance_scale: 7.5
13
  num_inference_steps: 20
14
  widget:
15
- - text: "red Toolbar Upper Top containing Text Left Login inside and white Input Upper Top and white Input Lower Top and red Button login Upper Middle, Android UI, Medical, white background"
16
  ---
17
 
18
  # Rico Diffusion Model Card
19
 
20
- This is a final project of mine where I fine-tuned a Stable Diffusion 1.5 model to create Android Mockups at 384x640 with GLIGEN (https://gligen.github.io) to control UI component positions. However, there are designs at 448x576 primarily modal dialogs.
21
 
22
- I used EveryDream2 (https://github.com/victorchall/EveryDream2trainer) to fine-tune the model on the Rico Dataset (http://www.interactionmining.org/rico.html) of UI Screenshots where I wrote a Python script to parse over the Semantic Annotations part of the dataset to create the captions for each screenshot as well as using the Play Store and UI Metadata to use the app categories as extra tags. I have also cropped each UI component of a given screenshot (with exceptions) and labeled them accordingly so that I can train the model on individual components first before going for the whole screenshot.
23
 
24
  In other words, I use a Python script run in Colab to process the Rico dataset into a new dataset containing UI Screenshots and their captions alongside individual UI components with their captions. I primarily split the individual components into two groups for the training process based on the total pixel count of 512x512 = 262,144; components smaller than the threshold are grouped into the small component group, whereas components bigger than that are in the big component group. The model is trained on those groups separately before finally training on the full UIs.
25
 
@@ -88,12 +97,13 @@ Check the "gligen_input.txt" and "gligen.png" files in the sub-folders of the "r
88
 
89
  The images produced in the "results" folder use these parameters in A1111 that I found to give me the best results:
90
  - Sampling method: DPM++ SDE
91
- - Sampling steps: 15
92
  - Width: 384
93
  - Height: 640
94
  - Batch count: 4
95
  - Batch size: 1
96
- - CFG Scale: 7
97
  - Seed: 555
98
  - Clip Skip: 2
99
 
 
12
  guidance_scale: 7.5
13
  num_inference_steps: 20
14
  widget:
15
+ - text: >-
16
+ red Toolbar Upper Top containing Text Left Login inside and white Input
17
+ Upper Top and white Input Lower Top and red Button login Upper Middle,
18
+ Android UI, Medical, white background
19
+ datasets:
20
+ - AzhureRaven/rico-ui-component-caption
21
+ base_model:
22
+ - stable-diffusion-v1-5/stable-diffusion-v1-5
23
+ tags:
24
+ - mobile-ui
25
  ---
26
 
27
  # Rico Diffusion Model Card
28
 
29
+ I fine-tuned a Stable Diffusion 1.5 model to create mobile UI mockups at 384x640 with GLIGEN (https://gligen.github.io) to control UI component positions. However, there are designs at 448x576 primarily modal dialogs.
30
 
31
+ I used EveryDream2 (https://github.com/victorchall/EveryDream2trainer) to fine-tune the model on the Rico Dataset (http://www.interactionmining.org/rico.html) of UI Screenshots where I wrote a Python script to parse over the Semantic Annotations part of the dataset to create the captions for each screenshot as well as using the Play Store and UI Metadata to use the app categories as extra tags. I have also cropped each UI component of a given screenshot (with exceptions) and labeled them accordingly so that I can train the model on individual components first before going for the whole screenshot.
32
 
33
  In other words, I use a Python script run in Colab to process the Rico dataset into a new dataset containing UI Screenshots and their captions alongside individual UI components with their captions. I primarily split the individual components into two groups for the training process based on the total pixel count of 512x512 = 262,144; components smaller than the threshold are grouped into the small component group, whereas components bigger than that are in the big component group. The model is trained on those groups separately before finally training on the full UIs.
34
 
 
97
 
98
  The images produced in the "results" folder use these parameters in A1111 that I found to give me the best results:
99
  - Sampling method: DPM++ SDE
100
+ - Schedule: Karras
101
+ - Sampling steps: 20
102
  - Width: 384
103
  - Height: 640
104
  - Batch count: 4
105
  - Batch size: 1
106
+ - CFG Scale: 7.5
107
  - Seed: 555
108
  - Clip Skip: 2
109
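
The small/big component split described in the README text of this diff (a threshold of 512x512 = 262,144 total pixels) can be sketched roughly as below. This is only an illustration under stated assumptions, not the author's actual Colab script: the `Component` record, the function names, and the example sizes are hypothetical, and since the README does not say which group a component of exactly 262,144 pixels belongs to, the sketch assumes it goes to the big group.

```python
from dataclasses import dataclass

# Threshold stated in the README: 512 x 512 = 262,144 total pixels.
PIXEL_THRESHOLD = 512 * 512


@dataclass(frozen=True)
class Component:
    """Hypothetical record for one cropped UI component and its caption."""
    caption: str
    width: int
    height: int


def component_group(component: Component, threshold: int = PIXEL_THRESHOLD) -> str:
    """Return "small" or "big" based on the component's total pixel count.

    Assumption: a component with exactly `threshold` pixels is treated as big,
    because the README leaves that case unspecified.
    """
    pixel_count = component.width * component.height
    return "small" if pixel_count < threshold else "big"


def split_components(components: list[Component]) -> dict[str, list[Component]]:
    """Partition components into the two training groups."""
    groups: dict[str, list[Component]] = {"small": [], "big": []}
    for component in components:
        groups[component_group(component)].append(component)
    return groups


# Hypothetical example sizes, chosen only to land on each side of the threshold.
examples = [
    Component("red Button login", width=400, height=150),   # 60,000 pixels
    Component("red Toolbar", width=1440, height=196),       # 282,240 pixels
]
groups = split_components(examples)
```

With these example sizes, the button falls into the small group and the toolbar into the big group; per the README, the model is then trained on each group separately before training on the full UI screenshots.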