AzhureRaven/rico-diffusion

@@ -12,14 +12,23 @@ inference:
     guidance_scale: 7.5
     num_inference_steps: 20
 widget:
-  - text: "red Toolbar Upper Top containing Text Left Login inside and white Input Upper Top and white Input Lower Top and red Button login Upper Middle, Android UI, Medical, white background"
 ---
 # Rico Diffusion Model Card
-This is a final project of mine where I fine-tuned a Stable Diffusion 1.5 model to create Android Mockups at 384x640 with GLIGEN (https://gligen.github.io) to control UI component positions. However, there are designs at 448x576 primarily modal dialogs.
-I used EveryDream2 (https://github.com/victorchall/EveryDream2trainer) to fine-tune the model on the Rico Dataset (http://www.interactionmining.org/rico.html) of UI Screenshots where I wrote a Python script to parse over the Semantic Annotations part of the dataset to create the captions for each screenshot as well as using the Play Store and UI Metadata to use the app categories as extra tags. I have also cropped each UI component of a given screenshot (with exceptions) and labeled them accordingly so that I can train the model on individual components first before going for the whole screenshot.
 In other words, I use a Python script run in Colab to process the Rico dataset into a new dataset containing UI Screenshots and their captions alongside individual UI components with their captions. I primarily split the individual components into two groups for the training process based on the total pixel count of 512x512 = 262,144; components smaller than the threshold are grouped into the small component group, whereas components bigger than that are in the big component group. The model is trained on those groups separately before finally training on the full UIs.
@@ -88,12 +97,13 @@ Check the "gligen_input.txt" and "gligen.png" files in the sub-folders of the "r
 The images produced in the "results" folder use these parameters in A1111 that I found to give me the best results:
 - Sampling method: DPM++ SDE
-- Sampling steps: 15
 - Width: 384
 - Height: 640
 - Batch count: 4
 - Batch size: 1
-- CFG Scale: 7
 - Seed: 555
 - Clip Skip: 2

     guidance_scale: 7.5
     num_inference_steps: 20
 widget:
+- text: >-
+    red Toolbar Upper Top containing Text Left Login inside and white Input
+    Upper Top and white Input Lower Top and red Button login Upper Middle,
+    Android UI, Medical, white background
+datasets:
+- AzhureRaven/rico-ui-component-caption
+base_model:
+- stable-diffusion-v1-5/stable-diffusion-v1-5
+tags:
+- mobile-ui
 ---
 # Rico Diffusion Model Card
+I fine-tuned a Stable Diffusion 1.5 model to create mobile UI mockups at 384x640, using GLIGEN (https://gligen.github.io) to control UI component positions. Some designs, primarily modal dialogs, are at 448x576 instead.
+I used EveryDream2 (https://github.com/victorchall/EveryDream2trainer) to fine-tune the model on the Rico Dataset (http://www.interactionmining.org/rico.html) of UI screenshots. I wrote a Python script that parses the Semantic Annotations part of the dataset to create a caption for each screenshot and uses the Play Store and UI Metadata to add the app categories as extra tags. I also cropped each UI component of a given screenshot (with exceptions) and labeled it accordingly, so that I could train the model on individual components first before moving on to whole screenshots.
 In other words, I use a Python script run in Colab to process the Rico dataset into a new dataset containing UI Screenshots and their captions alongside individual UI components with their captions. I primarily split the individual components into two groups for the training process based on the total pixel count of 512x512 = 262,144; components smaller than the threshold are grouped into the small component group, whereas components bigger than that are in the big component group. The model is trained on those groups separately before finally training on the full UIs.
 The images produced in the "results" folder use these parameters in A1111 that I found to give me the best results:
 - Sampling method: DPM++ SDE
+- Schedule: Karras
+- Sampling steps: 20
 - Width: 384
 - Height: 640
 - Batch count: 4
 - Batch size: 1
+- CFG Scale: 7.5
 - Seed: 555
 - Clip Skip: 2
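
The model card text in this diff describes splitting cropped components into a small and a big group by total pixel count against 512x512 = 262,144. A minimal sketch of that grouping could look like the following. The `Component` record, the function name, and the example dimensions are hypothetical and not taken from the author's Colab script. The card also does not say which group a component of exactly 262,144 pixels belongs to, so placing it in the big group is an assumption.

```python
from dataclasses import dataclass

# Threshold stated in the model card: 512 x 512 = 262,144 pixels.
PIXEL_THRESHOLD = 512 * 512


@dataclass
class Component:
    """Hypothetical record for one cropped UI component."""
    caption: str
    width: int
    height: int


def split_by_pixel_count(components, threshold=PIXEL_THRESHOLD):
    """Return (small, big) lists of components split by width * height."""
    small, big = [], []
    for component in components:
        pixels = component.width * component.height
        # Assumption: a component exactly at the threshold goes to the big group.
        if pixels < threshold:
            small.append(component)
        else:
            big.append(component)
    return small, big


# Example with made-up crop sizes.
crops = [
    Component("red Button login", 300, 96),
    Component("white Input", 1080, 400),
]
small_group, big_group = split_by_pixel_count(crops)
```

With these made-up sizes, the button crop (28,800 pixels) lands in the small group and the input crop (432,000 pixels) in the big group, which would then be trained on separately before the full UIs.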