recoilme committed
Commit aa2eabf · 1 Parent(s): 6dab279
Files changed (1):
  1. README.md +10 -9
README.md CHANGED
@@ -7,20 +7,21 @@ pipeline_tag: text-to-image
 
 *XS Size, Excess Quality*
 
-At AiArtLab, we aim to develop a compact (1.7b) and fast (3sec/image) model that can be trained on consumer-grade graphics cards, all while operating on a limited budget.
+At AiArtLab, we strive to create a compact (1.7b) and fast (3 sec/image) model that can be trained on consumer graphics cards with a limited budget.
 
-We have chosen the multilingual encoder Mexma-SigLIP, which supports 80 languages and processes entire sentences rather than individual tokens. Our chosen VAE architecture, AuraDiffusion, preserves details and anatomy without the blurring effects seen in other models.
-
-
-For training, we use AdamW-8bit, which allows for larger batch sizes and accelerates training on cost-effective GPUs. Our model has been trained on approximately one million images with various resolutions and styles, including anime and realistic photos. We employed a variety of annotation methods, combining both manual and automated approaches.
-
-
-However, our model does have some limitations:
-- Limited concept coverage due to the small dataset size.
+- We use U-Net for its ability to efficiently handle small datasets and train quickly on GPUs with 16GB of memory.
+- We have chosen the multilingual/multimodal encoder Mexma-SigLIP, which supports 80 languages and processes sentences rather than individual tokens.
+- We use the AuraDiffusion 16ch-VAE architecture, which preserves details and anatomy without the "haze" effect.
+- For training, we have chosen AdamW-8bit, which allows for larger batch sizes and accelerates training on low-cost GPUs.
+- The model was trained on approximately 1 million images with various resolutions and styles, including anime and realistic photos.
+- Various annotation methods were used, including both manual and automated approaches.
 
+### Model Limitations:
+- Limited concept coverage due to the small dataset.
 - The Image2Image functionality requires further training.
 
 
+
 Train status, in progress: [wandb](https://wandb.ai/recoilme/unet)
 
 ![result](result_grid.jpg)
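The "AdamW-8bit" training choice in the new README text can be sketched in code. This is a minimal sketch, not the repository's actual training script: it assumes the 8-bit optimizer comes from the `bitsandbytes` library (the commit names only "AdamW-8bit"), the placeholder `model` stands in for the 1.7b U-Net, and the hyperparameter values are hypothetical.

```python
import torch
import bitsandbytes as bnb  # assumption: 8-bit AdamW via bitsandbytes (CUDA required)

# Placeholder module standing in for the 1.7b U-Net described in the README.
model = torch.nn.Linear(64, 64)

# 8-bit optimizer states use far less memory than 32-bit AdamW,
# which is what frees room for larger batches on 16GB consumer GPUs.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=1e-5,               # hypothetical values; the README does not state them
    betas=(0.9, 0.999),
    weight_decay=1e-2,
)
```

A config fragment only; in a real loop it is used exactly like `torch.optim.AdamW` (`optimizer.step()` / `optimizer.zero_grad()`).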