recoilme committed
Commit aa2eabf · 1 Parent(s): 6dab279
Files changed (1):
  1. README.md +10 -9
README.md CHANGED
@@ -7,20 +7,21 @@ pipeline_tag: text-to-image
 
 *XS Size, Excess Quality*
 
-At AiArtLab, we aim to develop a compact (1.7b) and fast (3sec/image) model that can be trained on consumer-grade graphics cards, all while operating on a limited budget.
+At AiArtLab, we strive to create a compact (1.7b) and fast (3 sec/image) model that can be trained on consumer graphics cards with a limited budget.
 
-We have chosen the multilingual encoder Mexma-SigLIP, which supports 80 languages and processes entire sentences rather than individual tokens. Our chosen VAE architecture, AuraDiffusion, preserves details and anatomy without the blurring effects seen in other models.
-
-
-For training, we use AdamW-8bit, which allows for larger batch sizes and accelerates training on cost-effective GPUs. Our model has been trained on approximately one million images with various resolutions and styles, including anime and realistic photos. We employed a variety of annotation methods, combining both manual and automated approaches.
-
-
-However, our model does have some limitations:
-- Limited concept coverage due to the small dataset size.
+- We use U-Net for its ability to efficiently handle small datasets and train quickly on GPUs with 16GB of memory.
+- We have chosen the multilingual/multimodal encoder Mexma-SigLIP, which supports 80 languages and processes sentences rather than individual tokens.
+- We use the AuraDiffusion 16ch-VAE architecture, which preserves details and anatomy without the "haze" effect.
+- For training, we have chosen AdamW-8bit, which allows for larger batch sizes and accelerates training on low-cost GPUs.
+- The model was trained on approximately 1 million images with various resolutions and styles, including anime and realistic photos.
+- Various annotation methods were used, including both manual and automated approaches.
 
+### Model Limitations:
+- Limited concept coverage due to the small dataset.
 - The Image2Image functionality requires further training.
 
 
+
 Train status, in progress: [wandb](https://wandb.ai/recoilme/unet)
 
 ![result](result_grid.jpg)
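The "AdamW-8bit" training choice in the new README text can be sketched in code. This is a minimal sketch, not the repository's actual training script: it assumes the 8-bit optimizer comes from the `bitsandbytes` library (the commit names only "AdamW-8bit"), the placeholder `model` stands in for the 1.7b U-Net, and the hyperparameter values are hypothetical.

```python
import torch
import bitsandbytes as bnb  # assumption: 8-bit AdamW via bitsandbytes (CUDA required)

# Placeholder module standing in for the 1.7b U-Net described in the README.
model = torch.nn.Linear(64, 64)

# 8-bit optimizer states use far less memory than 32-bit AdamW,
# which is what frees room for larger batches on 16GB consumer GPUs.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=1e-5,               # hypothetical values; the README does not state them
    betas=(0.9, 0.999),
    weight_decay=1e-2,
)
```

A config fragment only; in a real loop it is used exactly like `torch.optim.AdamW` (`optimizer.step()` / `optimizer.zero_grad()`).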