PotatoBox/Momo-XL

PotatoBox committed on Oct 14, 2024

Commit bc25d31 (verified) · 1 parent: 80f03ba

Update README.md

Files changed (1)

README.md +33 -1

@@ -41,7 +41,7 @@ license: mit
   <img src="./card_images/11.png" class="wide" alt="Sample Image 11">
 </div>
-**Momo XL** is an anime-style model based on SDXL, fine-tuned to produce high-quality anime-style images with detailed and vibrant aesthetics.
 ## Key Features:
@@ -66,3 +66,35 @@ This model may produce unexpected or unintended results. **Use with caution and
 - **Data Sources**: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.
 Thank you! 😊

+**Momo XL** is an anime-style model based on SDXL, fine-tuned to produce high-quality anime-style images with detailed and vibrant aesthetics. (Oct 6, 2024)
+------------------------------------------------------
+## Momo XL - Training Details (Oct 15, 2024)
+### Dataset
+Momo XL was trained on a dataset of more than **400,000 images** sourced from Danbooru.
+### Base Model
+Momo XL was built on top of SDXL, incorporating knowledge from two finetuned models:
+- Formula:
+  `SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5`
+For more details:
+- [Animagine 3.0 base](https://huggingface.co/Linaqruf/animagine-xl-3.0)
+- [Pony V6](https://huggingface.co/LyliaEngine/Pony_Diffusion_V6_XL)
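The base-model formula above adds weighted differences between each finetuned model and SDXL base back onto the base weights. A minimal sketch of that arithmetic follows, using plain Python lists as stand-ins for tensors. The function name `add_difference_merge`, the parameter name `layer.weight`, and the toy values are hypothetical, and the author's actual merging tool is not stated in the README.

```python
def add_difference_merge(base, finetuned_models, weights):
    """Compute base + sum_i weights[i] * (finetuned_models[i] - base), per parameter.

    Each model is a dict mapping a parameter name to a flat list of floats.
    This is an illustrative stand-in for real checkpoint tensors.
    """
    merged = {}
    for name, base_values in base.items():
        out = list(base_values)  # start from the base weights
        for model, weight in zip(finetuned_models, weights):
            for i, (ft_value, base_value) in enumerate(zip(model[name], base_values)):
                out[i] += weight * (ft_value - base_value)
        merged[name] = out
    return merged


# Toy values only; real checkpoints hold many large tensors.
sdxl_base = {"layer.weight": [1.0, 2.0]}
animagine = {"layer.weight": [1.5, 2.0]}
pony = {"layer.weight": [1.0, 4.0]}

# Weights 1.0 and 0.5 are the ones given in the formula.
merged = add_difference_merge(sdxl_base, [animagine, pony], [1.0, 0.5])
print(merged)
```

With a weight of 1.0 the Animagine difference is applied in full, while the Pony V6 difference is applied at half strength.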
+### Training Process
+Training was conducted on **A100 80GB GPUs**, totaling more than **2,000 GPU hours**. The training was divided into three stages:
+- **Finetuning - First Stage**: Trained on the entire dataset with a defined set of training configurations.
+- **Finetuning - Second Stage**: Also trained on the entire dataset with some variations in settings.
+- **Adjustment Stage**: Focused on aesthetic adjustments to improve the overall visual quality.
+The final model, **Momo XL**, was released by merging the Text Encoder from the Finetuning Second Stage with the UNet from the Adjustment Stage.
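Combining the Text Encoder of one checkpoint with the UNet of another can be sketched as a selection over state-dict keys. The sketch below is an illustration under assumptions, not the author's procedure: the key prefixes `conditioner.embedders.` and `model.diffusion_model.` follow the layout commonly seen in single-file SDXL checkpoints, the function `combine_components` is hypothetical, and the README does not say where the remaining weights (for example the VAE) came from, so here they are simply taken from the UNet source.

```python
# Assumed key prefixes for a single-file SDXL checkpoint (not stated in the README).
TEXT_ENCODER_PREFIX = "conditioner.embedders."
UNET_PREFIX = "model.diffusion_model."


def combine_components(text_encoder_source, unet_source):
    """Take text-encoder weights from one state dict and all other weights from another."""
    # Start from the UNet source, which also supplies any remaining weights (assumption).
    combined = dict(unet_source)
    for key, value in text_encoder_source.items():
        if key.startswith(TEXT_ENCODER_PREFIX):
            combined[key] = value  # overwrite with the chosen text encoder
    return combined


# Toy state dicts with strings standing in for tensors.
second_stage = {
    "conditioner.embedders.0.w": "te_second_stage",
    "model.diffusion_model.w": "unet_second_stage",
}
adjustment_stage = {
    "conditioner.embedders.0.w": "te_adjustment",
    "model.diffusion_model.w": "unet_adjustment",
}

final_model = combine_components(second_stage, adjustment_stage)
print(final_model)
```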
+### Hyperparameters
+| Stage                    | Epochs | UNet lr | Text Encoder lr | Batch Size | Resolution | Noise Offset | Optimizer  | LR Scheduler |
+|--------------------------|--------|---------|-----------------|------------|------------|--------------|------------|--------------|
+| **Finetuning 1st Stage**  | 10     | 2e-5    | 1e-5            | 256        | 1024²      | N/A          | AdamW8bit  | Constant     |
+| **Finetuning 2nd Stage**  | 10     | 2e-5    | 1e-5            | 256        | Max. 1280² | N/A          | AdamW      | Constant     |
+| **Adjustment Stage**      | 0.25   | 8e-5    | 4e-5            | 1024       | Max. 1280² | 0.05         | AdamW      | Constant     |
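As a rough sense of scale, the table's epochs and batch sizes can be turned into optimizer step counts. The sketch below assumes exactly 400,000 images (the README says "over" 400,000, so these are lower bounds) and assumes the listed batch size is the effective global batch size per optimizer step; neither assumption is confirmed by the source, and `estimate_steps` is a hypothetical helper.

```python
import math

DATASET_SIZE = 400_000  # assumption: the README only gives "over 400,000"


def estimate_steps(epochs, batch_size, dataset_size=DATASET_SIZE):
    """Approximate number of optimizer steps for a given number of epochs."""
    return math.ceil(dataset_size * epochs / batch_size)


# (epochs, batch size) per stage, taken from the hyperparameter table.
stages = {
    "Finetuning 1st Stage": (10, 256),
    "Finetuning 2nd Stage": (10, 256),
    "Adjustment Stage": (0.25, 1024),
}

steps = {name: estimate_steps(epochs, batch) for name, (epochs, batch) in stages.items()}
for name, count in steps.items():
    print(f"{name}: ~{count} steps")
```

Under these assumptions the adjustment stage is very short compared with the two finetuning stages, which matches its description as an aesthetic adjustment rather than a full training pass.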