| | --- |
| | base_model: |
| | - Laxhar/noobai-XL-Vpred-1.0 |
| | pipeline_tag: text-to-image |
| | --- |
| | |
| | Wahtastic Merge is a high-quality Stable Diffusion XL (SDXL) model designed to generate stunning images with improved aesthetics and excellent prompt adherence. This model is built upon the robust `noobai-XL-Vpred-1.0` base and has been further refined through the strategic merging of various other models and extensive additional training. |
| |
|
| | The ultimate goal of this model is to provide an experience very similar to the already fairly competent NoobAI v-pred base while smoothing out its rough edges. |
| | Many other merges are bimodal: they offer either good prompt adherence (staying closer to base NoobAI) or good default aesthetics (leaning closer to Illustrious), but rarely both. |
| |
|
| | Ideally, a single model can offer both without sacrificing too much of its underlying knowledge. |
| |
|
| | Up to V7, the model was produced entirely by merging. V8 and above have additional fine-tuning applied on top for various fixes. |
| |
|
| | # Wahtastic Roadmap |
| | - 1536x Super-resolution support |
| |   - Allow for 1536x native generation (and slightly above), akin to Illustrious 2+ |
| | - Fix e6 size tag implications (hyper ≠ huge ≠ big) |
| |   - In short, e6 tags have implications: `hyper_*` implies `huge_*`, and `huge_*` implies `big_*` |
| |   - Because of this, the model tends to associate big with huge, and huge with hyper, so `big_*` sometimes produces disproportionately large body parts. |
| | - Natural language captioning |
| |   - Yes, CLIP sucks |
| |   - Using lodestones' natural-language captions, ideally some amount of natural language understanding can be brought back |
| |   - This is inspired by EasyFluff XL |
| | - Superior style knowledge |
| |   - ~20k e6 artists with > 500 < 20 posts |
| |   - Potentially Danbooru artists too |
| |
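The implication problem described in the roadmap can be illustrated with a minimal sketch. The tag names, the `IMPLICATIONS` map, and the toy posts below are hypothetical and only mirror the `hyper_*` → `huge_*` → `big_*` chain; this is not the actual dataset tooling.

```python
from collections import Counter

# Hypothetical implication map mirroring the chain described in the roadmap.
IMPLICATIONS = {
    "hyper_tail": "huge_tail",
    "huge_tail": "big_tail",
}


def expand_tags(tags):
    """Return the tags plus everything they transitively imply."""
    expanded = set(tags)
    pending = list(tags)
    while pending:
        implied = IMPLICATIONS.get(pending.pop())
        if implied is not None and implied not in expanded:
            expanded.add(implied)
            pending.append(implied)
    return expanded


# Toy posts, each tagged with only its most specific size tag.
posts = [{"big_tail"}, {"huge_tail"}, {"hyper_tail"}]
counts = Counter(tag for post in posts for tag in expand_tags(post))

# After expansion, `big_tail` labels all three posts, including the hyper one,
# which is how the weakest tag ends up associated with the largest sizes.
print(counts)
```

Under this framing, the weakest tag is seen alongside every size, which matches the behaviour the roadmap item aims to fix. How the fix will actually be implemented is not stated here.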
|
| | *(Previously known as Pando Merge)* |
| |
|
| | Compute is expensive, and while plenty has been granted to me by kind acquaintances, a fair bit of money has been poured into the training process. |
| | If you like the model, or would like to help me offset the sunk cost, please consider donating: |
| |
|
| | **ETH Wallet Address for Donations:** `0x645BebF82373865eC520d8AC2527524BfB174FF8` |
| | If you prefer PayPal or Stripe, please contact me on Discord @velvet.toroyashi |
| |
|
| | ## How to Use |
| |
|
| | This model can be used with any standard SDXL-compatible interface or library (e.g., Diffusers, Stable Diffusion WebUI, ComfyUI). |
| |
|
| | ### Recommended Settings |
| |
|
| | For optimal results, we recommend the following inference parameters: |
| |
|
| | * **Sampler:** `Euler` or `Euler A` |
| | * **Scheduler:** `Normal` or `Beta` |
| | * **Steps:** `16-24` |
| | * **CFG Scale:** `3-6` |
| | * **Resolution:** |
| |     * For general use: `832x1200` (or similar aspect ratios with a total area around 1024x1024) |
| |     * For V9.1 (if applicable): Can natively handle `1536x` resolutions. |
| |
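To pick other aspect ratios with a similar pixel budget, a small helper can snap a target ratio to dimensions near the 1024x1024 area. This is only a sketch: the function name `snap_resolution` and the rounding to multiples of 8 are assumptions, not something this model card specifies.

```python
import math


def snap_resolution(aspect_ratio, target_area=1024 * 1024, multiple=8):
    """Return (width, height) near target_area for a width/height ratio.

    Both sides are rounded to the nearest `multiple` (assumed to be 8 here).
    """
    width = math.sqrt(target_area * aspect_ratio)
    height = math.sqrt(target_area / aspect_ratio)

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


# A portrait ratio close to the recommended 832x1200.
width, height = snap_resolution(832 / 1200)
print(width, height, width * height)
```

The result will not necessarily equal `832x1200` exactly, since the recommended size sits slightly below the full 1024x1024 area; the helper only keeps the total pixel count in the same range.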
|
| | ### Example Usage (Python with Diffusers) |
| |
|
| | ```python |
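| | import torch |
| | from diffusers import AutoPipelineForText2Image, EulerDiscreteScheduler |
| | |
| | # Load the pipeline in half precision and move it to the GPU. |
| | pipeline = AutoPipelineForText2Image.from_pretrained( |
| |     "VelvetToroyashi/WahtasticMerge", |
| |     torch_dtype=torch.float16, |
| |     variant="fp16", |
| |     use_safetensors=True, |
| | ).to("cuda") |
| | |
| | # The base model is v-prediction, so the Euler scheduler likely needs to be |
| | # configured for it explicitly (assumption: same settings as the NoobAI v-pred base). |
| | pipeline.scheduler = EulerDiscreteScheduler.from_config( |
| |     pipeline.scheduler.config, |
| |     prediction_type="v_prediction", |
| |     rescale_betas_zero_snr=True, |
| | ) |
| | |
| | prompt = "a majestic fantasy landscape, vibrant colors, epic, detailed, masterpiece" |
| | negative_prompt = "low quality, bad anatomy, deformed, ugly, distorted" |
| | |
| | # Steps, CFG scale, and resolution follow the recommended settings above. |
| | image = pipeline( |
| |     prompt=prompt, |
| |     negative_prompt=negative_prompt, |
| |     num_inference_steps=20, |
| |     guidance_scale=5, |
| |     height=1200, |
| |     width=832, |
| | ).images[0] |
| | |
| | image.save("wahtastic_image.png") |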
| | ``` |
| |
|
| | ## Model Details |
| |
|
| | * **Base Model:** `noobai-XL-Vpred-1.0` |
| | * **Merge Strategy:** Various models were merged to combine their strengths, followed by extensive additional training. |
| | * **Training Goal:** Improve aesthetic quality, prompt adherence, and general versatility for SDXL generations. |
| | * **Model Type:** Diffusion-based text-to-image generative model. |
| |
|
| | ## License |
| |
|
| | This model is subject to the license of its base model, `noobai-XL-Vpred-1.0`, which is released under the [Fair AI Public License 1.0 - SD](https://huggingface.co/Laxhar/noobai-XL-Vpred-1.0/blob/main/LICENSE). Please review the original license for full terms and conditions regarding usage, including commercial use and derivative works. |
| |
|
| | ## Contributions and Support |
| |
|
| | If you find Wahtastic Merge useful and would like to support its continued development and future updates, donations are greatly appreciated! |
| |
|
| | ## Feedback and Issues |
| |
|
| | We welcome your feedback! If you encounter any issues or have suggestions for improvement, please open a discussion on the Hugging Face repository. |
| |
|
| | ----- |