
🔥 FFD-XL 2.0 (v3.1) - The Awakened Architecture (Restored Edition)

By Bl4ckspaces

Showcase Image 1

Welcome to FastFourierDiffusion-XL (FFD-XL) 2.0 v3.1. This is the Restored Edition, a paradigm shift in mathematical model merging. Moving beyond simple weight averaging, FFD-XL utilizes advanced frequency manipulation and matrix decomposition to engineer an absolute powerhouse for 2.5D, anime, and cinematic generation.

If you are tired of models with "foggy" lighting, broken extremity anatomy, or color bleeding, the Awakened Architecture is engineered to be your final stop.

πŸ“ The Foundation: FFT & RSVD

This model is built upon a surgical, multi-domain merging pipeline. Instead of blindly mixing models, we dissect them using two core mathematical principles:

  • FFT (Fast Fourier Transform): We map the model's weights into the frequency domain. This allows us to separate Low Frequencies (macro-lighting, global contrast, and cinematic color grading) from High Frequencies (micro-details, skin textures, and sharp edges). The result? We inject stunning 2.5D textures without destroying the underlying lighting, achieving a pristine, "zero-fog" contrast.
  • RSVD (Randomized Singular Value Decomposition): We use RSVD to extract the absolute core "skeleton" of a model's understanding. By isolating the most significant singular vectors of anatomical models (like Hasshaku), we surgically graft perfect pose structures and extremity accuracy (hands/eyes) into the base model without carrying over unwanted stylistic artifacts.
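The two operations named above can be illustrated on a single 2-D weight matrix. The following is a minimal, generic sketch using NumPy, not the actual FFD-XL merging pipeline: the function names `split_frequency_bands` and `randomized_svd`, the `cutoff` radius, and the `oversample` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def split_frequency_bands(weight, cutoff=0.25):
    """Split a 2-D matrix into low- and high-frequency parts with a 2-D FFT.

    `cutoff` is an assumed radial threshold in cycles per sample
    (np.fft.fftfreq returns values in [-0.5, 0.5)).
    """
    spectrum = np.fft.fft2(weight)
    fy = np.fft.fftfreq(weight.shape[0])[:, None]
    fx = np.fft.fftfreq(weight.shape[1])[None, :]
    low_mask = np.sqrt(fy**2 + fx**2) <= cutoff
    # The mask is symmetric in frequency, so both parts come back real
    # up to rounding error; `.real` discards that residue.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


def randomized_svd(weight, rank, oversample=8, seed=0):
    """Approximate the top-`rank` singular triplets via random projection."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(weight.shape))
    # Sample the column space of `weight` with a Gaussian test matrix.
    omega = rng.standard_normal((weight.shape[1], k))
    q, _ = np.linalg.qr(weight @ omega)
    # Exact SVD of the much smaller projected matrix.
    u_small, s, vt = np.linalg.svd(q.T @ weight, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]


# Toy demonstration on random data (not real model weights).
demo = np.random.default_rng(1).standard_normal((16, 12))
low, high = split_frequency_bands(demo)
u, s, vt = randomized_svd(demo, rank=4)
skeleton = (u * s) @ vt  # rank-4 approximation of `demo`
```

Because the two masks are complementary, `low + high` reproduces the input, which is what makes it possible to edit one band while leaving the other untouched. How FFD-XL chooses its bands and ranks is not stated in this card.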

Showcase Image 2

🧬 Why FFD-XL 2.0 v3.1 is Superior

  • Fourier-Space Sinkhorn Optimal Transport (FS-OT): Perfect cinematic color grading locked in with Phase Blending to prevent visual ghosting and color bleeding.
  • Multi-Band Spectral RSVD & EAR: By dynamically allocating bone structure, we enforce 100% Empirical Asymmetric Routing (EAR) dominance on extreme appendages. This guarantees uncompromising anatomical accuracy for fingers and eyes.
  • Exact TIES Restricted HF: 2.5D skin textures and micro-details are injected purely into the High-Frequency domain using an Exact Quantile Noise Filter, keeping the base generation incredibly clean.
  • Pre-Flight Shield: Mathematically shielded against FP16 NaN blowouts, ensuring stable generation across all UIs.

🎯 Recommended Settings

To unleash the true potential and the "wild" micro-details of this model, strictly follow these parameters:

  • Sampler (Crucial): DPM++ 2M Karras (Highly recommended for maximum sharpness and intricate textures). Euler a is acceptable for a softer, classic look, but DPM++ 2M Karras is where the architecture truly breathes.
  • Steps: 25 - 35
  • CFG Scale: 5.0 - 7.0
  • Resolution & Scaling (Near-2K Support):
    • Base Native: 832x1216, 1024x1024, or 1216x832.
    • High-Res Multipliers: This model robustly supports direct high-resolution generation up to near-2K scale. You can confidently scale your initial dimensions by 1.2x, 1.3x, 1.4x, or up to 1.5x without the latent space breaking apart, as long as the final width and height are multiples of 64 (e.g., 1216x1792, 1472x1472, or 1536x1536).
  • Clip Skip: 2
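The multiple-of-64 rule for the high-res multipliers can be applied with a small helper. This is a sketch under one assumption: each scaled side is rounded down to the nearest multiple of 64, which reproduces the 1216x1792 and 1536x1536 examples above. The helper name `scaled_resolution` is hypothetical.

```python
def scaled_resolution(width, height, factor, multiple=64):
    """Scale a base resolution and round each side down to a multiple of `multiple`."""
    if not 1.0 <= factor <= 1.5:
        raise ValueError("factor is outside the 1.0-1.5 range described above")

    def snap(side):
        return int(side * factor) // multiple * multiple

    return snap(width), snap(height)


print(scaled_resolution(832, 1216, 1.5))   # (1216, 1792)
print(scaled_resolution(1024, 1024, 1.5))  # (1536, 1536)
```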

💬 Prompting Guide (Danbooru Syntax)

The architecture features a highly enriched vocabulary balanced evenly between SDXL's two text encoders (CLIP ViT-L and OpenCLIP ViT-bigG). It responds exceptionally well to structured Danbooru tags.

Positive Prompt Structure:

(masterpiece, best quality, ultra-detailed:1.2), 1girl, solo, glowing eyes, detailed cinematic lighting, looking at viewer, [subject description], [clothing], [background], depth of field, 8k resolution

Negative Prompt:

(worst quality, low quality, normal quality:1.4), deformed, bad anatomy, bad hands, missing fingers, blurry, ugly, text, watermark, fog
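The prompt structure above can also be assembled programmatically. The sketch below only builds the strings; the helper names `weighted_group` and `build_prompt` are hypothetical, and whether the `(tags:weight)` emphasis syntax is honored depends on the front end that receives the prompt.

```python
def weighted_group(tags, weight):
    """Wrap tags in the `(tag, tag:weight)` emphasis syntax shown above."""
    return f"({', '.join(tags)}:{weight})"


def build_prompt(emphasized_tags, other_tags, weight):
    """Put the weighted group first, then the remaining comma-separated tags."""
    return ", ".join([weighted_group(emphasized_tags, weight), *other_tags])


positive = build_prompt(
    ["masterpiece", "best quality", "ultra-detailed"],
    ["1girl", "solo", "glowing eyes", "detailed cinematic lighting"],
    weight=1.2,
)
negative = build_prompt(
    ["worst quality", "low quality", "normal quality"],
    ["deformed", "bad anatomy", "bad hands", "missing fingers"],
    weight=1.4,
)
```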

Showcase Image 3

📜 License & Credits

  • Creator: Bl4ckspaces
  • License: OpenRAIL
  • Acknowledgments: Massive thanks to the creators of Wai Base, Janku, Hasshaku, Perfect Illustrious, and NTR MIX for providing the foundational latents that made this surgical merge possible.

🚀 How to Use

1. ComfyUI (Recommended)

  1. Download the FFD_XL_v3_1_2_0_RESTORED_fp16.safetensors file.
  2. Place it in your ComfyUI models/checkpoints folder.
  3. Use a standard SDXL workflow. Load the model using the Load Checkpoint node.
  4. No external VAE is needed; the checkpoint already includes the SDXL FP16 Fix VAE.

2. Diffusers (Python)

You can easily load this model using the diffusers library.

from diffusers import StableDiffusionXLPipeline
import torch

repo_id = "Bl4ckSpaces/FFD-XL-2.0"
model_name = "FFD_XL_v3_1_2_0_RESTORED_fp16.safetensors"

# from_single_file accepts the Hub URL of a single-file checkpoint
pipe = StableDiffusionXLPipeline.from_single_file(
    f"https://huggingface.co/{repo_id}/blob/main/{model_name}",
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe.to("cuda")

prompt = "(masterpiece, best quality:1.2), 1girl, solo, intricate details, cinematic lighting, cyberpunk city background"
negative_prompt = "(worst quality, low quality:1.4), bad anatomy, watermark"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=6.0,
).images[0]

image.save("ffd_xl_masterpiece.png")
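The example above runs with the pipeline's default scheduler. To get closer to the Recommended Settings, one option is to swap in `DPMSolverMultistepScheduler` with `use_karras_sigmas=True`, which is how diffusers exposes DPM++ 2M Karras. The sketch below assumes the `pipe` object from the example above; the helper names `use_dpmpp_2m_karras` and `generation_kwargs` are hypothetical, and the range checks simply mirror the values listed in this card.

```python
def use_dpmpp_2m_karras(pipe):
    """Replace the pipeline's scheduler with DPM++ 2M using Karras sigmas."""
    # Imported lazily so the helpers can be defined without diffusers installed.
    from diffusers import DPMSolverMultistepScheduler

    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )
    return pipe


def generation_kwargs(prompt, negative_prompt, steps=30, cfg=6.0,
                      width=832, height=1216):
    """Build pipeline call arguments, checked against the card's recommendations."""
    if not 25 <= steps <= 35:
        raise ValueError("steps outside the recommended 25-35 range")
    if not 5.0 <= cfg <= 7.0:
        raise ValueError("CFG scale outside the recommended 5.0-7.0 range")
    if width % 64 or height % 64:
        raise ValueError("width and height should be multiples of 64")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
        "guidance_scale": cfg,
        "width": width,
        "height": height,
    }
```

With the pipeline loaded, usage would look like `image = use_dpmpp_2m_karras(pipe)(**generation_kwargs(prompt, negative_prompt)).images[0]`.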