
---
title: Stable Diffusion Textual Inversion + Custom Contrast Guidance
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: creativeml-openrail-m
---

Stable Diffusion Textual Inversion + Custom Contrast Guidance

This Space lets you explore 5 textual inversion styles from the sd-concepts-library and a custom contrast-based guidance variant on top of Stable Diffusion v1.5.

Models and Styles

  • Base model: runwayml/stable-diffusion-v1-5

  • Textual inversion concepts (styles):

    • <birb-style>: sd-concepts-library/birb-style
    • <moebius>: sd-concepts-library/moebius
    • <midjourney-style>: sd-concepts-library/midjourney-style
    • <wlop-style>: sd-concepts-library/wlop-style
    • <line-art>: sd-concepts-library/line-art

These are loaded as learned embeddings and used directly in the text prompt.
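Concretely, the concept registration and prompt assembly might look like the sketch below. The dictionary and helper names are illustrative, not the actual code in app.py; only `pipe.load_textual_inversion` is the real Diffusers API.

```python
# Sketch: registering the five sd-concepts-library styles.
# STYLE_CONCEPTS, build_prompt, and load_concepts are hypothetical names.
STYLE_CONCEPTS = {
    "<birb-style>": "sd-concepts-library/birb-style",
    "<moebius>": "sd-concepts-library/moebius",
    "<midjourney-style>": "sd-concepts-library/midjourney-style",
    "<wlop-style>": "sd-concepts-library/wlop-style",
    "<line-art>": "sd-concepts-library/line-art",
}

def build_prompt(prompt: str, style_token: str) -> str:
    """Append the selected style token to the user's prompt."""
    return f"{prompt}, {style_token}"

def load_concepts(pipe):
    """Load each concept's learned embedding into the pipeline's text encoder."""
    for token, repo_id in STYLE_CONCEPTS.items():
        pipe.load_textual_inversion(repo_id, token=token)
```

Once loaded, the tokens behave like ordinary words in the prompt, so no further plumbing is needed at generation time.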

What the app does

  1. Baseline mode

    • Runs standard Stable Diffusion v1.5 with classifier-free guidance.

    • Uses your prompt plus the selected style token, e.g.:

      "A campfire oil painting at night, <birb-style>"
      
    • No additional loss or guidance beyond the usual text conditioning.

  2. Contrast variant mode

    • Uses the same base sampling loop and seed as baseline.
    • On later diffusion steps, applies a custom contrast-like adjustment in latent space:
      • Measures variance of the predicted “clean” latents.
      • Applies a deterministic update that pushes latents towards higher variance (higher contrast).
    • This is a lightweight, creative variant of the “blue_loss” idea from the Stable Diffusion Deep Dive notebook; rather than operating on RGB channels, it works directly on latents.

The result is a pair of images (baseline vs contrast variant) with the same prompt, style, and seed but a different “feel” due to the extra contrast guidance.
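The contrast update described above can be sketched as follows. This is an illustrative reconstruction, not the exact code in app.py: the function name, the step-size factor, the analytic variance gradient, and the simplification of treating d(pred_x0)/d(latents) as the identity are all assumptions (the real app may backpropagate through the scheduler instead).

```python
import numpy as np

def contrast_adjust(latents, pred_x0, contrast_scale=10.0, step_size=0.01):
    """Nudge latents toward higher variance of the predicted clean latents.

    Loss = -Var(pred_x0); its gradient w.r.t. pred_x0 is -2 * (pred_x0 - mean) / N.
    Taking a gradient-descent step on that loss therefore pushes values away
    from their mean, i.e. toward higher variance ("higher contrast").
    """
    n = pred_x0.size
    grad = -2.0 * (pred_x0 - pred_x0.mean()) / n  # d(-Var)/d(pred_x0)
    # Simplification: apply the gradient to the latents directly.
    return latents - step_size * contrast_scale * grad
```

Applied only on later diffusion steps, an update like this adjusts the denoising trajectory toward higher-contrast latents while leaving the early, structure-forming steps untouched.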

How to use

  1. Prompt

    • Type any text prompt in the Prompt box.
    • Examples:
      • A campfire oil painting at night
      • A cinematic portrait of a wizard reading
      • A futuristic cityscape at sunrise
  2. Style (concept)

    • Choose one of the 5 textual inversion styles from the dropdown.
    • Internally the style token (e.g. <birb-style>) is appended to your prompt.
  3. Seed

    • Set a seed to make runs reproducible.
    • Use the same seed in both modes if you want a direct comparison.
  4. Steps

    • Number of diffusion steps (more steps are slower but usually give better results).
    • 30–40 is a good starting point.
  5. Guidance scale

    • Classifier-free guidance strength (text adherence).
    • Typical values: 7–10.
  6. Mode

    • Baseline: standard Stable Diffusion with the selected style.
    • Contrast variant: same setup, with additional latent contrast guidance.
  7. Contrast scale

    • Controls how strong the contrast adjustment is in the variant mode.
    • Start low (around 5–15). Very high values can produce noisy, abstract images.
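The controls above map onto a single pipeline call. The wrapper below is a hypothetical sketch (the function name and defaults are assumptions); for a direct baseline-vs-variant comparison, pass a freshly seeded generator to each run so both start from the same initial noise.

```python
def generate(pipe, prompt, style_token, steps=35, guidance=7.5, generator=None):
    """Hypothetical wrapper: one call per mode with identical settings.

    For reproducible runs, build a seeded generator per call, e.g.:
        generator = torch.Generator("cuda").manual_seed(seed)
    and reuse the same seed for baseline and contrast-variant runs.
    """
    out = pipe(
        prompt=f"{prompt}, {style_token}",   # style token appended internally
        num_inference_steps=steps,            # the Steps slider
        guidance_scale=guidance,              # the Guidance scale slider
        generator=generator,                  # the Seed control
    )
    return out.images[0]
```

The keyword arguments shown (`num_inference_steps`, `guidance_scale`, `generator`) are the standard `StableDiffusionPipeline.__call__` parameters.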

Implementation notes

  • Everything runs through StableDiffusionPipeline from 🤗 Diffusers.
  • Textual inversion embeddings are loaded via pipe.load_textual_inversion for each concept.
  • The contrast variant reuses the same scheduler, UNet, VAE, tokenizer, and text encoder as the baseline; only the update rule for latents is modified.
  • Safety checker is disabled here for educational use; please enable it for any public-facing or production deployment.
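As a configuration sketch, the setup these notes describe might look like the fragment below; the dtype and device choices are assumptions, while `from_pretrained`, `safety_checker=None`, and `load_textual_inversion` are the real Diffusers API.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumed setup -- app.py may differ in dtype/device handling.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,  # disabled for this educational demo; re-enable in production
)
pipe = pipe.to("cuda")

# Register each textual inversion concept as a learned embedding.
pipe.load_textual_inversion("sd-concepts-library/birb-style")
```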

Credits