Update app.py

app.py CHANGED
@@ -48,33 +48,12 @@ If all you want is to make a picture with some text, you could ignore this noteb
 What we want to do in this notebook is dig a little deeper into how this works, so we'll start by checking that the example code runs. Again, this is adapted from the [HF notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) and looks very similar to what you'll find if you inspect [the `__call__()` method of the stable diffusion pipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L200).
 """
 
-# Some settings
-prompt = ["A watercolor painting of an otter"]
-height = 512                        # default height of Stable Diffusion
-width = 512                         # default width of Stable Diffusion
-num_inference_steps = 30            # Number of denoising steps
-guidance_scale = 7.5                # Scale for classifier-free guidance
-generator = torch.manual_seed(32)   # Seed generator to create the initial latent noise
-batch_size = 1
-
-# Prep text
-text_input = tokenizer(prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt")
-with torch.no_grad():
-    text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
-max_length = text_input.input_ids.shape[-1]
-uncond_input = tokenizer(
-    [""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt"
-)
-with torch.no_grad():
-    uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0]
-text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
 
 # Prep Scheduler
 def set_timesteps(scheduler, num_inference_steps):
     scheduler.set_timesteps(num_inference_steps)
     scheduler.timesteps = scheduler.timesteps.to(torch.float32)  # minor fix to ensure MPS compatibility, fixed in diffusers PR 3925
 
-set_timesteps(scheduler, num_inference_steps)
 
 # Prep latents
 latents = torch.randn(
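The removed "Prep text" block is worth a second look before moving on: the empty-string prompt produces the *unconditional* embeddings, and the final `torch.cat` stacks them with the prompt embeddings so that a single UNet forward pass can serve both halves of classifier-free guidance. A minimal sketch of the shapes involved (the 77-token context and 768-dimensional hidden states match SD v1's CLIP text encoder; the random tensors below are stand-ins, not real embeddings):

```python
import torch

batch_size = 1

# Stand-ins for text_embeddings and uncond_embeddings: SD v1's CLIP text
# encoder returns hidden states of shape (batch, 77, 768).
cond = torch.randn(batch_size, 77, 768)
uncond = torch.randn(batch_size, 77, 768)

# Stack [uncond, cond] along the batch dimension so one UNet call
# evaluates both conditionings at once.
both = torch.cat([uncond, cond])
print(both.shape)  # torch.Size([2, 77, 768])
```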
@@ -87,36 +66,6 @@ latents = latents * scheduler.init_noise_sigma # Scaling (previous versions did
 # Loop
 with autocast("cuda"):  # will fall back to CPU if no CUDA; no autocast for MPS
     for i, t in tqdm(enumerate(scheduler.timesteps), total=len(scheduler.timesteps)):
-        # expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
-        latent_model_input = torch.cat([latents] * 2)
-        sigma = scheduler.sigmas[i]
-        # Scale the latents (preconditioning):
-        # latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)  # Diffusers 0.3 and below
-        latent_model_input = scheduler.scale_model_input(latent_model_input, t)
-
-        # predict the noise residual
-        with torch.no_grad():
-            noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
-
-        # perform guidance
-        noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
-        noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
-
-        # compute the previous noisy sample x_t -> x_t-1
-        # latents = scheduler.step(noise_pred, i, latents)["prev_sample"]  # Diffusers 0.3 and below
-        latents = scheduler.step(noise_pred, t, latents).prev_sample
-
-# scale and decode the image latents with the VAE
-latents = 1 / 0.18215 * latents
-with torch.no_grad():
-    image = vae.decode(latents).sample
-
-# Display
-image = (image / 2 + 0.5).clamp(0, 1)
-image = image.detach().cpu().permute(0, 2, 3, 1).numpy()
-images = (image * 255).round().astype("uint8")
-pil_images = [Image.fromarray(image) for image in images]
-pil_images[0]
 
 """It's working, but that's quite a bit of code! Let's look at the components one by one.
 
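The heart of the removed loop is the guidance line: starting from the unconditional prediction, it moves `guidance_scale` times the difference toward the text-conditioned prediction (with `guidance_scale = 1` it reduces exactly to the text-conditioned prediction). A minimal sketch on dummy tensors; the 4-channel 64x64 latent shape is an assumption based on SD v1's VAE at 512x512 resolution, not something stated in the diff:

```python
import torch

guidance_scale = 7.5

# Stand-in for the UNet output on the doubled batch (assumed SD v1 latent shape).
noise_pred = torch.randn(2, 4, 64, 64)

# The first half of the batch came from the empty prompt, the second from the real prompt.
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)

# Classifier-free guidance: extrapolate from the unconditional prediction
# in the direction of the text-conditioned one.
guided = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
print(guided.shape)  # torch.Size([1, 4, 64, 64])
```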
@@ -187,6 +136,7 @@ We use a text encoder model to turn our text into a set of 'embeddings' which ar
 # Our text prompt
 prompt = 'A picture of a puppy'
 
+
 """We begin with tokenization:"""
 
 # Turn the text into a sequence of tokens:
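The hunk cuts off just before the tokenizer call, but the removed "Prep text" block earlier shows what it looks like. A self-contained sketch, assuming the notebook's `tokenizer` is the CLIP tokenizer SD v1 uses (loaded standalone here so the snippet runs on its own):

```python
from transformers import CLIPTokenizer

# Assumption: SD v1 uses the openai/clip-vit-large-patch14 tokenizer.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt = 'A picture of a puppy'

text_input = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length,  # 77 for CLIP
                       truncation=True, return_tensors="pt")
print(text_input.input_ids[0][:8])                # start-of-text token, then one id per word piece
print(tokenizer.decode(text_input.input_ids[0]))  # decodes back to the text plus special/padding tokens
```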