Spaces:
Runtime error
Runtime error
Commit
·
dc72d06
1
Parent(s):
3747436
Add torch to requirements
Browse files- DIFFUSERS_COMPATIBILITY.md +65 -0
- Dockerfile +27 -0
- app.py +8 -1
- attention_custom.py +129 -4
- compatibility_patches.py +56 -0
- patch_diffusers.sh +13 -0
- pipeline_stable_diffusion_custom.py +96 -14
- requirements.txt +3 -3
- test_imports.py +29 -0
- test_pipeline.py +45 -0
- transformer_2d_custom.py +48 -2
- unet2d_custom.py +78 -16
- unet_2d_blocks_custom.py +127 -15
DIFFUSERS_COMPATIBILITY.md
ADDED
|
@@ -0,0 +1,65 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Diffusers Compatibility Issues
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
This document outlines compatibility issues with the SonicDiffusion project and the diffusers library version 0.21.4.
|
| 5 |
+
|
| 6 |
+
## Identified Issues
|
| 7 |
+
|
| 8 |
+
The project requires components from newer versions of diffusers that are not available in 0.21.4, including:
|
| 9 |
+
|
| 10 |
+
1. `IPAdapterMixin` in `diffusers.loaders`
|
| 11 |
+
2. `FromSingleFileMixin` in `diffusers.loaders`
|
| 12 |
+
3. `PeftAdapterMixin` in `diffusers.loaders`
|
| 13 |
+
4. `USE_PEFT_BACKEND` in `diffusers.utils`
|
| 14 |
+
5. `apply_freeu` in `diffusers.utils.torch_utils`
|
| 15 |
+
6. `AdaGroupNorm` in `diffusers.models.normalization`
|
| 16 |
+
7. `ResnetBlockCondNorm2D` in `diffusers.models.resnet`
|
| 17 |
+
8. `DualTransformer2DModel` in `diffusers.models.transformers.dual_transformer_2d`
|
| 18 |
+
9. `GEGLU`, `GELU`, `ApproximateGELU` in `diffusers.models.activations`
|
| 19 |
+
10. `ImagePositionalEmbeddings`, `PatchEmbed`, `PixArtAlphaTextProjection` in `diffusers.models.embeddings`
|
| 20 |
+
11. `AdaLayerNormSingle` in `diffusers.models.normalization`
|
| 21 |
+
12. `StableDiffusionMixin` in `diffusers.pipelines.pipeline_utils`
|
| 22 |
+
|
| 23 |
+
## Solutions
|
| 24 |
+
|
| 25 |
+
We've implemented several fixes for compatibility:
|
| 26 |
+
|
| 27 |
+
1. Added dummy implementations for missing classes
|
| 28 |
+
2. Added fallback imports with try/except blocks
|
| 29 |
+
3. Simplified implementations of complex components
|
| 30 |
+
4. Worked around limitations in the older diffusers API
|
| 31 |
+
|
| 32 |
+
## Recommended Approach
|
| 33 |
+
|
| 34 |
+
For a more reliable fix, you should:
|
| 35 |
+
|
| 36 |
+
1. **Update diffusers**: Upgrade to a newer version (we recommend at least 0.25.0)
|
| 37 |
+
```bash
|
| 38 |
+
pip install 'diffusers>=0.25.0'
|
| 39 |
+
```
|
| 40 |
+
|
| 41 |
+
2. **Update related packages**: Ensure complementary packages are also updated
|
| 42 |
+
```bash
|
| 43 |
+
pip install 'transformers>=4.36.0' 'accelerate>=0.25.0'
|
| 44 |
+
```
|
| 45 |
+
|
| 46 |
+
3. **Alternative approach**: If you cannot update diffusers, try using a standalone version without integration with HuggingFace:
|
| 47 |
+
|
| 48 |
+
- Modify the controller.py to use explicit PyTorch components without requiring diffusers for direct audio-to-image conversion
|
| 49 |
+
- Use a pre-trained model with your own implementation of the pipeline
|
| 50 |
+
|
| 51 |
+
## Error Handling for Gradio
|
| 52 |
+
|
| 53 |
+
There are also issues with Gradio compatibility. The simplest solution is:
|
| 54 |
+
|
| 55 |
+
```bash
|
| 56 |
+
pip install 'gradio>=4.19.0,<4.27.0'
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
When running the app, use:
|
| 60 |
+
|
| 61 |
+
```python
|
| 62 |
+
demo.launch(server_name="0.0.0.0", share=True)
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
This helps prevent the localhost access error and creates a shareable link.
|
Dockerfile
ADDED
|
@@ -0,0 +1,27 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.10-slim
|
| 2 |
+
|
| 3 |
+
WORKDIR /app
|
| 4 |
+
|
| 5 |
+
# Install system dependencies
|
| 6 |
+
RUN apt-get update && apt-get install -y \
|
| 7 |
+
build-essential \
|
| 8 |
+
git \
|
| 9 |
+
ffmpeg \
|
| 10 |
+
libsndfile1 \
|
| 11 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 12 |
+
|
| 13 |
+
# Install Python dependencies with pinned versions
|
| 14 |
+
COPY requirements.txt .
|
| 15 |
+
RUN pip install --no-cache-dir -r requirements.txt
|
| 16 |
+
|
| 17 |
+
# Copy application code
|
| 18 |
+
COPY . .
|
| 19 |
+
|
| 20 |
+
# Create necessary directories
|
| 21 |
+
RUN mkdir -p assets ckpts outputs
|
| 22 |
+
|
| 23 |
+
# Expose port for Gradio
|
| 24 |
+
EXPOSE 7860
|
| 25 |
+
|
| 26 |
+
# Command to run the application
|
| 27 |
+
CMD ["python", "app.py"]
|
app.py
CHANGED
|
@@ -1,6 +1,12 @@
|
|
| 1 |
import os
|
| 2 |
import sys
|
| 3 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
# Print environment information
|
| 5 |
print("==== Environment Information ====")
|
| 6 |
print(f"Python version: {sys.version}")
|
|
@@ -181,4 +187,5 @@ with gr.Blocks(title="SonicDiffusion") as demo:
|
|
| 181 |
)
|
| 182 |
|
| 183 |
if __name__ == "__main__":
|
| 184 |
-
|
|
|
|
|
|
| 1 |
import os
|
| 2 |
import sys
|
| 3 |
|
| 4 |
+
# Apply compatibility patches first
|
| 5 |
+
try:
|
| 6 |
+
import compatibility_patches
|
| 7 |
+
except ImportError:
|
| 8 |
+
print("Warning: compatibility_patches not found")
|
| 9 |
+
|
| 10 |
# Print environment information
|
| 11 |
print("==== Environment Information ====")
|
| 12 |
print(f"Python version: {sys.version}")
|
|
|
|
| 187 |
)
|
| 188 |
|
| 189 |
if __name__ == "__main__":
|
| 190 |
+
# Change the server parameters
|
| 191 |
+
demo.launch(server_name="0.0.0.0", share=True)
|
attention_custom.py
CHANGED
|
@@ -1,17 +1,142 @@
|
|
| 1 |
# Adapted from https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py
|
| 2 |
|
| 3 |
from typing import Any, Dict, Optional
|
|
|
|
| 4 |
|
| 5 |
import torch
|
| 6 |
import torch.nn.functional as F
|
| 7 |
from torch import nn
|
| 8 |
|
| 9 |
from diffusers.utils import deprecate, logging
|
| 10 |
-
|
| 11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
from diffusers.models.attention_processor import Attention
|
| 13 |
-
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
|
| 16 |
|
| 17 |
logger = logging.get_logger(__name__)
|
|
|
|
| 1 |
# Adapted from https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py
|
| 2 |
|
| 3 |
from typing import Any, Dict, Optional
|
| 4 |
+
import math
|
| 5 |
|
| 6 |
import torch
|
| 7 |
import torch.nn.functional as F
|
| 8 |
from torch import nn
|
| 9 |
|
| 10 |
from diffusers.utils import deprecate, logging
|
| 11 |
+
|
| 12 |
+
# Import maybe_allow_in_graph or define if not available
|
| 13 |
+
try:
|
| 14 |
+
from diffusers.utils.torch_utils import maybe_allow_in_graph
|
| 15 |
+
except ImportError:
|
| 16 |
+
def maybe_allow_in_graph(fn):
|
| 17 |
+
"""Dummy decorator for compatibility with older diffusers versions"""
|
| 18 |
+
return fn
|
| 19 |
+
|
| 20 |
+
# Define activation functions since they're not available in this version of diffusers
|
| 21 |
+
# GELU activation
|
| 22 |
+
class GELU(nn.Module):
|
| 23 |
+
"""
|
| 24 |
+
Custom implementation of GELU activation for compatibility with older diffusers versions.
|
| 25 |
+
See https://arxiv.org/abs/1606.08415 for details.
|
| 26 |
+
"""
|
| 27 |
+
def forward(self, input):
|
| 28 |
+
return F.gelu(input)
|
| 29 |
+
|
| 30 |
+
# Approximate GELU
|
| 31 |
+
class ApproximateGELU(nn.Module):
|
| 32 |
+
"""
|
| 33 |
+
Custom implementation of Approximate GELU activation for compatibility with older diffusers versions.
|
| 34 |
+
"""
|
| 35 |
+
def forward(self, input):
|
| 36 |
+
return 0.5 * input * (1 + torch.tanh(math.sqrt(2 / math.pi) * (input + 0.044715 * torch.pow(input, 3))))
|
| 37 |
+
|
| 38 |
+
# GEGLU activation
|
| 39 |
+
class GEGLU(nn.Module):
|
| 40 |
+
"""
|
| 41 |
+
Custom implementation of GEGLU activation for compatibility with older diffusers versions.
|
| 42 |
+
See https://arxiv.org/abs/2002.05202 for more details.
|
| 43 |
+
"""
|
| 44 |
+
def __init__(self, dim_in, dim_out):
|
| 45 |
+
super().__init__()
|
| 46 |
+
self.proj = nn.Linear(dim_in, dim_out * 2)
|
| 47 |
+
self.dim_out = dim_out
|
| 48 |
+
|
| 49 |
+
def forward(self, hidden_states):
|
| 50 |
+
hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
|
| 51 |
+
return hidden_states * F.gelu(gate)
|
| 52 |
from diffusers.models.attention_processor import Attention
|
| 53 |
+
|
| 54 |
+
# Import embeddings with fallbacks
|
| 55 |
+
try:
|
| 56 |
+
from diffusers.models.embeddings import SinusoidalPositionalEmbedding
|
| 57 |
+
except ImportError:
|
| 58 |
+
# Define a simple SinusoidalPositionalEmbedding
|
| 59 |
+
class SinusoidalPositionalEmbedding(nn.Module):
|
| 60 |
+
"""
|
| 61 |
+
Custom implementation of SinusoidalPositionalEmbedding for compatibility with older diffusers versions.
|
| 62 |
+
"""
|
| 63 |
+
def __init__(self, dim, max_seq_length=5000):
|
| 64 |
+
super().__init__()
|
| 65 |
+
self.dim = dim
|
| 66 |
+
self.max_seq_length = max_seq_length
|
| 67 |
+
|
| 68 |
+
def forward(self, seq_length):
|
| 69 |
+
position = torch.arange(seq_length, device=seq_length.device)
|
| 70 |
+
dim_t = torch.arange(self.dim // 2, device=seq_length.device)
|
| 71 |
+
dim_t = 10000 ** (2 * (dim_t) / self.dim)
|
| 72 |
+
|
| 73 |
+
x = position[:, None] / dim_t[None, :]
|
| 74 |
+
embeddings = torch.cat((torch.sin(x), torch.cos(x)), dim=1)
|
| 75 |
+
|
| 76 |
+
if self.dim % 2 == 1: # if odd, add zero padding
|
| 77 |
+
embeddings = torch.cat((embeddings, torch.zeros_like(embeddings[:, :1])), dim=1)
|
| 78 |
+
|
| 79 |
+
return embeddings.to(seq_length.device)
|
| 80 |
+
|
| 81 |
+
# Import normalization layers with fallbacks
|
| 82 |
+
try:
|
| 83 |
+
from diffusers.models.normalization import AdaLayerNorm, AdaLayerNormContinuous, AdaLayerNormZero, RMSNorm
|
| 84 |
+
except ImportError:
|
| 85 |
+
# Define simple versions for compatibility
|
| 86 |
+
class AdaLayerNorm(nn.Module):
|
| 87 |
+
"""
|
| 88 |
+
Custom implementation of AdaLayerNorm for compatibility with older diffusers versions.
|
| 89 |
+
"""
|
| 90 |
+
def __init__(self, embedding_dim, num_embeddings=None):
|
| 91 |
+
super().__init__()
|
| 92 |
+
self.emb = nn.Linear(embedding_dim, embedding_dim * 2)
|
| 93 |
+
self.norm = nn.LayerNorm(embedding_dim, elementwise_affine=False)
|
| 94 |
+
|
| 95 |
+
def forward(self, x, emb):
|
| 96 |
+
shift, scale = self.emb(emb).chunk(2, dim=1)
|
| 97 |
+
x = self.norm(x)
|
| 98 |
+
return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
|
| 99 |
+
|
| 100 |
+
class AdaLayerNormContinuous(nn.Module):
|
| 101 |
+
"""
|
| 102 |
+
Custom implementation of AdaLayerNormContinuous for compatibility with older diffusers versions.
|
| 103 |
+
"""
|
| 104 |
+
def __init__(self, embedding_dim):
|
| 105 |
+
super().__init__()
|
| 106 |
+
self.emb = nn.Linear(embedding_dim, embedding_dim * 2)
|
| 107 |
+
self.norm = nn.LayerNorm(embedding_dim, elementwise_affine=False)
|
| 108 |
+
|
| 109 |
+
def forward(self, x, emb):
|
| 110 |
+
shift, scale = self.emb(emb).chunk(2, dim=1)
|
| 111 |
+
x = self.norm(x)
|
| 112 |
+
return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
|
| 113 |
+
|
| 114 |
+
class AdaLayerNormZero(nn.Module):
|
| 115 |
+
"""
|
| 116 |
+
Custom implementation of AdaLayerNormZero for compatibility with older diffusers versions.
|
| 117 |
+
"""
|
| 118 |
+
def __init__(self, embedding_dim):
|
| 119 |
+
super().__init__()
|
| 120 |
+
self.emb = nn.Linear(embedding_dim, embedding_dim * 2)
|
| 121 |
+
self.norm = nn.LayerNorm(embedding_dim, elementwise_affine=False)
|
| 122 |
+
|
| 123 |
+
def forward(self, x, emb):
|
| 124 |
+
shift, scale = self.emb(emb).chunk(2, dim=1)
|
| 125 |
+
x = self.norm(x)
|
| 126 |
+
return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
|
| 127 |
+
|
| 128 |
+
class RMSNorm(nn.Module):
|
| 129 |
+
"""
|
| 130 |
+
Custom implementation of RMSNorm for compatibility with older diffusers versions.
|
| 131 |
+
"""
|
| 132 |
+
def __init__(self, dim, eps=1e-6):
|
| 133 |
+
super().__init__()
|
| 134 |
+
self.scale = dim ** 0.5
|
| 135 |
+
self.eps = eps
|
| 136 |
+
self.g = nn.Parameter(torch.ones(dim))
|
| 137 |
+
|
| 138 |
+
def forward(self, x):
|
| 139 |
+
return x * self.g / torch.norm(x, dim=-1, keepdim=True).clamp(min=self.eps) * self.scale
|
| 140 |
|
| 141 |
|
| 142 |
logger = logging.get_logger(__name__)
|
compatibility_patches.py
ADDED
|
@@ -0,0 +1,56 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Compatibility patches for huggingface_hub and diffusers
|
| 3 |
+
"""
|
| 4 |
+
import sys
|
| 5 |
+
import importlib
|
| 6 |
+
from functools import wraps
|
| 7 |
+
|
| 8 |
+
# Check if huggingface_hub is installed
|
| 9 |
+
try:
|
| 10 |
+
import huggingface_hub
|
| 11 |
+
|
| 12 |
+
# Add the cached_download function if it doesn't exist
|
| 13 |
+
if not hasattr(huggingface_hub, 'cached_download'):
|
| 14 |
+
def cached_download(*args, **kwargs):
|
| 15 |
+
"""Compatibility function to replace cached_download"""
|
| 16 |
+
# Use the newer hf_hub_download function
|
| 17 |
+
return huggingface_hub.hf_hub_download(*args, **kwargs)
|
| 18 |
+
|
| 19 |
+
# Add the missing function to the module
|
| 20 |
+
huggingface_hub.cached_download = cached_download
|
| 21 |
+
|
| 22 |
+
except ImportError:
|
| 23 |
+
print("huggingface_hub not found, skipping patch")
|
| 24 |
+
|
| 25 |
+
# Patch for diffusers dynamic_modules_utils.py
|
| 26 |
+
try:
|
| 27 |
+
import diffusers.utils.dynamic_modules_utils as dmu
|
| 28 |
+
|
| 29 |
+
# Store the original import statement
|
| 30 |
+
original_import = dmu.__import__
|
| 31 |
+
|
| 32 |
+
# Define a wrapper for __import__
|
| 33 |
+
@wraps(original_import)
|
| 34 |
+
def patched_import(name, *args, **kwargs):
|
| 35 |
+
try:
|
| 36 |
+
return original_import(name, *args, **kwargs)
|
| 37 |
+
except ImportError as e:
|
| 38 |
+
if 'cached_download' in str(e) and name == 'huggingface_hub':
|
| 39 |
+
# Import the module without the missing function
|
| 40 |
+
mod = importlib.import_module(name)
|
| 41 |
+
|
| 42 |
+
# Add the missing function
|
| 43 |
+
if not hasattr(mod, 'cached_download'):
|
| 44 |
+
def cached_download(*args, **kwargs):
|
| 45 |
+
return mod.hf_hub_download(*args, **kwargs)
|
| 46 |
+
|
| 47 |
+
mod.cached_download = cached_download
|
| 48 |
+
|
| 49 |
+
return mod
|
| 50 |
+
raise
|
| 51 |
+
|
| 52 |
+
# Apply the patch
|
| 53 |
+
dmu.__import__ = patched_import
|
| 54 |
+
|
| 55 |
+
except ImportError:
|
| 56 |
+
print("diffusers.utils.dynamic_modules_utils not found, skipping patch")
|
patch_diffusers.sh
ADDED
|
@@ -0,0 +1,13 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Run this script to patch the dynamic_modules_utils.py file
|
| 3 |
+
|
| 4 |
+
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
|
| 5 |
+
DMU_FILE="$SITE_PACKAGES/diffusers/utils/dynamic_modules_utils.py"
|
| 6 |
+
|
| 7 |
+
# Create a backup
|
| 8 |
+
cp "$DMU_FILE" "${DMU_FILE}.bak"
|
| 9 |
+
|
| 10 |
+
# Replace the import statement
|
| 11 |
+
sed -i 's/from huggingface_hub import cached_download, hf_hub_download, model_info/from huggingface_hub import hf_hub_download, model_info\n\ndef cached_download(*args, **kwargs):\n """Compatibility wrapper for hf_hub_download"""\n return hf_hub_download(*args, **kwargs)/g' "$DMU_FILE"
|
| 12 |
+
|
| 13 |
+
echo "Patched $DMU_FILE"
|
pipeline_stable_diffusion_custom.py
CHANGED
|
@@ -4,29 +4,110 @@ import inspect
|
|
| 4 |
from typing import Any, Callable, Dict, List, Optional, Union
|
| 5 |
|
| 6 |
import torch
|
|
|
|
| 7 |
from packaging import version
|
| 8 |
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
|
| 9 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
from diffusers.configuration_utils import FrozenDict
|
| 11 |
from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
|
| 12 |
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
from diffusers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 16 |
from diffusers.models.lora import adjust_lora_scale_text_encoder
|
| 17 |
from diffusers.schedulers import KarrasDiffusionSchedulers
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
from diffusers.utils.torch_utils import randn_tensor
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 30 |
|
| 31 |
|
| 32 |
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
|
|
@@ -104,6 +185,7 @@ def retrieve_timesteps(
|
|
| 104 |
return timesteps, num_inference_steps
|
| 105 |
|
| 106 |
|
|
|
|
| 107 |
class StableDiffusionPipeline(
|
| 108 |
DiffusionPipeline,
|
| 109 |
StableDiffusionMixin,
|
|
|
|
| 4 |
from typing import Any, Callable, Dict, List, Optional, Union
|
| 5 |
|
| 6 |
import torch
|
| 7 |
+
import torch.nn as nn
|
| 8 |
from packaging import version
|
| 9 |
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
|
| 10 |
|
| 11 |
+
# Import ModelMixin and ConfigMixin for our custom classes
|
| 12 |
+
from diffusers.configuration_utils import ConfigMixin
|
| 13 |
+
from diffusers.models.modeling_utils import ModelMixin
|
| 14 |
+
from diffusers.utils import BaseOutput
|
| 15 |
+
|
| 16 |
from diffusers.configuration_utils import FrozenDict
|
| 17 |
from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
|
| 18 |
|
| 19 |
+
# Modified to handle older diffusers versions (0.21.4)
|
| 20 |
+
try:
|
| 21 |
+
from diffusers.loaders import IPAdapterMixin, LoraLoaderMixin, TextualInversionLoaderMixin, FromSingleFileMixin
|
| 22 |
+
except ImportError:
|
| 23 |
+
# Create dummy classes for missing imports
|
| 24 |
+
from diffusers.loaders import LoraLoaderMixin, TextualInversionLoaderMixin
|
| 25 |
+
|
| 26 |
+
# Define dummy mixins for backward compatibility
|
| 27 |
+
class IPAdapterMixin:
|
| 28 |
+
"""Dummy IPAdapterMixin for compatibility with older diffusers versions."""
|
| 29 |
+
pass
|
| 30 |
+
|
| 31 |
+
class FromSingleFileMixin:
|
| 32 |
+
"""Dummy FromSingleFileMixin for compatibility with older diffusers versions."""
|
| 33 |
+
pass
|
| 34 |
+
|
| 35 |
+
# Import models with fallback for older diffusers versions
|
| 36 |
+
try:
|
| 37 |
+
from diffusers.models import AutoencoderKL, ImageProjection, UNet2DConditionModel
|
| 38 |
+
except ImportError:
|
| 39 |
+
from diffusers.models import AutoencoderKL, UNet2DConditionModel
|
| 40 |
+
|
| 41 |
+
# Define dummy class for compatibility
|
| 42 |
+
class ImageProjection(nn.Module):
|
| 43 |
+
"""Dummy ImageProjection for compatibility with older diffusers versions."""
|
| 44 |
+
def __init__(self, image_embed_dim=None, cross_attention_dim=None):
|
| 45 |
+
super().__init__()
|
| 46 |
+
self.image_embed_dim = image_embed_dim
|
| 47 |
+
self.cross_attention_dim = cross_attention_dim
|
| 48 |
from diffusers.models.lora import adjust_lora_scale_text_encoder
|
| 49 |
from diffusers.schedulers import KarrasDiffusionSchedulers
|
| 50 |
+
# Check if USE_PEFT_BACKEND is available in diffusers
|
| 51 |
+
try:
|
| 52 |
+
from diffusers.utils import (
|
| 53 |
+
USE_PEFT_BACKEND,
|
| 54 |
+
deprecate,
|
| 55 |
+
logging,
|
| 56 |
+
replace_example_docstring,
|
| 57 |
+
scale_lora_layers,
|
| 58 |
+
unscale_lora_layers,
|
| 59 |
+
)
|
| 60 |
+
except ImportError:
|
| 61 |
+
from diffusers.utils import deprecate, logging
|
| 62 |
+
|
| 63 |
+
# Define placeholders for missing utilities
|
| 64 |
+
USE_PEFT_BACKEND = False
|
| 65 |
+
|
| 66 |
+
def replace_example_docstring(example_docstring):
|
| 67 |
+
"""Dummy function for compatibility with older diffusers versions."""
|
| 68 |
+
def decorator(fn):
|
| 69 |
+
return fn
|
| 70 |
+
return decorator
|
| 71 |
+
|
| 72 |
+
def scale_lora_layers(model, weight):
|
| 73 |
+
"""Dummy function for compatibility with older diffusers versions."""
|
| 74 |
+
pass
|
| 75 |
+
|
| 76 |
+
def unscale_lora_layers(model, weight):
|
| 77 |
+
"""Dummy function for compatibility with older diffusers versions."""
|
| 78 |
+
pass
|
| 79 |
from diffusers.utils.torch_utils import randn_tensor
|
| 80 |
+
|
| 81 |
+
# Import pipeline utils with fallbacks
|
| 82 |
+
try:
|
| 83 |
+
from diffusers.pipelines.pipeline_utils import DiffusionPipeline, StableDiffusionMixin
|
| 84 |
+
except ImportError:
|
| 85 |
+
from diffusers.pipelines.pipeline_utils import DiffusionPipeline
|
| 86 |
+
|
| 87 |
+
# Create a minimal StableDiffusionMixin for compatibility
|
| 88 |
+
class StableDiffusionMixin:
|
| 89 |
+
"""Custom implementation of StableDiffusionMixin for compatibility with older diffusers versions."""
|
| 90 |
+
pass
|
| 91 |
+
|
| 92 |
+
# Import pipeline output and safety checker
|
| 93 |
+
try:
|
| 94 |
+
from diffusers.pipelines.stable_diffusion.pipeline_output import StableDiffusionPipelineOutput
|
| 95 |
+
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
|
| 96 |
+
except ImportError:
|
| 97 |
+
# Define custom StableDiffusionPipelineOutput for compatibility
|
| 98 |
+
class StableDiffusionPipelineOutput(BaseOutput):
|
| 99 |
+
"""Custom implementation for compatibility with older diffusers versions."""
|
| 100 |
+
images: torch.FloatTensor
|
| 101 |
+
nsfw_content_detected: Optional[List[bool]]
|
| 102 |
+
|
| 103 |
+
# Define custom StableDiffusionSafetyChecker for compatibility
|
| 104 |
+
class StableDiffusionSafetyChecker(ModelMixin, ConfigMixin):
|
| 105 |
+
"""Custom implementation for compatibility with older diffusers versions."""
|
| 106 |
+
def __init__(self, *args, **kwargs):
|
| 107 |
+
super().__init__()
|
| 108 |
+
|
| 109 |
+
def forward(self, images, clip_input):
|
| 110 |
+
return images, [False] * len(images)
|
| 111 |
|
| 112 |
|
| 113 |
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
|
|
|
|
| 185 |
return timesteps, num_inference_steps
|
| 186 |
|
| 187 |
|
| 188 |
+
# Try to determine what mixins are available in the installed diffusers version
|
| 189 |
class StableDiffusionPipeline(
|
| 190 |
DiffusionPipeline,
|
| 191 |
StableDiffusionMixin,
|
requirements.txt
CHANGED
|
@@ -1,12 +1,12 @@
|
|
| 1 |
-
gradio>=4.0.0
|
| 2 |
requests>=2.30.0
|
| 3 |
tqdm>=4.66.0
|
| 4 |
torch==2.0.1
|
| 5 |
transformers>=4.30.0,<4.36.0
|
| 6 |
diffusers==0.21.4
|
| 7 |
-
huggingface_hub==0.
|
| 8 |
accelerate>=0.24.0
|
| 9 |
einops>=0.7.0
|
| 10 |
omegaconf>=2.0.0
|
| 11 |
librosa>=0.9.0
|
| 12 |
-
soundfile>=0.12.0
|
|
|
|
| 1 |
+
gradio>=4.0.0,<5.0.0
|
| 2 |
requests>=2.30.0
|
| 3 |
tqdm>=4.66.0
|
| 4 |
torch==2.0.1
|
| 5 |
transformers>=4.30.0,<4.36.0
|
| 6 |
diffusers==0.21.4
|
| 7 |
+
huggingface_hub==0.16.4
|
| 8 |
accelerate>=0.24.0
|
| 9 |
einops>=0.7.0
|
| 10 |
omegaconf>=2.0.0
|
| 11 |
librosa>=0.9.0
|
| 12 |
+
soundfile>=0.12.0
|
test_imports.py
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import sys
|
| 2 |
+
print("Python version:", sys.version)
|
| 3 |
+
print("Python path:", sys.path)
|
| 4 |
+
|
| 5 |
+
try:
|
| 6 |
+
import diffusers
|
| 7 |
+
print("Diffusers version:", diffusers.__version__)
|
| 8 |
+
|
| 9 |
+
# Try importing specific classes from diffusers
|
| 10 |
+
from diffusers.configuration_utils import FrozenDict
|
| 11 |
+
print("Successfully imported FrozenDict")
|
| 12 |
+
|
| 13 |
+
from diffusers.loaders import IPAdapterMixin, LoraLoaderMixin, TextualInversionLoaderMixin
|
| 14 |
+
print("Successfully imported mixins")
|
| 15 |
+
|
| 16 |
+
from diffusers.models import AutoencoderKL, UNet2DConditionModel
|
| 17 |
+
print("Successfully imported models")
|
| 18 |
+
|
| 19 |
+
# Try pipeline-specific imports
|
| 20 |
+
from diffusers.pipelines.pipeline_utils import DiffusionPipeline, StableDiffusionMixin
|
| 21 |
+
print("Successfully imported pipeline utils")
|
| 22 |
+
|
| 23 |
+
from diffusers.pipelines.stable_diffusion.pipeline_output import StableDiffusionPipelineOutput
|
| 24 |
+
print("Successfully imported pipeline output")
|
| 25 |
+
|
| 26 |
+
except ImportError as e:
|
| 27 |
+
print("Import error:", e)
|
| 28 |
+
import traceback
|
| 29 |
+
traceback.print_exc()
|
test_pipeline.py
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Simple script to test if our fixes for diffusers compatibility are working.
|
| 3 |
+
This script doesn't use Gradio or the full web interface.
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
import torch
|
| 8 |
+
import numpy as np
|
| 9 |
+
from PIL import Image
|
| 10 |
+
|
| 11 |
+
# Import our custom components
|
| 12 |
+
from unet2d_custom import UNet2DConditionModel
|
| 13 |
+
from pipeline_stable_diffusion_custom import StableDiffusionPipeline
|
| 14 |
+
|
| 15 |
+
def main():
|
| 16 |
+
print("Testing SonicDiffusion pipeline components...")
|
| 17 |
+
|
| 18 |
+
# Check imports
|
| 19 |
+
print("Imports successful!")
|
| 20 |
+
|
| 21 |
+
# Check if CUDA is available
|
| 22 |
+
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 23 |
+
print(f"Using device: {device}")
|
| 24 |
+
|
| 25 |
+
# Try to initialize a pipeline (without loading weights, just to test the class structure)
|
| 26 |
+
try:
|
| 27 |
+
# This will just test if the pipeline can be initialized, not if it works correctly
|
| 28 |
+
print("Testing pipeline initialization...")
|
| 29 |
+
pipeline = StableDiffusionPipeline(
|
| 30 |
+
vae=None,
|
| 31 |
+
text_encoder=None,
|
| 32 |
+
tokenizer=None,
|
| 33 |
+
unet=None,
|
| 34 |
+
scheduler=None,
|
| 35 |
+
safety_checker=None,
|
| 36 |
+
feature_extractor=None,
|
| 37 |
+
)
|
| 38 |
+
print("Pipeline initialization successful!")
|
| 39 |
+
except Exception as e:
|
| 40 |
+
print(f"Error initializing pipeline: {e}")
|
| 41 |
+
|
| 42 |
+
print("Tests completed.")
|
| 43 |
+
|
| 44 |
+
if __name__ == "__main__":
|
| 45 |
+
main()
|
transformer_2d_custom.py
CHANGED
|
@@ -11,9 +11,55 @@ from diffusers.configuration_utils import ConfigMixin, register_to_config
|
|
| 11 |
from diffusers.utils import BaseOutput, deprecate, is_torch_version, logging
|
| 12 |
from attention_custom import BasicTransformerBlock
|
| 13 |
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
from diffusers.models.modeling_utils import ModelMixin
|
| 16 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
|
| 18 |
|
| 19 |
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
|
|
|
|
| 11 |
from diffusers.utils import BaseOutput, deprecate, is_torch_version, logging
|
| 12 |
from attention_custom import BasicTransformerBlock
|
| 13 |
|
| 14 |
+
# Import embeddings with fallbacks
|
| 15 |
+
try:
|
| 16 |
+
from diffusers.models.embeddings import ImagePositionalEmbeddings, PatchEmbed, PixArtAlphaTextProjection
|
| 17 |
+
except ImportError:
|
| 18 |
+
# Define custom classes for compatibility
|
| 19 |
+
class ImagePositionalEmbeddings(nn.Module):
|
| 20 |
+
"""Custom implementation for compatibility with older diffusers versions."""
|
| 21 |
+
def __init__(self, *args, **kwargs):
|
| 22 |
+
super().__init__()
|
| 23 |
+
self.position_embeddings = nn.Parameter(torch.zeros(1, 1, 1, 1))
|
| 24 |
+
|
| 25 |
+
def forward(self, x):
|
| 26 |
+
return x + self.position_embeddings
|
| 27 |
+
|
| 28 |
+
class PatchEmbed(nn.Module):
|
| 29 |
+
"""Custom implementation for compatibility with older diffusers versions."""
|
| 30 |
+
def __init__(self, *args, **kwargs):
|
| 31 |
+
super().__init__()
|
| 32 |
+
self.proj = nn.Conv2d(3, 1024, kernel_size=1)
|
| 33 |
+
|
| 34 |
+
def forward(self, x):
|
| 35 |
+
return self.proj(x).flatten(2).transpose(1, 2)
|
| 36 |
+
|
| 37 |
+
class PixArtAlphaTextProjection(nn.Module):
|
| 38 |
+
"""Custom implementation for compatibility with older diffusers versions."""
|
| 39 |
+
def __init__(self, *args, **kwargs):
|
| 40 |
+
super().__init__()
|
| 41 |
+
|
| 42 |
+
def forward(self, x):
|
| 43 |
+
return x
|
| 44 |
+
|
| 45 |
from diffusers.models.modeling_utils import ModelMixin
|
| 46 |
+
|
| 47 |
+
# Import normalization with fallbacks
|
| 48 |
+
try:
|
| 49 |
+
from diffusers.models.normalization import AdaLayerNormSingle
|
| 50 |
+
except ImportError:
|
| 51 |
+
# Define a custom AdaLayerNormSingle
|
| 52 |
+
class AdaLayerNormSingle(nn.Module):
|
| 53 |
+
"""Custom implementation for compatibility with older diffusers versions."""
|
| 54 |
+
def __init__(self, embedding_dim, emb_dim=None):
|
| 55 |
+
super().__init__()
|
| 56 |
+
self.emb_layer = nn.Linear(emb_dim or embedding_dim, embedding_dim)
|
| 57 |
+
self.norm = nn.LayerNorm(embedding_dim, elementwise_affine=False)
|
| 58 |
+
|
| 59 |
+
def forward(self, x, emb):
|
| 60 |
+
shift = self.emb_layer(emb).unsqueeze(1)
|
| 61 |
+
x = self.norm(x)
|
| 62 |
+
return x + shift
|
| 63 |
|
| 64 |
|
| 65 |
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
|
unet2d_custom.py
CHANGED
|
@@ -8,10 +8,32 @@ import torch.nn as nn
|
|
| 8 |
import torch.utils.checkpoint
|
| 9 |
|
| 10 |
from diffusers.configuration_utils import ConfigMixin, register_to_config
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
from diffusers.models.activations import get_activation
|
| 16 |
|
| 17 |
from diffusers.models.attention_processor import (
|
|
@@ -22,18 +44,57 @@ from diffusers.models.attention_processor import (
|
|
| 22 |
AttnAddedKVProcessor,
|
| 23 |
AttnProcessor,
|
| 24 |
)
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 37 |
from diffusers.models.modeling_utils import ModelMixin
|
| 38 |
|
| 39 |
from unet_2d_blocks_custom import (
|
|
@@ -60,6 +121,7 @@ class UNet2DConditionOutput(BaseOutput):
|
|
| 60 |
sample: torch.FloatTensor = None
|
| 61 |
|
| 62 |
|
|
|
|
| 63 |
class UNet2DConditionModel(ModelMixin, ConfigMixin, UNet2DConditionLoadersMixin, PeftAdapterMixin):
|
| 64 |
r"""
|
| 65 |
A conditional 2D UNet model that takes a noisy sample, conditional state, and a timestep and returns a sample
|
|
|
|
| 8 |
import torch.utils.checkpoint
|
| 9 |
|
| 10 |
from diffusers.configuration_utils import ConfigMixin, register_to_config
|
| 11 |
+
# Modified to handle older diffusers versions (0.21.4)
|
| 12 |
+
try:
|
| 13 |
+
from diffusers.loaders import PeftAdapterMixin, UNet2DConditionLoadersMixin
|
| 14 |
+
except ImportError:
|
| 15 |
+
from diffusers.loaders import UNet2DConditionLoadersMixin
|
| 16 |
+
|
| 17 |
+
# Define dummy mixin for backward compatibility
|
| 18 |
+
class PeftAdapterMixin:
|
| 19 |
+
"""Dummy PeftAdapterMixin for compatibility with older diffusers versions."""
|
| 20 |
+
pass
|
| 21 |
+
|
| 22 |
+
# Check if USE_PEFT_BACKEND is available in diffusers
|
| 23 |
+
try:
|
| 24 |
+
from diffusers.utils import USE_PEFT_BACKEND, BaseOutput, deprecate, logging, scale_lora_layers, unscale_lora_layers
|
| 25 |
+
except ImportError:
|
| 26 |
+
from diffusers.utils import BaseOutput, deprecate, logging
|
| 27 |
+
# Define placeholders for missing utilities
|
| 28 |
+
USE_PEFT_BACKEND = False
|
| 29 |
+
|
| 30 |
+
def scale_lora_layers(model, weight):
|
| 31 |
+
"""Dummy function for compatibility with older diffusers versions."""
|
| 32 |
+
pass
|
| 33 |
+
|
| 34 |
+
def unscale_lora_layers(model, weight):
|
| 35 |
+
"""Dummy function for compatibility with older diffusers versions."""
|
| 36 |
+
pass
|
| 37 |
from diffusers.models.activations import get_activation
|
| 38 |
|
| 39 |
from diffusers.models.attention_processor import (
|
|
|
|
| 44 |
AttnAddedKVProcessor,
|
| 45 |
AttnProcessor,
|
| 46 |
)
|
| 47 |
+
try:
|
| 48 |
+
from diffusers.models.embeddings import (
|
| 49 |
+
GaussianFourierProjection,
|
| 50 |
+
GLIGENTextBoundingboxProjection,
|
| 51 |
+
ImageHintTimeEmbedding,
|
| 52 |
+
ImageProjection,
|
| 53 |
+
ImageTimeEmbedding,
|
| 54 |
+
TextImageProjection,
|
| 55 |
+
TextImageTimeEmbedding,
|
| 56 |
+
TextTimeEmbedding,
|
| 57 |
+
TimestepEmbedding,
|
| 58 |
+
Timesteps,
|
| 59 |
+
)
|
| 60 |
+
except ImportError:
|
| 61 |
+
# For older diffusers versions
|
| 62 |
+
from diffusers.models.embeddings import (
|
| 63 |
+
GaussianFourierProjection,
|
| 64 |
+
ImageProjection,
|
| 65 |
+
TextTimeEmbedding,
|
| 66 |
+
TimestepEmbedding,
|
| 67 |
+
Timesteps,
|
| 68 |
+
)
|
| 69 |
+
|
| 70 |
+
# Define missing classes for compatibility
|
| 71 |
+
class GLIGENTextBoundingboxProjection(nn.Module):
|
| 72 |
+
"""Dummy class for compatibility with older diffusers versions."""
|
| 73 |
+
def __init__(self, positive_len=None, out_dim=None, feature_type=None):
|
| 74 |
+
super().__init__()
|
| 75 |
+
self.positive_len = positive_len
|
| 76 |
+
self.out_dim = out_dim
|
| 77 |
+
self.feature_type = feature_type
|
| 78 |
+
|
| 79 |
+
class ImageHintTimeEmbedding(nn.Module):
|
| 80 |
+
"""Dummy class for compatibility with older diffusers versions."""
|
| 81 |
+
def __init__(self, image_embed_dim=None, time_embed_dim=None):
|
| 82 |
+
super().__init__()
|
| 83 |
+
|
| 84 |
+
class ImageTimeEmbedding(nn.Module):
|
| 85 |
+
"""Dummy class for compatibility with older diffusers versions."""
|
| 86 |
+
def __init__(self, image_embed_dim=None, time_embed_dim=None):
|
| 87 |
+
super().__init__()
|
| 88 |
+
|
| 89 |
+
class TextImageProjection(nn.Module):
|
| 90 |
+
"""Dummy class for compatibility with older diffusers versions."""
|
| 91 |
+
def __init__(self, text_embed_dim=None, image_embed_dim=None, cross_attention_dim=None):
|
| 92 |
+
super().__init__()
|
| 93 |
+
|
| 94 |
+
class TextImageTimeEmbedding(nn.Module):
|
| 95 |
+
"""Dummy class for compatibility with older diffusers versions."""
|
| 96 |
+
def __init__(self, text_embed_dim=None, image_embed_dim=None, time_embed_dim=None):
|
| 97 |
+
super().__init__()
|
| 98 |
from diffusers.models.modeling_utils import ModelMixin
|
| 99 |
|
| 100 |
from unet_2d_blocks_custom import (
|
|
|
|
| 121 |
sample: torch.FloatTensor = None
|
| 122 |
|
| 123 |
|
| 124 |
+
# Modified for compatibility with older diffusers
|
| 125 |
class UNet2DConditionModel(ModelMixin, ConfigMixin, UNet2DConditionLoadersMixin, PeftAdapterMixin):
|
| 126 |
r"""
|
| 127 |
A conditional 2D UNet model that takes a noisy sample, conditional state, and a timestep and returns a sample
|
unet_2d_blocks_custom.py
CHANGED
|
@@ -8,24 +8,136 @@ import torch.nn.functional as F
|
|
| 8 |
from torch import nn
|
| 9 |
|
| 10 |
from diffusers.utils import deprecate, is_torch_version, logging
|
| 11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
|
| 13 |
from diffusers.models.activations import get_activation
|
| 14 |
from diffusers.models.attention_processor import Attention, AttnAddedKVProcessor, AttnAddedKVProcessor2_0
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 29 |
from transformer_2d_custom import Transformer2DModel
|
| 30 |
|
| 31 |
#from diffusers.models.transformers.transformer_2d import Transformer2DModel
|
|
|
|
| 8 |
from torch import nn
|
| 9 |
|
| 10 |
from diffusers.utils import deprecate, is_torch_version, logging
|
| 11 |
+
|
| 12 |
+
# Import apply_freeu or define it if not available
|
| 13 |
+
try:
|
| 14 |
+
from diffusers.utils.torch_utils import apply_freeu
|
| 15 |
+
except ImportError:
|
| 16 |
+
# Define a custom apply_freeu function for compatibility
|
| 17 |
+
def apply_freeu(
|
| 18 |
+
feats: torch.Tensor,
|
| 19 |
+
hidden_states: torch.Tensor,
|
| 20 |
+
res_hidden_states: torch.Tensor,
|
| 21 |
+
s1: float,
|
| 22 |
+
s2: float,
|
| 23 |
+
b1: float,
|
| 24 |
+
b2: float,
|
| 25 |
+
) -> torch.Tensor:
|
| 26 |
+
"""
|
| 27 |
+
Custom implementation of FreeU for older diffusers versions.
|
| 28 |
+
See https://github.com/ChenyangSi/FreeU for more details.
|
| 29 |
+
|
| 30 |
+
Args:
|
| 31 |
+
feats: Features at the current layer
|
| 32 |
+
hidden_states: Hidden states from the previous layer
|
| 33 |
+
res_hidden_states: Residual hidden states from the previous layer
|
| 34 |
+
s1: Scaling factor for frequency components
|
| 35 |
+
s2: Scaling factor for frequency components
|
| 36 |
+
b1: Scaling factor for original hidden states
|
| 37 |
+
b2: Scaling factor for original hidden states
|
| 38 |
+
|
| 39 |
+
Returns:
|
| 40 |
+
The processed feature map
|
| 41 |
+
"""
|
| 42 |
+
if all(param is None for param in [s1, s2, b1, b2]):
|
| 43 |
+
return hidden_states
|
| 44 |
+
|
| 45 |
+
# Simple implementation that just passes through the hidden states unchanged
|
| 46 |
+
# This maintains compatibility without the actual FreeU feature
|
| 47 |
+
return hidden_states
|
| 48 |
|
| 49 |
from diffusers.models.activations import get_activation
|
| 50 |
from diffusers.models.attention_processor import Attention, AttnAddedKVProcessor, AttnAddedKVProcessor2_0
|
| 51 |
+
|
| 52 |
+
# Handle missing AdaGroupNorm
|
| 53 |
+
try:
|
| 54 |
+
from diffusers.models.normalization import AdaGroupNorm
|
| 55 |
+
except ImportError:
|
| 56 |
+
# Define a custom AdaGroupNorm class if it's not available
|
| 57 |
+
class AdaGroupNorm(nn.Module):
|
| 58 |
+
"""Custom implementation of AdaGroupNorm for compatibility with older diffusers versions."""
|
| 59 |
+
|
| 60 |
+
def __init__(self, embedding_dim, num_groups=32, eps=1e-5):
|
| 61 |
+
super().__init__()
|
| 62 |
+
self.num_groups = num_groups
|
| 63 |
+
self.eps = eps
|
| 64 |
+
self.embedding_dim = embedding_dim
|
| 65 |
+
|
| 66 |
+
self.linear = nn.Linear(embedding_dim, embedding_dim * 2)
|
| 67 |
+
|
| 68 |
+
def forward(self, x, emb):
|
| 69 |
+
# Simple implementation that falls back to GroupNorm
|
| 70 |
+
emb = self.linear(emb)
|
| 71 |
+
emb = emb[:, :, None, None]
|
| 72 |
+
scale, shift = emb.chunk(2, dim=1)
|
| 73 |
+
|
| 74 |
+
# Use standard GroupNorm
|
| 75 |
+
x = nn.functional.group_norm(x, self.num_groups, eps=self.eps)
|
| 76 |
+
# Apply scale and shift
|
| 77 |
+
return x * (1 + scale) + shift
|
| 78 |
+
|
| 79 |
+
# Import resnet components with fallbacks for older diffusers versions
|
| 80 |
+
try:
|
| 81 |
+
from diffusers.models.resnet import (
|
| 82 |
+
Downsample2D,
|
| 83 |
+
FirDownsample2D,
|
| 84 |
+
FirUpsample2D,
|
| 85 |
+
KDownsample2D,
|
| 86 |
+
KUpsample2D,
|
| 87 |
+
ResnetBlock2D,
|
| 88 |
+
ResnetBlockCondNorm2D,
|
| 89 |
+
Upsample2D,
|
| 90 |
+
)
|
| 91 |
+
except ImportError:
|
| 92 |
+
# Import what's available
|
| 93 |
+
from diffusers.models.resnet import (
|
| 94 |
+
Downsample2D,
|
| 95 |
+
FirDownsample2D,
|
| 96 |
+
FirUpsample2D,
|
| 97 |
+
KDownsample2D,
|
| 98 |
+
KUpsample2D,
|
| 99 |
+
ResnetBlock2D,
|
| 100 |
+
Upsample2D,
|
| 101 |
+
)
|
| 102 |
+
|
| 103 |
+
# Define a custom ResnetBlockCondNorm2D class
|
| 104 |
+
class ResnetBlockCondNorm2D(nn.Module):
|
| 105 |
+
"""
|
| 106 |
+
Resnet block with conditional normalization for compatibility with older diffusers versions.
|
| 107 |
+
|
| 108 |
+
Args:
|
| 109 |
+
in_channels (int): Number of input channels.
|
| 110 |
+
out_channels (int): Number of output channels.
|
| 111 |
+
temb_channels (int): Number of timestep embedding channels.
|
| 112 |
+
groups (int, optional): Number of groups for GroupNorm. Defaults to 32.
|
| 113 |
+
eps (float, optional): Epsilon for GroupNorm. Defaults to 1e-5.
|
| 114 |
+
"""
|
| 115 |
+
def __init__(
|
| 116 |
+
self,
|
| 117 |
+
*args,
|
| 118 |
+
**kwargs
|
| 119 |
+
):
|
| 120 |
+
super().__init__()
|
| 121 |
+
# Use ResnetBlock2D as fallback
|
| 122 |
+
self.block = ResnetBlock2D(*args, **kwargs)
|
| 123 |
+
|
| 124 |
+
def forward(self, hidden_states, temb=None, scale=None):
|
| 125 |
+
return self.block(hidden_states, temb)
|
| 126 |
+
|
| 127 |
+
# Import transformer models
|
| 128 |
+
try:
|
| 129 |
+
from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel
|
| 130 |
+
except ImportError:
|
| 131 |
+
# Define a custom DualTransformer2DModel for older diffusers versions
|
| 132 |
+
class DualTransformer2DModel(nn.Module):
|
| 133 |
+
"""Dummy implementation for older diffusers versions"""
|
| 134 |
+
def __init__(self, *args, **kwargs):
|
| 135 |
+
super().__init__()
|
| 136 |
+
|
| 137 |
+
def forward(self, *args, **kwargs):
|
| 138 |
+
raise NotImplementedError("DualTransformer2DModel is not available in this version of diffusers")
|
| 139 |
+
|
| 140 |
+
# Use our custom Transformer2DModel
|
| 141 |
from transformer_2d_custom import Transformer2DModel
|
| 142 |
|
| 143 |
#from diffusers.models.transformers.transformer_2d import Transformer2DModel
|