aihao committed
Commit 9890667 · 1 Parent(s): 1e085c7

add init code
Browse files
- .gitignore +162 -0
- README.assets/example.jpg +3 -0
- README.assets/main.png +3 -0
- README.assets/more_examples.png +3 -0
- README.md +44 -1
- ip_adapter_artist/__init__.py +0 -0
- ip_adapter_artist/utils/__init__.py +0 -0
- ip_adapter_artist/utils/csd_clip.py +145 -0
- ip_adapter_artist/utils/ip_adapter.py +72 -0
- ip_adapter_artist_sdxl_demo.ipynb +210 -0
- setup.py +31 -0
.gitignore
ADDED
@@ -0,0 +1,162 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+#   .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+.pdm.toml
+.pdm-python
+.pdm-build/
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
README.assets/example.jpg
ADDED
Git LFS Details
README.assets/main.png
ADDED
Git LFS Details
README.assets/more_examples.png
ADDED
Git LFS Details
README.md
CHANGED
@@ -1 +1,44 @@
-# IP
+# IP Adapter Artist
+
+<a href='https://huggingface.co/AisingioroHao0/IP-Adapter-Artist'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a><a href=''><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue'></a> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kV7q3Gzr8GPG9cChdDQ5ncCx84TYjuu3?usp=sharing)
+
+![main](README.assets/main.png)
+
+------
+
+## Introduction
+
+IP Adapter Artist is a specialized IP-Adapter variant built around a dedicated style encoder. Its goal is to achieve style control through reference images in text-to-image diffusion models, addressing the instability and incomplete stylization of existing methods. This is a preprint release; more models and training data are coming soon.
+
+## How to use
+
+The [Colab notebook](https://colab.research.google.com/drive/1kV7q3Gzr8GPG9cChdDQ5ncCx84TYjuu3?usp=sharing) can be used to run the experiments directly.
+
+For local experiments, refer to the [demo notebook](https://github.com/aihao2000/IP-Adapter-Artist/blob/main/ip_adapter_artist_sdxl_demo.ipynb).
+
+Local experiments require a basic PyTorch environment and the following dependencies:
+
+```
+pip install diffusers
+pip install transformers
+pip install git+https://github.com/openai/CLIP.git
+pip install git+https://github.com/aihao2000/IP-Adapter-Artist.git
+```
+
+## More Examples
+
+![more examples](README.assets/more_examples.png)
+
+
+## Citation
+
+```
+@misc{IP-Adapter-Artist,
+  author       = {Hao Ai},
+  title        = {IP Adapter Artist},
+  year         = {2024},
+  publisher    = {GitHub},
+  journal      = {GitHub repository},
+  howpublished = {\url{https://github.com/aihao2000/IP-Adapter-Artist}}
+}
+```
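For readers of this commit, the pieces below fit together roughly as follows. This is a condensed, hedged sketch of the demo notebook added later in this commit (repo ids, filenames, prompts, and the scale dict are copied from it); treat it as illustrative rather than a stable API:

```python
import torch
from huggingface_hub import hf_hub_download
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image
from transformers import CLIPImageProcessor
from ip_adapter_artist.utils.ip_adapter import load_ip_adapter

# Fetch the style encoder (a fully pickled CSD_CLIP module) and the adapter weights.
csd_clip_path = hf_hub_download("AisingioroHao0/IP-Adapter-Artist", "csd_clip.pth")
adapter_path = hf_hub_download(
    "AisingioroHao0/IP-Adapter-Artist", "ip_adapter_artist_sdxl_512.pth"
)
csd_clip = torch.load(csd_clip_path).to("cuda").eval().requires_grad_(False)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16
).to("cuda")
load_ip_adapter(pipe.unet, adapter_path)  # patches the UNet's attn2 processors in place
pipe.set_ip_adapter_scale({"up": {"block_0": [0.0, 1.0, 0.0]}})

# Encode a style reference; the style embedding is the third output of CSD_CLIP.
image = load_image(
    "https://github.com/aihao2000/IP-Adapter-Artist/blob/main/README.assets/example.jpg?raw=true"
)
pixel_values = CLIPImageProcessor().preprocess(image, return_tensors="pt").pixel_values
_, _, style_embeds = csd_clip(pixel_values.to("cuda", torch.float32))
embeds = torch.stack([torch.zeros_like(style_embeds), style_embeds]).to("cuda", torch.float16)

result = pipe(
    ip_adapter_image_embeds=[embeds],
    prompt="A cat sitting on a table, top hat, best quality, masterpiece",
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
```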
ip_adapter_artist/__init__.py
ADDED
File without changes
ip_adapter_artist/utils/__init__.py
ADDED
File without changes
ip_adapter_artist/utils/csd_clip.py
ADDED
@@ -0,0 +1,145 @@
+import torch
+import torch.nn as nn
+import clip
+import copy
+from torch.autograd import Function
+
+from collections import OrderedDict
+
+
+def convert_state_dict(state_dict):
+    new_state_dict = OrderedDict()
+    for k, v in state_dict.items():
+        if k.startswith("module."):
+            k = k.replace("module.", "")
+        new_state_dict[k] = v
+    return new_state_dict
+
+
+def convert_weights_float(model: nn.Module):
+    """Convert applicable model parameters to fp32"""
+
+    def _convert_weights_to_fp32(l):
+        if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
+            l.weight.data = l.weight.data.float()
+            if l.bias is not None:
+                l.bias.data = l.bias.data.float()
+
+        if isinstance(l, nn.MultiheadAttention):
+            for attr in [
+                *[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]],
+                "in_proj_bias",
+                "bias_k",
+                "bias_v",
+            ]:
+                tensor = getattr(l, attr)
+                if tensor is not None:
+                    tensor.data = tensor.data.float()
+
+        for name in ["text_projection", "proj"]:
+            if hasattr(l, name):
+                attr = getattr(l, name)
+                if attr is not None:
+                    attr.data = attr.data.float()
+
+    model.apply(_convert_weights_to_fp32)
+
+
+class ReverseLayerF(Function):
+    @staticmethod
+    def forward(ctx, x, alpha):
+        ctx.alpha = alpha
+
+        return x.view_as(x)
+
+    @staticmethod
+    def backward(ctx, grad_output):
+        output = grad_output.neg() * ctx.alpha
+
+        return output, None
+
+
+## taken from https://github.com/moein-shariatnia/OpenAI-CLIP/blob/master/modules.py
+class ProjectionHead(nn.Module):
+    def __init__(self, embedding_dim, projection_dim, dropout=0):
+        super().__init__()
+        self.projection = nn.Linear(embedding_dim, projection_dim)
+        self.gelu = nn.GELU()
+        self.fc = nn.Linear(projection_dim, projection_dim)
+        self.dropout = nn.Dropout(dropout)
+        self.layer_norm = nn.LayerNorm(projection_dim)
+
+    def forward(self, x):
+        projected = self.projection(x)
+        x = self.gelu(projected)
+        x = self.fc(x)
+        x = self.dropout(x)
+        x = x + projected
+        x = self.layer_norm(x)
+        return x
+
+
+def init_weights(m):  # TODO: do we need init for layernorm?
+    if isinstance(m, nn.Linear):
+        torch.nn.init.xavier_uniform_(m.weight)
+        if m.bias is not None:
+            nn.init.normal_(m.bias, std=1e-6)
+
+
+class CSD_CLIP(nn.Module):
+    """backbone + projection head"""
+
+    def __init__(self, name="vit_large", content_proj_head="default", model_path=None):
+        super(CSD_CLIP, self).__init__()
+        self.content_proj_head = content_proj_head
+        if name == "vit_large":
+            if model_path is None:
+                clipmodel, _ = clip.load("models/ViT-L-14.pt")
+            else:
+                clipmodel, _ = clip.load(model_path)
+            self.backbone = clipmodel.visual
+            self.embedding_dim = 1024
+        elif name == "vit_base":
+            if model_path is None:
+                clipmodel, _ = clip.load("ViT-B/16")
+            else:
+                clipmodel, _ = clip.load(model_path)
+            self.backbone = clipmodel.visual
+            self.embedding_dim = 768
+            self.feat_dim = 512
+        else:
+            raise Exception("This model is not implemented")
+
+        convert_weights_float(self.backbone)
+        self.last_layer_style = copy.deepcopy(self.backbone.proj)
+        if content_proj_head == "custom":
+            self.last_layer_content = ProjectionHead(self.embedding_dim, self.feat_dim)
+            self.last_layer_content.apply(init_weights)
+
+        else:
+            self.last_layer_content = copy.deepcopy(self.backbone.proj)
+
+        self.backbone.proj = None
+
+    @property
+    def dtype(self):
+        return self.backbone.conv1.weight.dtype
+
+    def forward(self, input_data, alpha=None):
+        feature = self.backbone(input_data)
+
+        if alpha is not None:
+            reverse_feature = ReverseLayerF.apply(feature, alpha)
+        else:
+            reverse_feature = feature
+
+        style_output = feature @ self.last_layer_style
+        style_output = nn.functional.normalize(style_output, dim=1, p=2)
+
+        # if alpha is not None:
+        if self.content_proj_head == "custom":
+            content_output = self.last_layer_content(reverse_feature)
+        else:
+            content_output = reverse_feature @ self.last_layer_content
+        content_output = nn.functional.normalize(content_output, dim=1, p=2)
+        return feature, content_output, style_output
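For reference, here is a minimal sketch of how this encoder can be instantiated and queried outside the demo notebook (the notebook instead loads a fully pickled `csd_clip.pth` via `torch.load`). Passing `model_path="ViT-L/14"` is an assumption that lets `clip.load()` resolve the named OpenAI model instead of the default local path `models/ViT-L-14.pt`; the style image filename is hypothetical:

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor
from ip_adapter_artist.utils.csd_clip import CSD_CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the style encoder on top of OpenAI CLIP ViT-L/14 (assumes the checkpoint
# can be downloaded or is cached); freeze it for inference.
model = CSD_CLIP(name="vit_large", model_path="ViT-L/14").to(device).eval()
model.requires_grad_(False)

# Preprocess a style reference the same way the demo notebook does.
image = Image.open("style_reference.jpg")  # hypothetical local file
pixel_values = CLIPImageProcessor().preprocess(image, return_tensors="pt").pixel_values

with torch.no_grad():
    # forward() returns (backbone feature, content embedding, style embedding);
    # alpha only matters during training, where it scales the gradient reversal.
    feature, content_embeds, style_embeds = model(pixel_values.to(device, torch.float32))
```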
ip_adapter_artist/utils/ip_adapter.py
ADDED
@@ -0,0 +1,72 @@
+from diffusers.models.attention_processor import IPAdapterAttnProcessor2_0, Attention
+from diffusers.models.embeddings import (
+    ImageProjection,
+    MultiIPAdapterImageProjection,
+    IPAdapterPlusImageProjection,
+)
+import torch
+
+
+def save_ip_adapter(unet, path):
+    state_dict = {}
+    if (
+        hasattr(unet, "encoder_hid_proj")
+        and unet.encoder_hid_proj is not None
+        and isinstance(unet.encoder_hid_proj, torch.nn.Module)
+    ):
+        state_dict["encoder_hid_proj"] = unet.encoder_hid_proj.state_dict()
+
+    for name, module in unet.attn_processors.items():
+        if isinstance(module, torch.nn.Module):
+            state_dict[name] = module.state_dict()
+    torch.save(state_dict, path)
+
+
+def load_ip_adapter(
+    unet,
+    path,
+):
+    state_dict = torch.load(path, map_location="cpu")
+
+    if "encoder_hid_proj" in state_dict.keys():
+        num_image_text_embeds = 4
+        clip_embeddings_dim = state_dict["encoder_hid_proj"][
+            "image_projection_layers.0.image_embeds.weight"
+        ].shape[-1]
+        cross_attention_dim = (
+            state_dict["encoder_hid_proj"][
+                "image_projection_layers.0.image_embeds.weight"
+            ].shape[0]
+            // 4
+        )
+        if not hasattr(unet, "encoder_hid_proj") or unet.encoder_hid_proj is None:
+            unet.encoder_hid_proj = MultiIPAdapterImageProjection(
+                [
+                    ImageProjection(
+                        cross_attention_dim=cross_attention_dim,
+                        image_embed_dim=clip_embeddings_dim,
+                        num_image_text_embeds=num_image_text_embeds,
+                    )
+                ]
+            ).to(unet.device, unet.dtype)
+        unet.encoder_hid_proj.load_state_dict(state_dict["encoder_hid_proj"])
+    else:
+        unet.encoder_hid_proj = lambda x: x
+        cross_attention_dim = state_dict[
+            "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor"
+        ]["to_k_ip.0.weight"].shape[-1]
+
+    unet.config.encoder_hid_dim_type = "ip_image_proj"
+
+    for name, module in unet.named_modules():
+        if "attn2" in name and isinstance(module, Attention):
+            if not isinstance(module.processor, IPAdapterAttnProcessor2_0):
+                module.set_processor(
+                    IPAdapterAttnProcessor2_0(
+                        hidden_size=module.query_dim,
+                        cross_attention_dim=cross_attention_dim,
+                    ).to(unet.device, unet.dtype)
+                )
+            module.processor.load_state_dict(
+                state_dict[f"{name}.processor"], strict=False
+            )
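A hedged sketch of the intended round trip for these two helpers, assuming a local copy of the adapter checkpoint (the filename mirrors the one the demo notebook downloads; the output path is hypothetical):

```python
import torch
from diffusers import StableDiffusionXLPipeline
from ip_adapter_artist.utils.ip_adapter import load_ip_adapter, save_ip_adapter

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")

# Patch every cross-attention (attn2) processor in the UNet with an
# IPAdapterAttnProcessor2_0 and load its to_k_ip / to_v_ip weights.
load_ip_adapter(pipe.unet, "ip_adapter_artist_sdxl_512.pth")  # local path assumed

# Serialize the processors (plus encoder_hid_proj, when it is a module) back out.
save_ip_adapter(pipe.unet, "ip_adapter_roundtrip.pth")  # hypothetical output path
```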
ip_adapter_artist_sdxl_demo.ipynb
ADDED
@@ -0,0 +1,210 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ip_adapter_artist.utils.csd_clip import CSD_CLIP\n",
+    "from ip_adapter_artist.utils.ip_adapter import (\n",
+    "    load_ip_adapter,\n",
+    ")\n",
+    "import torch\n",
+    "from transformers import CLIPImageProcessor\n",
+    "from PIL import Image\n",
+    "from diffusers.utils import make_image_grid, load_image\n",
+    "from huggingface_hub import hf_hub_download\n",
+    "from diffusers import StableDiffusionXLPipeline"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Download Models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "csd_clip_path = hf_hub_download(\n",
+    "    repo_id=\"AisingioroHao0/IP-Adapter-Artist\", filename=\"csd_clip.pth\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ip_adapter_artist_path = hf_hub_download(\n",
+    "    repo_id=\"AisingioroHao0/IP-Adapter-Artist\", filename=\"ip_adapter_artist_sdxl_512.pth\"\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "csd_clip = torch.load(csd_clip_path).to(\"cuda\")\n",
+    "csd_clip.requires_grad_(False)\n",
+    "csd_clip = csd_clip.eval()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipe = StableDiffusionXLPipeline.from_pretrained(\n",
+    "    \"stabilityai/stable-diffusion-xl-base-1.0\",\n",
+    "    variant=\"fp16\",\n",
+    "    torch_dtype=torch.float16,\n",
+    ").to(\"cuda\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "image_processor = CLIPImageProcessor()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "load_ip_adapter(\n",
+    "    pipe.unet,\n",
+    "    ip_adapter_artist_path,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scale = {\"up\": {\"block_0\": [0.0, 1.0, 0.0]}}\n",
+    "pipe.set_ip_adapter_scale(scale)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Process Style Image"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "image = load_image('https://github.com/aihao2000/IP-Adapter-Artist/blob/main/README.assets/example.jpg?raw=true')\n",
+    "image"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pixel_values = image_processor.preprocess(image, return_tensors=\"pt\").pixel_values\n",
+    "_, __, style_embeds = csd_clip(pixel_values.to(\"cuda\", torch.float32))\n",
+    "ip_adapter_image_embeds = torch.stack(\n",
+    "    [torch.zeros_like(style_embeds).to(\"cuda\"), style_embeds]\n",
+    ").to(\"cuda\", torch.float16)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Infer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result = pipe(\n",
+    "    ip_adapter_image_embeds=[ip_adapter_image_embeds],\n",
+    "    prompt=\"A cat sitting on a table, top hat, best quality, masterpiece\",\n",
+    "    negative_prompt=\"worst quality, low quality, low res, blurry, cropped image, jpeg artifacts, error, ugly, out of frame, deformed, poorly drawn\",\n",
+    "    generator=torch.Generator(\"cuda\").manual_seed(42),\n",
+    "    num_inference_steps=30,\n",
+    "    guidance_scale=5.0,\n",
+    ").images[0]\n",
+    "result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result = pipe(\n",
+    "    ip_adapter_image_embeds=[ip_adapter_image_embeds],\n",
+    "    prompt=\"A house covered with ice and snow.\",\n",
+    "    negative_prompt=\"multi view, worst quality, low quality, low res, blurry, cropped image, jpeg artifacts, error, ugly, out of frame, deformed, poorly drawn\",\n",
+    "    generator=torch.Generator(\"cuda\").manual_seed(42),\n",
+    "    num_inference_steps=30,\n",
+    "    guidance_scale=5.0,\n",
+    ").images[0]\n",
+    "result"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "torch",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
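One step in the notebook worth calling out: the style embedding is stacked with an all-zero tensor before being handed to the pipeline. Under classifier-free guidance, diffusers expects each entry of `ip_adapter_image_embeds` to carry the negative (unconditional) embeds followed by the positive ones, and a zero vector stands in for "no style". That convention holds for the diffusers versions this demo appears to target; it is worth re-checking against your installed version:

```python
import torch

style_embeds = torch.randn(1, 768)  # stand-in for the CSD style embedding

# Row 0 is the unconditional ("no style") input used by the negative branch
# of classifier-free guidance; row 1 is the actual style condition.
ip_adapter_image_embeds = torch.stack(
    [torch.zeros_like(style_embeds), style_embeds]
)
print(ip_adapter_image_embeds.shape)  # torch.Size([2, 1, 768])
```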
setup.py
ADDED
@@ -0,0 +1,31 @@
+from setuptools import find_packages, setup
+
+
+setup(
+    name="ip_adapter_artist",
+    version="0.1",
+    description="Using reference images to control style in diffusion models",
+    long_description=open("README.md", "r", encoding="utf-8").read(),
+    long_description_content_type="text/markdown",
+    keywords="Using reference images to control style in diffusion models",
+    license="Apache",
+    author="aihao",
+    author_email="aihao2000@outlook.com",
+    url="https://github.com/aihao2000/IP-Adapter-Artist",
+    packages=find_packages(),
+    python_requires=">=3.8.0",
+    install_requires=[
+        "diffusers",
+        "transformers",
+    ],
+    classifiers=[
+        "Development Status :: 5 - Production/Stable",
+        "Intended Audience :: Developers",
+        "Intended Audience :: Education",
+        "Intended Audience :: Science/Research",
+        "License :: OSI Approved :: Apache Software License",
+        "Operating System :: OS Independent",
+        "Programming Language :: Python :: 3",
+        "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    ],
+)